Model Optimization

武昌鱼
Computer Science
2023-04-23

1. Timer

import time

# start time
time_start = time.time()
# ... code to be timed goes here ...
# end time
time_end = time.time()
# elapsed time in seconds
time_c = time_end - time_start
print(time_c)
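
When several blocks need timing, the start/end bookkeeping can be wrapped once and reused. The sketch below is one possible way to do that with a context manager; the helper name timer and the workload being timed are illustrative assumptions, and time.perf_counter is swapped in for time.time because it is the standard-library clock intended for measuring short intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label):
    """Print the wall-clock time spent inside the `with` block (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f} s")


# Illustrative workload: any code placed inside the block is timed
with timer("sum of squares"):
    total = sum(i * i for i in range(100_000))
```

The same with-block can wrap a call such as model.fit(...) to time training.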

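2. Do not name objects after imported packages

For example, do not name an XGBRegressor object xgb. If xgboost was imported as xgb (import xgboost as xgb), the assignment rebinds xgb to the model, and later calls such as xgb.XGBRegressor(...) fail with an AttributeError.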
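
The effect can be reproduced without installing xgboost. The sketch below uses the standard-library decimal module purely as a stand-in for a package alias; the names dec, dec_module and price are illustrative assumptions, not part of the original code.

```python
import decimal as dec  # stand-in for an alias such as `import xgboost as xgb`

dec_module = dec  # kept only so the example can restore the alias afterwards

# Bad: the object reuses the alias, so `dec` no longer refers to the module
dec = dec.Decimal("1.5")

try:
    dec.getcontext()  # module-level function; a Decimal instance has no such attribute
except AttributeError as exc:
    error_message = str(exc)
    print("Shadowed alias:", error_message)

# Better: keep the alias for the module and give the object its own name
dec = dec_module
price = dec.Decimal("1.5")
print(price)
```

With xgboost the equivalent fix would be a name such as xgb_model for the XGBRegressor object.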

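3. GridSearchCV parameters:

estimator: the model to tune; any scikit-learn estimator, whether classifier or regressor.
param_grid: the values to search for each hyperparameter, given as a dict or a list of dicts, e.g. param_grid = {'C': [0.01, 0.1, 1, 10]}.
cv: the cross-validation strategy; an integer sets the number of folds, and the default is 5-fold.
scoring: the metric used to rank candidates. The default (None) uses the estimator's own score method, which is accuracy for classifiers and R² for regressors.
fit_params: extra arguments for training, such as sample_weight. Current scikit-learn versions no longer accept this in the constructor; pass the arguments to fit instead, e.g. grid_search.fit(X, y, sample_weight=w), where w holds one weight per training sample.
n_jobs: the number of jobs to run in parallel. The default (None) normally means 1; -1 uses all processors.
verbose: how much progress information to print; the default is 0.

Choose estimator, param_grid, cv, scoring, and any fit arguments to suit the problem at hand.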
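
As a minimal sketch of how these parameters fit together, the example below tunes C for an SVR on synthetic data. The dataset, the uniform sample weights, and the choice of 'neg_mean_squared_error' as the metric are assumptions made for illustration; the training-time argument is passed to fit rather than to the constructor.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic regression data (illustrative only)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# One weight per training sample (uniform here, purely for illustration)
sample_weight = np.ones(len(y))

grid_search = GridSearchCV(
    estimator=SVR(),
    param_grid={'C': [0.01, 0.1, 1, 10]},
    cv=5,
    scoring='neg_mean_squared_error',  # higher is better, so MSE is negated
    n_jobs=1,
    verbose=0,
)

# Training-time arguments go to fit(), not to the GridSearchCV constructor
grid_search.fit(X, y, sample_weight=sample_weight)

print("Best parameters:", grid_search.best_params_)
print("Best CV score (negative MSE):", grid_search.best_score_)
```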

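4. SVR model optimization

Hyperparameter grid:

param_grid = {'kernel': ('linear', 'poly', 'rbf', 'sigmoid'), 'C': [1, 5, 10], 'degree': [3, 8], 'coef0': [0.01, 10, 0.5], 'gamma': ('auto', 'scale')}

The main SVR hyperparameters are:

Kernel: linear, polynomial, Gaussian (RBF), or sigmoid. degree applies only to the polynomial kernel, coef0 only to the polynomial and sigmoid kernels, and gamma to every kernel except the linear one.

Regularization parameter C: chosen to suit the problem. A larger C penalizes errors more heavily, which means weaker regularization.

Slack: SVR tolerates errors through slack variables. In scikit-learn the epsilon parameter sets the width of the tube inside which errors are not penalized, and C weights the penalty on points outside it. Together they control model complexity, so different hyperparameter combinations give different model performance.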

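Example (this code tunes the SVC classifier, because iris is a classification dataset; the same GridSearchCV workflow applies to SVR):

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the SVC model (a classifier, since the iris labels are classes)
svm_model = SVC(kernel='linear', C=1)

# Define the parameter ranges to search
param_grid = {'kernel': ['linear', 'poly'], 'C': [0.01, 0.1, 1, 10]}

# Tune with GridSearchCV
grid_search = GridSearchCV(estimator=svm_model, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Print the best parameters and the best cross-validation score
print("Best parameters: ", grid_search.best_params_)
print("Best score: ", grid_search.best_score_)

# Evaluate the refitted best model on the held-out test set
y_pred = grid_search.predict(X_test)
print("Test accuracy: ", accuracy_score(y_test, y_pred))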
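
For a regression target the same workflow can be applied to SVR itself. The sketch below is an illustration on synthetic data, not the author's setup: the dataset and the reduced value ranges are assumptions. It passes param_grid as a list of dicts so that kernel-specific parameters such as degree and coef0 are searched only with the kernels that use them, which avoids fitting redundant combinations.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic regression data (illustrative only)
X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each dict is searched separately, so degree/coef0 are tried only with 'poly'
param_grid = [
    {'kernel': ['linear'], 'C': [1, 5, 10]},
    {'kernel': ['rbf'], 'C': [1, 5, 10], 'gamma': ['auto', 'scale']},
    {'kernel': ['poly'], 'C': [1, 5, 10], 'degree': [2, 3],
     'coef0': [0.01, 0.5], 'gamma': ['scale']},
]

grid_search = GridSearchCV(estimator=SVR(), param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

# With scoring left at its default, score() reports R^2 for a regressor
test_r2 = grid_search.score(X_test, y_test)
print("Best parameters:", grid_search.best_params_)
print("Test R^2:", test_r2)
```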

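5. Writing a DataFrame to an Excel file

# X_train must be a pandas DataFrame (a NumPy array has no to_excel method);
# writing .xlsx files also requires the openpyxl package
X_train.to_excel('example.xlsx', sheet_name='Sheet1', index=False)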

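6. GBR model hyperparameters

GBR here refers to scikit-learn's GradientBoostingRegressor, the class used in the next section. Its main hyperparameters include:

learning_rate: shrinks the contribution of each tree; smaller values usually need more trees.
n_estimators: the number of boosting stages, i.e. the number of trees.
max_depth: the maximum depth of each tree, which controls model complexity.
subsample: the fraction of training samples used to fit each tree; values below 1.0 give stochastic gradient boosting and can improve generalization.
max_features: the number (or fraction) of features considered at each split. feature_fraction is the LightGBM name for a comparable setting and is not accepted by GradientBoostingRegressor.
bagging_fraction: also a LightGBM name; the scikit-learn counterpart is subsample.

The following names are not GradientBoostingRegressor parameters and should not be placed in its param_grid: kernel_type, gamma, nu, c1, c2, kernel_regularization, and n_clusters. Kernel settings and nu belong to support vector machines, and n_clusters to clustering algorithms such as KMeans.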
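
One way to confirm which hyperparameter names an estimator accepts is to call get_params() on it; GridSearchCV rejects any name that is not in that list. The sketch below does this for scikit-learn's GradientBoostingRegressor and then runs a small search. The synthetic dataset and the value ranges are assumptions chosen only to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Names accepted by the estimator (and therefore valid keys for param_grid)
valid_names = sorted(GradientBoostingRegressor().get_params())
print(valid_names)

# Synthetic regression data (illustrative only)
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [2, 3],
    'subsample': [0.8, 1.0],
}

grid_search = GridSearchCV(
    estimator=GradientBoostingRegressor(learning_rate=0.1, random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring='r2',
)
grid_search.fit(X, y)

print("Best parameters:", grid_search.best_params_)
print("Best CV R^2:", grid_search.best_score_)
```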

7. Parameters used for each target

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

slump:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=109)
model = GradientBoostingRegressor(n_estimators=109, max_depth=3, learning_rate=0.1, random_state=188)

permeability:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=718)
model = GradientBoostingRegressor(n_estimators=78, max_depth=6, learning_rate=0.1, random_state=148)