GridSearchCV (scikit-learn): Determining Hyperparameters

2020-10-10 10:40:13

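When we build a model, we want to find the best parameters (as far as we can). Take KNN as an example: implemented by hand, we can use a for-loop to look for the highest score and read off the parameters that produced it: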

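%%time
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Find the best k and the best Minkowski-distance exponent p
# (X_train, X_test, y_train, y_test are assumed to be defined already)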

best_p = -1
best_score = 0
best_k = -1

for p in range(1, 6):
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights = 'distance', p = p)
        knn_clf.fit(X_train, y_train)
        y_predict = knn_clf.predict(X_test)
        score = accuracy_score(y_test, y_predict)
        if score > best_score:
            best_score = score
            best_k = k
            best_p = p
            
print("best k is {};\nbest score is {};\nbest p is {}.".format(best_k, best_score, best_p))

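Searching this way is tedious, though, and weights accepts other values as well (such as 'uniform'), so writing every loop by hand gets cumbersome. scikit-learn already wraps this up in GridSearchCV, which scores each parameter combination by cross-validation on the training data, so we can call it directly: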

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Define the parameter grid to search

param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': list(range(1, 11))
    },
    {
        'weights': ['distance'],
        'n_neighbors': list(range(1, 11)),
        'p': list(range(1, 6))
    }
]

knn_clf = KNeighborsClassifier()

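# n_jobs is the number of CPU cores to use (-1 means all available cores);
# verbose controls how much progress output is printed during the search
grid_search = GridSearchCV(knn_clf, param_grid, n_jobs=-1, verbose=2)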
grid_search.fit(X_train, y_train)

# Inspect the best parameters:
grid_search.best_params_
# or the estimator refitted with them:
grid_search.best_estimator_

# Best (mean cross-validated) score
grid_search.best_score_
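
The snippets above assume that X_train, X_test, y_train and y_test already exist. As a rough end-to-end sketch, the block below fills in that setup with the digits dataset, a fixed random_state and a reduced grid; all three are assumptions made here for illustration and are not taken from the original post. It also counts the candidate combinations with ParameterGrid and scores the refitted best model on the held-out test set, which the search itself never sees.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the digits dataset stands in for the post's unspecified data.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=666
)

# Assumption: a reduced grid (smaller than the one in the post) so this runs quickly.
param_grid = [
    {"weights": ["uniform"], "n_neighbors": [1, 3, 5]},
    {"weights": ["distance"], "n_neighbors": [1, 3, 5], "p": [1, 2]},
]

# ParameterGrid expands the list of dicts into individual candidates:
# 3 from the first dict plus 3 * 2 from the second.
n_candidates = len(ParameterGrid(param_grid))

# cv=5 means every candidate is scored by 5-fold cross-validation on X_train only.
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, n_jobs=1)
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
cv_score = grid_search.best_score_              # mean cross-validated accuracy
test_score = grid_search.score(X_test, y_test)  # accuracy of the refitted best model

print("candidates:", n_candidates)
print("best params:", best_params)
print("cross-validation score:", cv_score)
print("test score:", test_score)
```

Because GridSearchCV refits the best combination on the whole training set by default (refit=True), grid_search can be used directly for predict and score, and the test set stays untouched until this final check.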
