When building a model we want to find the (near-)optimal hyperparameters. Take the following KNN example: implemented directly, we can use a for-loop to search for the highest score and record the corresponding parameters:
%%time
# Search for the optimal p of the Minkowski distance (and the optimal k)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

best_p = -1
best_score = 0
best_k = -1
for p in range(1, 6):
    for k in range(1, 11):
        knn_clf = KNeighborsClassifier(n_neighbors=k, weights='distance', p=p)
        knn_clf.fit(X_train, y_train)
        y_predict = knn_clf.predict(X_test)
        score = accuracy_score(y_test, y_predict)
        if score > best_score:
            best_score = score
            best_k = k
            best_p = p
print("best k is {};\nbest score is {};\nbest p is {}.".format(best_k, best_score, best_p))
Searching this way by hand is tedious, though: weights takes other values as well, and writing nested loops for every hyperparameter combination quickly becomes unwieldy. scikit-learn already packages this search for us as GridSearchCV, which we can call directly:
from sklearn.model_selection import GridSearchCV

# Define the search space
param_grid = [
    {
        'weights': ['uniform'],
        'n_neighbors': [i for i in range(1, 11)]
    },
    {
        'weights': ['distance'],
        'n_neighbors': [i for i in range(1, 11)],
        'p': [i for i in range(1, 6)]
    }
]
knn_clf = KNeighborsClassifier()
# n_jobs is the number of CPU cores to use (-1 means all available cores);
# verbose prints progress information during the search
grid_search = GridSearchCV(knn_clf, param_grid, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
# Inspect the best parameters:
grid_search.best_params_
# or the full refitted estimator:
grid_search.best_estimator_
# Best (cross-validated) score:
grid_search.best_score_
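Note that best_score_ is the mean cross-validation score on the training data, not the accuracy on X_test, so it will generally differ from the score found by the hand-written loop above. Since GridSearchCV refits the best estimator on the full training set by default (refit=True), the tuned model can be used directly; a short usage sketch:

# Evaluate the refitted best estimator on the held-out test set
best_knn = grid_search.best_estimator_
print(best_knn.score(X_test, y_test))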