Dataset[1] (extraction code: krry)
For a detailed introduction to AdaBoost, see: 【干货】集成学习(Ensemble Learning)原理总结 (a summary of ensemble learning principles).
• First, read the CSV file with pandas and store it as a DataFrame, then convert the data to a list (you can also work on the DataFrame directly, as the sketch after the split step shows; converting to a list is simply a personal habit):
data = np.array(data).tolist()
• Split the data: the last column is the class label y, the remaining columns form x:
x = []; y = []
for i in range(len(data)):
    y.append(data[i][-1])
    del data[i][-1]
    x.append(data[i])
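As mentioned above, the list conversion is optional: the same split can be done directly on the DataFrame. A minimal sketch, assuming the same tab-separated, 22-column layout with the label in the last column:
import pandas as pd

df = pd.read_csv('ensemble/horseColicTraining.txt', sep='\t', header=None)
x = df.iloc[:, :-1].values   # every column except the last -> feature matrix
y = df.iloc[:, -1].values    # the last column -> label vector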
• Train the model
clf = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=7, min_samples_leaf=7), n_estimators=100, algorithm='SAMME', learning_rate=0.95)
clf.fit(train_x, train_y)
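Because AdaBoost adds one weak learner per boosting round, it can be useful to watch how accuracy evolves with the number of rounds. A minimal sketch using scikit-learn's staged_score, assuming the train/test arrays have been loaded as in the full script below:
for i, acc in enumerate(clf.staged_score(test_x, test_y), start=1):
    if i % 20 == 0:
        print('after %d rounds: test accuracy = %.3f' % (i, acc))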
• Test
print(clf.score(test_x, test_y))
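clf.score only reports overall accuracy; for a per-class breakdown, sklearn.metrics can be used. A minimal sketch (the class labels are simply whatever values appear in the last column of the data files):
from sklearn.metrics import classification_report, confusion_matrix

pred_y = clf.predict(test_x)
print(confusion_matrix(test_y, pred_y))       # rows are true classes, columns are predicted classes
print(classification_report(test_y, pred_y))  # precision, recall and F1 for each class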
• Full code:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
import pandas as pd
import numpy as np

def load_data(path):
    # The horse colic files are tab-separated: 21 feature columns plus the label in the last column
    data = pd.read_csv(path, sep='\t', names=[i for i in range(22)])
    data = np.array(data).tolist()
    x = []; y = []
    for i in range(len(data)):
        y.append(data[i][-1])
        del data[i][-1]
        x.append(data[i])
    x = np.array(x)
    y = np.array(y)
    return x, y

def AdaBoost():
    train_x, train_y = load_data('ensemble/horseColicTraining.txt')
    test_x, test_y = load_data('ensemble/horseColicTest.txt')
    # Train
    clf = AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=7, min_samples_leaf=7), n_estimators=100, algorithm='SAMME', learning_rate=0.95)
    clf.fit(train_x, train_y)
    # Test
    print(clf.score(test_x, test_y))

if __name__ == '__main__':
    AdaBoost()
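The script imports cross_val_score but never calls it. If you want to check the hyperparameters without touching the test set, a minimal sketch of how it could be used inside AdaBoost() (the 5-fold setting is an assumption, not part of the original post):
scores = cross_val_score(clf, train_x, train_y, cv=5)   # refits a clone of clf on each of the 5 folds
print('mean CV accuracy: %.3f (std %.3f)' % (scores.mean(), scores.std()))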
References
[1] Dataset: https://pan.baidu.com/s/14PM4zLUBr6BamLA-nEFujQ (extraction code: krry)