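import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import f_regression, mutual_info_regression

# Three uniform features: y depends linearly on x_1, nonlinearly on x_2,
# and not at all on x_3.
np.random.seed(0)
X = np.random.rand(100, 3)
y = X[:, 0] + np.sin(6 * np.pi * X[:, 1]) + 0.1 * np.random.randn(100)

# F-test scores, normalized so that the largest score is 1
f_test, _ = f_regression(X, y)
f_test /= np.max(f_test)

# Mutual information scores, normalized the same way
mi = mutual_info_regression(X, y)
mi /= np.max(mi)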

# One scatter plot per feature, titled with both normalized scores
plt.figure(figsize=(15, 5))
for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.scatter(X[:, i], y, edgecolor='black', s=20)
    plt.xlabel("$x_{}$".format(i + 1), fontsize=14)
    if i == 0:
        plt.ylabel("$y$", fontsize=14)
    plt.title("F-test={:.2f}, MI={:.2f}".format(f_test[i], mi[i]), fontsize=16)
plt.show()
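
The two scoring functions can also drive feature selection directly through SelectKBest (see the links at the end). The following is a minimal sketch, not part of the original example: it assumes the same kind of synthetic data but with 1000 samples so the ranking is stable, and the names f_selector, mi_selector and mi_score are illustrative only.

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import (
    SelectKBest,
    f_regression,
    mutual_info_regression,
)

# Assumed synthetic data: same construction as the example, more samples.
rng = np.random.RandomState(0)
X = rng.rand(1000, 3)
y = X[:, 0] + np.sin(6 * np.pi * X[:, 1]) + 0.1 * rng.randn(1000)

# Keep the single best feature according to the F-test.
f_selector = SelectKBest(score_func=f_regression, k=1).fit(X, y)
f_selected = np.flatnonzero(f_selector.get_support())

# Keep the two best features according to mutual information.
# partial() fixes random_state so the MI estimate is reproducible.
mi_score = partial(mutual_info_regression, random_state=0)
mi_selector = SelectKBest(score_func=mi_score, k=2).fit(X, y)
mi_selected = np.flatnonzero(mi_selector.get_support())

print("F-test keeps feature indices:", f_selected)
print("MI keeps feature indices:", mi_selected)

# Reduce X to the columns chosen by mutual information.
X_reduced = mi_selector.transform(X)
```

On data like this, the F-test is expected to favor the linearly related x_1, while mutual information should also pick up the sinusoidal dependence on x_2.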
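Algorithm: The F-test and mutual information differ in what they measure. The F-test captures only linear dependence, while mutual information captures any kind of dependence between variables, linear or nonlinear. Like the F-test, mutual information can be used for both regression and classification, and scikit-learn provides one function for each: feature_selection.mutual_info_classif (mutual information for classification) and feature_selection.mutual_info_regression (mutual information for regression).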
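
The classification counterpart can be sketched in the same way. The snippet below is an illustration under assumptions, not part of the original example: the labels are hypothetical, set to 1 when x_2 falls in a middle band, so the dependence on x_2 is purely nonlinear and the two class means of x_2 are nearly equal.

```python
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

rng = np.random.RandomState(0)
X = rng.rand(1000, 3)

# Hypothetical labels: class 1 when x_2 lies in (0.25, 0.75), else class 0.
# x_1 and x_3 carry no information about the label.
y = (np.abs(X[:, 1] - 0.5) < 0.25).astype(int)

# ANOVA F-test: compares the per-class means of each feature.
f_scores, _ = f_classif(X, y)

# Mutual information: nearest-neighbor based estimate for each feature.
mi_scores = mutual_info_classif(X, y, random_state=0)

best_by_mi = int(np.argmax(mi_scores))
print("F scores: ", np.round(f_scores, 2))
print("MI scores:", np.round(mi_scores, 2))
print("Feature ranked first by MI (0-based index):", best_by_mi)
```

Because the class means of x_2 nearly coincide, the F-test has little to work with here, whereas the mutual information estimate for x_2 should stand out clearly from the other two features.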
References: "Design and Analysis of Experiments"
"A review of feature selection techniques in bioinformatics"
Links: http://appliedpredictivemodeling.com/
https://github.com/scikit-learn/scikit-learn
https://scikit-learn.org/stable/auto_examples/feature_selection/plot_f_test_vs_mi.html#sphx-glr-auto-examples-feature-selection-plot-f-test-vs-mi-py
https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html#sklearn.feature_selection.SelectKBest
http://lijiancheng0614.github.io/scikit-learn/modules/generated/sklearn.feature_selection.SelectFpr.html