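A reader of my WeChat Official Account asked whether Python or R can be used to interpret a PMML model trained in SPSS, for example by computing SHAP values or drawing dependence plots.
So I spent some spare time looking into it.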
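SHAP (SHapley Additive exPlanations) is a Python library for explaining the predictions of machine learning models.
It is based on the Shapley value from cooperative game theory and measures how much each feature contributes to an individual prediction.
In risk modeling, SHAP helps identify which features have the greatest influence on risk predictions such as loan default.
For example, SHAP values let you compare how strongly features such as income, credit score, and debt ratio affect a model's default prediction.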
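The Shapley value averages a feature's marginal contribution over every possible coalition of the other features. The sketch below computes it exactly by brute force for a tiny model. It assumes that a feature "absent" from a coalition takes the value of a single baseline point. The function `toy_risk` and its coefficients are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x.

    A feature absent from a coalition takes its baseline value, so the
    coalition value is v(S) = f(z) with z[j] = x[j] for j in S, else baseline[j].
    Cost is 2^n model calls, so this is only practical for a handful of features.
    """
    n = len(x)

    def v(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                subset = set(combo)
                total += weight * (v(subset | {i}) - v(subset))
        phi.append(total)
    return phi


# Hypothetical toy risk score: income lowers risk, debt ratio raises it,
# and credit score only matters through an interaction with debt ratio
def toy_risk(z):
    income, credit_score, debt_ratio = z
    return 0.5 - 0.2 * income + 0.3 * debt_ratio + 0.1 * credit_score * debt_ratio


x = [2.0, 1.0, 1.5]
baseline = [1.0, 0.0, 0.5]
phi = shapley_values(toy_risk, x, baseline)
print(phi)                                         # roughly [-0.2, 0.1, 0.35]
print(sum(phi), toy_risk(x) - toy_risk(baseline))  # both about 0.25
```

The last line shows the additivity (efficiency) property: the Shapley values sum to f(x) minus f(baseline). SHAP's explainers estimate attributions of this kind much more efficiently. They use background data or the tree structure instead of a single baseline point.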
I. Steps for using the SHAP library
In risk modeling, the SHAP library is typically used as follows:
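- Data preparation: first, prepare the modeling dataset. It may contain features such as the borrower's income, credit score, and debt ratio.
- Model training: train a suitable machine learning algorithm (e.g. logistic regression, random forest, or a neural network) on the data to predict risks such as loan default.
- SHAP value computation: use the SHAP library to compute each feature's contribution to the prediction. This can be done with the shap.Explainer class, which takes a trained model (plus background data for some model types) and computes the SHAP value of every feature.
- Interpretation: compare the features' SHAP values to see which ones have the greatest influence on the prediction. A SHAP value refers to one sample and one predicted class. With default as the positive class, a large positive SHAP value for a borrower's income means that this borrower's income pushes the predicted default risk up. To conclude that higher income means lower default risk, look at how income's SHAP values change as income rises across many borrowers, for example in a dependence plot.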
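Note that although SHAP explains how each feature affects the predictions, it does not by itself improve model performance.
When using SHAP for interpretation, combine it with other methods such as feature selection and hyperparameter tuning to improve the model.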
III. Visualizing SHAP values and comparing them with model feature importance
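1 Load the data
First, load the iris dataset that ships with scikit-learn:
```python
# Load and prepare the iris dataset
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()  # load the iris dataset
# Features to a DataFrame; names like 'petal length (cm)' become 'petal_length'
df = pd.DataFrame(data=iris.data,
                  columns=[i.replace(' (cm)', '').replace(' ', '_') for i in iris.feature_names])
df['target'] = iris.target  # add the target column
df = df[df.target.isin([0, 1])]  # keep only classes 0 and 1 for a binary classification task
df
```
Evaluating `df` displays the filtered DataFrame: 100 rows (50 per class) with the four feature columns and `target`.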
2 Train the model
Next, train a random forest on the data:
```python
import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Separate the features and the target for training
X = df.drop('target', axis=1)
y = df['target']

# Train a random forest model
model = RandomForestClassifier()
model.fit(X, y)
```
3 Generate SHAP values
Then use the shap library to compute the SHAP values:
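```python
# Explain the model's predictions with SHAP
explainer = shap.TreeExplainer(model)  # exact, fast algorithm for tree ensembles
shap_values = explainer.shap_values(X)
# The shap version used here returns a list [class 0, class 1] of (n_samples, n_features) arrays;
# newer shap releases may instead return one (n_samples, n_features, n_classes) array
print(shap_values)
```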
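Output (one array per class; each row is a sample, and the columns are sepal_length, sepal_width, petal_length, petal_width):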
[array([[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[ 0.05049107, -0.00558208, 0.24124992, 0.21054109],
[ 0.05160724, 0.00555247, 0.2289992 , 0.21054109],
[ 0.05137996, 0.00131546, 0.23346349, 0.21054109],
[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[ 0.02668498, 0.0271903 , 0.23262454, 0.21020018],
[ 0.04915724, 0.00800247, 0.2289992 , 0.21054109],
[ 0.04826835, 0.00889136, 0.2289992 , 0.21054109],
[ 0.05137996, -0.00647097, 0.24124992, 0.21054109],
[ 0.05049107, 0.00220435, 0.23346349, 0.21054109],
[ 0.02918498, 0.0246903 , 0.23262454, 0.21020018],
[ 0.04826835, 0.00889136, 0.2289992 , 0.21054109],
[ 0.05049107, -0.00558208, 0.24124992, 0.21054109],
[ 0.05137996, -0.00647097, 0.24124992, 0.21054109],
[-0.03431018, 0.05324627, 0.2486132 , 0.21915072],
[-0.03431018, 0.05324627, 0.2486132 , 0.21915072],
[ 0.02668498, 0.0271903 , 0.23262454, 0.21020018],
[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[-0.03431018, 0.05324627, 0.2486132 , 0.21915072],
[ 0.03411835, 0.02304136, 0.2289992 , 0.21054109],
[ 0.03583498, 0.0080403 , 0.23262454, 0.21020018],
[ 0.03661835, 0.02054136, 0.2289992 , 0.21054109],
[ 0.04415724, 0.01300247, 0.2289992 , 0.21054109],
[ 0.05071835, 0.00644136, 0.2289992 , 0.21054109],
[ 0.04826835, 0.00889136, 0.2289992 , 0.21054109],
[ 0.05049107, -0.00558208, 0.24124992, 0.21054109],
[ 0.04826835, 0.00889136, 0.2289992 , 0.21054109],
[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[ 0.04826835, 0.00889136, 0.2289992 , 0.21054109],
[ 0.05160724, 0.00555247, 0.2289992 , 0.21054109],
[ 0.05049107, 0.00220435, 0.23346349, 0.21054109],
[ 0.03583498, 0.0080403 , 0.23262454, 0.21020018],
[ 0.03411835, 0.02304136, 0.2289992 , 0.21054109],
[-0.03431018, 0.05324627, 0.2486132 , 0.21915072],
[ 0.05049107, 0.00220435, 0.23346349, 0.21054109],
[ 0.05071835, 0.00644136, 0.2289992 , 0.21054109],
[-0.04516018, 0.02409627, 0.2486132 , 0.21915072],
[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[ 0.05137996, -0.00647097, 0.24124992, 0.21054109],
[ 0.04826835, 0.00889136, 0.2289992 , 0.21054109],
[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[ 0.05064224, -0.01562289, 0.24124992, 0.21043073],
[ 0.05160724, 0.00555247, 0.2289992 , 0.21054109],
[ 0.04326835, 0.01389136, 0.2289992 , 0.21054109],
[ 0.03411835, 0.02304136, 0.2289992 , 0.21054109],
[ 0.05049107, -0.00558208, 0.24124992, 0.21054109],
[ 0.03411835, 0.02304136, 0.2289992 , 0.21054109],
[ 0.05160724, 0.00555247, 0.2289992 , 0.21054109],
[ 0.03661835, 0.02054136, 0.2289992 , 0.21054109],
[ 0.05071835, 0.00644136, 0.2289992 , 0.21054109],
[-0.04189352, 0.00151293, -0.2317868 , -0.23113261],
[-0.04189352, 0.00151293, -0.2317868 , -0.23113261],
[-0.03712079, -0.00272408, -0.23232252, -0.23113261],
[-0.03220356, -0.01615747, -0.22453609, -0.23040288],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.04189352, 0.00151293, -0.2317868 , -0.23113261],
[ 0.03065335, -0.021934 , -0.23915008, -0.27286927],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[ 0.03234965, -0.01102659, -0.23915008, -0.27547297],
[ 0.03065335, -0.021934 , -0.23915008, -0.27286927],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03282023, -0.01677414, -0.22453609, -0.22916955],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03712079, -0.00272408, -0.23232252, -0.23113261],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03537023, -0.01422414, -0.22453609, -0.22916955],
[-0.03440356, -0.01395747, -0.22453609, -0.23040288],
[-0.03220356, -0.01615747, -0.22453609, -0.23040288],
[-0.03469352, 0.00431293, -0.2317868 , -0.23113261],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03440356, -0.01395747, -0.22453609, -0.23040288],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03282023, -0.01677414, -0.22453609, -0.22916955],
[-0.03220356, -0.01615747, -0.22453609, -0.23040288],
[-0.03282023, -0.01677414, -0.22453609, -0.22916955],
[-0.03475356, -0.01360747, -0.22453609, -0.23040288],
[-0.03475356, -0.01360747, -0.22453609, -0.23040288],
[ 0.02637067, -0.00935907, -0.23552475, -0.27478686],
[-0.03214352, 0.01176293, -0.2317868 , -0.23113261],
[-0.03712079, -0.00272408, -0.23232252, -0.23113261],
[-0.03440356, -0.01395747, -0.22453609, -0.23040288],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03220356, -0.01615747, -0.22453609, -0.23040288],
[-0.03220356, -0.01615747, -0.22453609, -0.23040288],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[-0.03220356, -0.01615747, -0.22453609, -0.23040288],
[ 0.03065335, -0.021934 , -0.23915008, -0.27286927],
[-0.03475356, -0.01360747, -0.22453609, -0.23040288],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261],
[-0.03712079, -0.01051051, -0.22453609, -0.23113261],
[ 0.02989965, -0.02302659, -0.23470008, -0.27547297],
[-0.03492079, -0.01271051, -0.22453609, -0.23113261]]), array([[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[-0.05049107, 0.00558208, -0.24124992, -0.21054109],
[-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
[-0.05137996, -0.00131546, -0.23346349, -0.21054109],
[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[-0.02668498, -0.0271903 , -0.23262454, -0.21020018],
[-0.04915724, -0.00800247, -0.2289992 , -0.21054109],
[-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
[-0.05137996, 0.00647097, -0.24124992, -0.21054109],
[-0.05049107, -0.00220435, -0.23346349, -0.21054109],
[-0.02918498, -0.0246903 , -0.23262454, -0.21020018],
[-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
[-0.05049107, 0.00558208, -0.24124992, -0.21054109],
[-0.05137996, 0.00647097, -0.24124992, -0.21054109],
[ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
[ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
[-0.02668498, -0.0271903 , -0.23262454, -0.21020018],
[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
[-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
[-0.03583498, -0.0080403 , -0.23262454, -0.21020018],
[-0.03661835, -0.02054136, -0.2289992 , -0.21054109],
[-0.04415724, -0.01300247, -0.2289992 , -0.21054109],
[-0.05071835, -0.00644136, -0.2289992 , -0.21054109],
[-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
[-0.05049107, 0.00558208, -0.24124992, -0.21054109],
[-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
[-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
[-0.05049107, -0.00220435, -0.23346349, -0.21054109],
[-0.03583498, -0.0080403 , -0.23262454, -0.21020018],
[-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
[ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
[-0.05049107, -0.00220435, -0.23346349, -0.21054109],
[-0.05071835, -0.00644136, -0.2289992 , -0.21054109],
[ 0.04516018, -0.02409627, -0.2486132 , -0.21915072],
[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[-0.05137996, 0.00647097, -0.24124992, -0.21054109],
[-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[-0.05064224, 0.01562289, -0.24124992, -0.21043073],
[-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
[-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
[-0.05049107, 0.00558208, -0.24124992, -0.21054109],
[-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
[-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
[-0.03661835, -0.02054136, -0.2289992 , -0.21054109],
[-0.05071835, -0.00644136, -0.2289992 , -0.21054109],
[ 0.04189352, -0.00151293, 0.2317868 , 0.23113261],
[ 0.04189352, -0.00151293, 0.2317868 , 0.23113261],
[ 0.03712079, 0.00272408, 0.23232252, 0.23113261],
[ 0.03220356, 0.01615747, 0.22453609, 0.23040288],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.04189352, -0.00151293, 0.2317868 , 0.23113261],
[-0.03065335, 0.021934 , 0.23915008, 0.27286927],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[-0.03234965, 0.01102659, 0.23915008, 0.27547297],
[-0.03065335, 0.021934 , 0.23915008, 0.27286927],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03282023, 0.01677414, 0.22453609, 0.22916955],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03712079, 0.00272408, 0.23232252, 0.23113261],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03537023, 0.01422414, 0.22453609, 0.22916955],
[ 0.03440356, 0.01395747, 0.22453609, 0.23040288],
[ 0.03220356, 0.01615747, 0.22453609, 0.23040288],
[ 0.03469352, -0.00431293, 0.2317868 , 0.23113261],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03440356, 0.01395747, 0.22453609, 0.23040288],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03282023, 0.01677414, 0.22453609, 0.22916955],
[ 0.03220356, 0.01615747, 0.22453609, 0.23040288],
[ 0.03282023, 0.01677414, 0.22453609, 0.22916955],
[ 0.03475356, 0.01360747, 0.22453609, 0.23040288],
[ 0.03475356, 0.01360747, 0.22453609, 0.23040288],
[-0.02637067, 0.00935907, 0.23552475, 0.27478686],
[ 0.03214352, -0.01176293, 0.2317868 , 0.23113261],
[ 0.03712079, 0.00272408, 0.23232252, 0.23113261],
[ 0.03440356, 0.01395747, 0.22453609, 0.23040288],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03220356, 0.01615747, 0.22453609, 0.23040288],
[ 0.03220356, 0.01615747, 0.22453609, 0.23040288],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[ 0.03220356, 0.01615747, 0.22453609, 0.23040288],
[-0.03065335, 0.021934 , 0.23915008, 0.27286927],
[ 0.03475356, 0.01360747, 0.22453609, 0.23040288],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261],
[ 0.03712079, 0.01051051, 0.22453609, 0.23113261],
[-0.02989965, 0.02302659, 0.23470008, 0.27547297],
[ 0.03492079, 0.01271051, 0.22453609, 0.23113261]])]
Note: computing SHAP values can be very slow when there are many features or a large dataset.
4 Visualize the SHAP values
Next, plot the SHAP values:
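```python
# Bar chart of each feature's mean |SHAP value|, using the class-0 values
# (with the 3-D array returned by newer shap releases, pass shap_values[:, :, 0] instead)
shap.summary_plot(shap_values[0], X, plot_type="bar")
```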
This produces a bar chart of each feature's mean absolute SHAP value.
petal_length has the largest mean absolute SHAP value, so it has the greatest influence on the model's predictions of y. petal_width comes second.
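5 Compare with the model's feature importance
Computing SHAP values is slow when there are many features. The random forest's built-in feature importance also scores how much each feature matters for predicting y, but it measures something different. It is the total impurity decrease from splits on each feature, averaged over the trees, whereas mean |SHAP| is each feature's average contribution to the predictions. Impurity-based importance can also be biased toward features with many distinct values. If the two rankings turn out to be similar on your data, feature importance can serve as a cheaper substitute for SHAP values. Plot the feature importance as follows: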
```python
import pandas as pd
from matplotlib import pyplot as plt

# Build a DataFrame of the random forest's impurity-based feature importances
features_import = pd.DataFrame({'feature': X.columns, 'importance': model.feature_importances_})

# Plot
plt.rcParams['font.sans-serif'] = ['SimHei']  # SimHei font, only needed if labels contain Chinese
plt.barh(features_import['feature'], features_import['importance'], height=0.7,
         color='blue', edgecolor='#005344')  # any matplotlib color name or hex code works
plt.xlabel('feature importance')  # x-axis
plt.ylabel('features')  # y-axis
plt.title('Feature Importances')  # title
for a, b in zip(features_import['importance'], features_import['feature']):  # add value labels
    print(a, b)
    plt.text(a + 0.001, b, '%.3f' % float(a))  # place the label 0.001 to the right of the bar end
plt.show()
```
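This prints each importance value with its feature name and draws a horizontal bar chart of the importances.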
The importance ranking agrees with the SHAP ranking: petal_length has the highest feature importance, followed by petal_width.
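That concludes this walkthrough of SHAP value visualization for risk modeling. For more modeling content, see the articles in the "风控建模" (risk control modeling) section of the WeChat Official Account.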