SHAP Values in Risk Modeling: Principles and Python Implementation

2023-10-25 16:23:23

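A reader of my WeChat official account asked whether Python or R can be used to interpret a PMML model trained in SPSS, for example by computing SHAP values or drawing dependence plots.

So I spent some spare time looking into it.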

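SHAP (SHapley Additive exPlanations) is a Python library for explaining the predictions of machine learning models.

It is based on the Shapley value from cooperative game theory. Each feature is treated as a player, and its Shapley value is its average marginal contribution to the prediction over all possible coalitions (subsets) of the other features.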
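To make the definition concrete, here is a minimal sketch (not the algorithm the shap library uses) that computes exact Shapley values by enumerating every coalition S of the other features: φ_i = Σ over S ⊆ N∖{i} of |S|!(p−|S|−1)!/p! · [v(S ∪ {i}) − v(S)]. It assumes the value v(S) of a coalition is the model output when the features in S take the sample's values and all other features take a single fixed baseline value; shap's explainers average over a background dataset instead. The names `exact_shapley` and `toy_score` are hypothetical. Because the function only calls `predict`, it treats the model as a black box; model-agnostic explainers such as shap's KernelExplainer rest on the same idea but approximate the values by sampling, since full enumeration costs 2^p model calls.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(predict: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of predict at x, relative to one baseline point.

    Assumption: a coalition S is evaluated by using x's values for the
    features in S and the baseline values for all remaining features.
    """
    p = len(x)

    def value(coalition: set) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(p)]
        return predict(z)

    phi = []
    for i in range(p):
        others = [j for j in range(p) if j != i]
        total = 0.0
        for size in range(p):  # |S| runs from 0 to p - 1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical scoring function with an interaction between features 0 and 2
def toy_score(z: List[float]) -> float:
    return 0.5 * z[0] - 0.3 * z[1] + 0.2 * z[0] * z[2]


x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_score, x, baseline)
print([round(v, 4) for v in phi])  # → [1.6, -0.3, 0.6]
# Additivity: the contributions sum to f(x) - f(baseline)
print(round(sum(phi), 4), round(toy_score(x) - toy_score(baseline), 4))  # → 1.9 1.9
```

The interaction term 0.2·z0·z2 contributes 1.2 in total and is split equally between features 0 and 2, which is why feature 0 receives 1.0 + 0.6.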

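In risk modeling, SHAP helps identify which features have the greatest influence on risk predictions such as loan default.

For example, SHAP values can be used to compare how strongly income, credit score, and debt ratio affect a loan default prediction.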

I. Steps for Using the SHAP Library

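The steps for using SHAP in risk modeling are as follows:
  1. Data preparation: prepare the modeling dataset, which may include features such as the borrower's income, credit score, and debt ratio.
  2. Model training: train a suitable machine learning algorithm (e.g. logistic regression, random forest, or a neural network) on the data to predict risks such as loan default.
  3. SHAP value computation: use the SHAP library to compute each feature's contribution to each prediction. This can be done with the shap.Explainer class, which takes a trained model (plus background data for most non-tree models) and returns the SHAP value of every feature for every sample.
  4. Interpretation: compare the SHAP values of different features to see which ones influence the predictions most. The sign is specific to each sample: if an applicant's income feature has a large positive SHAP value for the default prediction, that applicant's income pushes the predicted default risk above the baseline (the average prediction), and a large negative value pushes it below. Whether higher income generally means lower risk has to be read from how income's SHAP values change with the income value across many samples, for example in a dependence plot.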
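Step 4 is easiest to see on a model whose SHAP values have a closed form. For a logistic model with features treated as independent, the SHAP value of feature i in log-odds space is w_i·(x_i − mean(x_i)), which, as far as I know, is what shap's LinearExplainer computes under its default interventional setting. The sketch below uses a hypothetical logistic default model: the coefficients, feature names, applicant, and background data are all invented for illustration, not estimated from real loans.

```python
import numpy as np

# Hypothetical logistic default model: log-odds = b + w . x (invented coefficients)
feature_names = ["income", "credit_score", "debt_ratio"]
w = np.array([-0.00004, -0.01, 3.0])
b = 4.0

# Synthetic background applicants (assumed distributions, illustration only)
rng = np.random.default_rng(0)
X_bg = np.column_stack([
    rng.normal(8000, 2000, 1000),  # monthly income
    rng.normal(650, 50, 1000),     # credit score
    rng.uniform(0.1, 0.8, 1000),   # debt-to-income ratio
])


def log_odds(X):
    return b + X @ w


mu = X_bg.mean(axis=0)
base_value = log_odds(mu)  # equals the mean log-odds because the model is linear

applicant = np.array([4000.0, 600.0, 0.7])  # hypothetical applicant
shap_vals = w * (applicant - mu)           # per-feature contribution in log-odds

for name, v in zip(feature_names, shap_vals):
    direction = "raises" if v > 0 else "lowers"
    print(f"{name:>12}: {v:+.3f} ({direction} this applicant's predicted default risk)")

# Additivity: base value + sum of SHAP values = model output for this applicant
print(np.isclose(base_value + shap_vals.sum(), log_odds(applicant)))
```

Here the income coefficient is negative (higher income, lower risk), yet this applicant's income SHAP value is positive: the below-average income pushes the applicant's risk up. A single SHAP value describes one sample relative to the baseline, not the direction of the overall relationship.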

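Note that although SHAP shows how each feature affects the predictions, it does not by itself improve model performance.

So when using SHAP for interpretation, combine it with other optimization methods, such as feature selection and hyperparameter tuning, to improve the model.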

III. Visualizing SHAP Values and Comparing Them with Model Feature Importance

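1 Import the data
First, load the iris dataset that ships with scikit-learn:
# Load and prepare the iris dataset
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()  # load the iris dataset
# Feature matrix to a DataFrame; e.g. 'petal length (cm)' becomes 'petal_length'
df = pd.DataFrame(data=iris.data,
                  columns=[c.replace(' (cm)', '').replace(' ', '_') for c in iris.feature_names])
df['target'] = iris.target  # add the target column
df = df[df.target.isin([0, 1])]  # keep only classes 0 and 1 for a binary classification task
df
This displays the filtered DataFrame: 100 rows (50 from each of classes 0 and 1), with four feature columns plus target.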

2 Train the model
Next, train a random forest on the data:
import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Separate the features from the target
X = df.drop('target', axis=1)
y = df['target']
# Train a random forest model
model = RandomForestClassifier()
model.fit(X, y)



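3 Generate SHAP values
Then use the shap library to compute the SHAP values:
# Explain the model's predictions; TreeExplainer is specialized for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Print every feature's SHAP values
print(shap_values)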
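Output (a list of two arrays, one for class 0 and one for class 1; rows are samples, columns are features):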
[array([[ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [ 0.05049107, -0.00558208,  0.24124992,  0.21054109],
       [ 0.05160724,  0.00555247,  0.2289992 ,  0.21054109],
       [ 0.05137996,  0.00131546,  0.23346349,  0.21054109],
       [ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [ 0.02668498,  0.0271903 ,  0.23262454,  0.21020018],
       [ 0.04915724,  0.00800247,  0.2289992 ,  0.21054109],
       [ 0.04826835,  0.00889136,  0.2289992 ,  0.21054109],
       [ 0.05137996, -0.00647097,  0.24124992,  0.21054109],
       [ 0.05049107,  0.00220435,  0.23346349,  0.21054109],
       [ 0.02918498,  0.0246903 ,  0.23262454,  0.21020018],
       [ 0.04826835,  0.00889136,  0.2289992 ,  0.21054109],
       [ 0.05049107, -0.00558208,  0.24124992,  0.21054109],
       [ 0.05137996, -0.00647097,  0.24124992,  0.21054109],
       [-0.03431018,  0.05324627,  0.2486132 ,  0.21915072],
       [-0.03431018,  0.05324627,  0.2486132 ,  0.21915072],
       [ 0.02668498,  0.0271903 ,  0.23262454,  0.21020018],
       [ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [-0.03431018,  0.05324627,  0.2486132 ,  0.21915072],
       [ 0.03411835,  0.02304136,  0.2289992 ,  0.21054109],
       [ 0.03583498,  0.0080403 ,  0.23262454,  0.21020018],
       [ 0.03661835,  0.02054136,  0.2289992 ,  0.21054109],
       [ 0.04415724,  0.01300247,  0.2289992 ,  0.21054109],
       [ 0.05071835,  0.00644136,  0.2289992 ,  0.21054109],
       [ 0.04826835,  0.00889136,  0.2289992 ,  0.21054109],
       [ 0.05049107, -0.00558208,  0.24124992,  0.21054109],
       [ 0.04826835,  0.00889136,  0.2289992 ,  0.21054109],
       [ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [ 0.04826835,  0.00889136,  0.2289992 ,  0.21054109],
       [ 0.05160724,  0.00555247,  0.2289992 ,  0.21054109],
       [ 0.05049107,  0.00220435,  0.23346349,  0.21054109],
       [ 0.03583498,  0.0080403 ,  0.23262454,  0.21020018],
       [ 0.03411835,  0.02304136,  0.2289992 ,  0.21054109],
       [-0.03431018,  0.05324627,  0.2486132 ,  0.21915072],
       [ 0.05049107,  0.00220435,  0.23346349,  0.21054109],
       [ 0.05071835,  0.00644136,  0.2289992 ,  0.21054109],
       [-0.04516018,  0.02409627,  0.2486132 ,  0.21915072],
       [ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [ 0.05137996, -0.00647097,  0.24124992,  0.21054109],
       [ 0.04826835,  0.00889136,  0.2289992 ,  0.21054109],
       [ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [ 0.05064224, -0.01562289,  0.24124992,  0.21043073],
       [ 0.05160724,  0.00555247,  0.2289992 ,  0.21054109],
       [ 0.04326835,  0.01389136,  0.2289992 ,  0.21054109],
       [ 0.03411835,  0.02304136,  0.2289992 ,  0.21054109],
       [ 0.05049107, -0.00558208,  0.24124992,  0.21054109],
       [ 0.03411835,  0.02304136,  0.2289992 ,  0.21054109],
       [ 0.05160724,  0.00555247,  0.2289992 ,  0.21054109],
       [ 0.03661835,  0.02054136,  0.2289992 ,  0.21054109],
       [ 0.05071835,  0.00644136,  0.2289992 ,  0.21054109],
       [-0.04189352,  0.00151293, -0.2317868 , -0.23113261],
       [-0.04189352,  0.00151293, -0.2317868 , -0.23113261],
       [-0.03712079, -0.00272408, -0.23232252, -0.23113261],
       [-0.03220356, -0.01615747, -0.22453609, -0.23040288],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.04189352,  0.00151293, -0.2317868 , -0.23113261],
       [ 0.03065335, -0.021934  , -0.23915008, -0.27286927],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [ 0.03234965, -0.01102659, -0.23915008, -0.27547297],
       [ 0.03065335, -0.021934  , -0.23915008, -0.27286927],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03282023, -0.01677414, -0.22453609, -0.22916955],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03712079, -0.00272408, -0.23232252, -0.23113261],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03537023, -0.01422414, -0.22453609, -0.22916955],
       [-0.03440356, -0.01395747, -0.22453609, -0.23040288],
       [-0.03220356, -0.01615747, -0.22453609, -0.23040288],
       [-0.03469352,  0.00431293, -0.2317868 , -0.23113261],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03440356, -0.01395747, -0.22453609, -0.23040288],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03282023, -0.01677414, -0.22453609, -0.22916955],
       [-0.03220356, -0.01615747, -0.22453609, -0.23040288],
       [-0.03282023, -0.01677414, -0.22453609, -0.22916955],
       [-0.03475356, -0.01360747, -0.22453609, -0.23040288],
       [-0.03475356, -0.01360747, -0.22453609, -0.23040288],
       [ 0.02637067, -0.00935907, -0.23552475, -0.27478686],
       [-0.03214352,  0.01176293, -0.2317868 , -0.23113261],
       [-0.03712079, -0.00272408, -0.23232252, -0.23113261],
       [-0.03440356, -0.01395747, -0.22453609, -0.23040288],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03220356, -0.01615747, -0.22453609, -0.23040288],
       [-0.03220356, -0.01615747, -0.22453609, -0.23040288],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [-0.03220356, -0.01615747, -0.22453609, -0.23040288],
       [ 0.03065335, -0.021934  , -0.23915008, -0.27286927],
       [-0.03475356, -0.01360747, -0.22453609, -0.23040288],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261],
       [-0.03712079, -0.01051051, -0.22453609, -0.23113261],
       [ 0.02989965, -0.02302659, -0.23470008, -0.27547297],
       [-0.03492079, -0.01271051, -0.22453609, -0.23113261]]), array([[-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [-0.05049107,  0.00558208, -0.24124992, -0.21054109],
       [-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
       [-0.05137996, -0.00131546, -0.23346349, -0.21054109],
       [-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [-0.02668498, -0.0271903 , -0.23262454, -0.21020018],
       [-0.04915724, -0.00800247, -0.2289992 , -0.21054109],
       [-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
       [-0.05137996,  0.00647097, -0.24124992, -0.21054109],
       [-0.05049107, -0.00220435, -0.23346349, -0.21054109],
       [-0.02918498, -0.0246903 , -0.23262454, -0.21020018],
       [-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
       [-0.05049107,  0.00558208, -0.24124992, -0.21054109],
       [-0.05137996,  0.00647097, -0.24124992, -0.21054109],
       [ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
       [ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
       [-0.02668498, -0.0271903 , -0.23262454, -0.21020018],
       [-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
       [-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
       [-0.03583498, -0.0080403 , -0.23262454, -0.21020018],
       [-0.03661835, -0.02054136, -0.2289992 , -0.21054109],
       [-0.04415724, -0.01300247, -0.2289992 , -0.21054109],
       [-0.05071835, -0.00644136, -0.2289992 , -0.21054109],
       [-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
       [-0.05049107,  0.00558208, -0.24124992, -0.21054109],
       [-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
       [-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
       [-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
       [-0.05049107, -0.00220435, -0.23346349, -0.21054109],
       [-0.03583498, -0.0080403 , -0.23262454, -0.21020018],
       [-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
       [ 0.03431018, -0.05324627, -0.2486132 , -0.21915072],
       [-0.05049107, -0.00220435, -0.23346349, -0.21054109],
       [-0.05071835, -0.00644136, -0.2289992 , -0.21054109],
       [ 0.04516018, -0.02409627, -0.2486132 , -0.21915072],
       [-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [-0.05137996,  0.00647097, -0.24124992, -0.21054109],
       [-0.04826835, -0.00889136, -0.2289992 , -0.21054109],
       [-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [-0.05064224,  0.01562289, -0.24124992, -0.21043073],
       [-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
       [-0.04326835, -0.01389136, -0.2289992 , -0.21054109],
       [-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
       [-0.05049107,  0.00558208, -0.24124992, -0.21054109],
       [-0.03411835, -0.02304136, -0.2289992 , -0.21054109],
       [-0.05160724, -0.00555247, -0.2289992 , -0.21054109],
       [-0.03661835, -0.02054136, -0.2289992 , -0.21054109],
       [-0.05071835, -0.00644136, -0.2289992 , -0.21054109],
       [ 0.04189352, -0.00151293,  0.2317868 ,  0.23113261],
       [ 0.04189352, -0.00151293,  0.2317868 ,  0.23113261],
       [ 0.03712079,  0.00272408,  0.23232252,  0.23113261],
       [ 0.03220356,  0.01615747,  0.22453609,  0.23040288],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.04189352, -0.00151293,  0.2317868 ,  0.23113261],
       [-0.03065335,  0.021934  ,  0.23915008,  0.27286927],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [-0.03234965,  0.01102659,  0.23915008,  0.27547297],
       [-0.03065335,  0.021934  ,  0.23915008,  0.27286927],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03282023,  0.01677414,  0.22453609,  0.22916955],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03712079,  0.00272408,  0.23232252,  0.23113261],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03537023,  0.01422414,  0.22453609,  0.22916955],
       [ 0.03440356,  0.01395747,  0.22453609,  0.23040288],
       [ 0.03220356,  0.01615747,  0.22453609,  0.23040288],
       [ 0.03469352, -0.00431293,  0.2317868 ,  0.23113261],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03440356,  0.01395747,  0.22453609,  0.23040288],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03282023,  0.01677414,  0.22453609,  0.22916955],
       [ 0.03220356,  0.01615747,  0.22453609,  0.23040288],
       [ 0.03282023,  0.01677414,  0.22453609,  0.22916955],
       [ 0.03475356,  0.01360747,  0.22453609,  0.23040288],
       [ 0.03475356,  0.01360747,  0.22453609,  0.23040288],
       [-0.02637067,  0.00935907,  0.23552475,  0.27478686],
       [ 0.03214352, -0.01176293,  0.2317868 ,  0.23113261],
       [ 0.03712079,  0.00272408,  0.23232252,  0.23113261],
       [ 0.03440356,  0.01395747,  0.22453609,  0.23040288],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03220356,  0.01615747,  0.22453609,  0.23040288],
       [ 0.03220356,  0.01615747,  0.22453609,  0.23040288],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [ 0.03220356,  0.01615747,  0.22453609,  0.23040288],
       [-0.03065335,  0.021934  ,  0.23915008,  0.27286927],
       [ 0.03475356,  0.01360747,  0.22453609,  0.23040288],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261],
       [ 0.03712079,  0.01051051,  0.22453609,  0.23113261],
       [-0.02989965,  0.02302659,  0.23470008,  0.27547297],
       [ 0.03492079,  0.01271051,  0.22453609,  0.23113261]])]
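Note: with many features or a large dataset, computing SHAP values can be very slow; explaining a random subsample of the rows is a common way to reduce the cost.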
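Comparing the first row of each array in the output above shows that the class-1 values are the exact negatives of the class-0 values: the forest's predicted probabilities for the two classes sum to 1, so whatever raises one lowers the other. Recent shap releases return a single array of shape (n_samples, n_features, n_classes) for such models instead of a list, in which case `shap_values[0]` would select the first sample rather than class 0. The helper `class_shap` below is hypothetical (not part of shap) and accepts either format; the test values are the first rows of the two printed arrays.

```python
import numpy as np


def class_shap(shap_values, class_index):
    """Return an (n_samples, n_features) SHAP matrix for one class.

    Hypothetical helper: accepts a list with one array per class, or a
    single (n_samples, n_features, n_classes) array.
    """
    if isinstance(shap_values, list):
        return np.asarray(shap_values[class_index])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        return arr[:, :, class_index]
    return arr  # already 2-D, e.g. a single-output model


# First rows of the class-0 and class-1 arrays printed above
class0 = np.array([[0.04326835, 0.01389136, 0.2289992, 0.21054109]])
class1 = np.array([[-0.04326835, -0.01389136, -0.2289992, -0.21054109]])

as_list = [class0, class1]                      # older list format
as_array = np.stack([class0, class1], axis=-1)  # newer format, shape (1, 4, 2)

sv_from_list = class_shap(as_list, 1)
sv_from_array = class_shap(as_array, 1)
print(np.allclose(sv_from_list, sv_from_array))            # same matrix either way
print(np.allclose(class_shap(as_list, 0), -sv_from_list))  # the classes mirror each other
```

With the model above, `class_shap(shap_values, 1)` gives the positive-class matrix regardless of which shap version produced `shap_values`.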

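4 Visualize the SHAP values
Next, plot the SHAP values:
# Bar chart of mean |SHAP| per feature for class 1 (older shap returns a list per class,
# newer versions a single (n_samples, n_features, n_classes) array)
sv = shap_values[1] if isinstance(shap_values, list) else shap_values[:, :, 1]
shap.summary_plot(sv, X, plot_type="bar")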
Result: [Figure: bar chart of the mean absolute SHAP value of each feature]
petal_length has the largest mean absolute SHAP value, meaning it has the greatest influence on the target y, followed by petal_width.
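With `plot_type="bar"`, the summary plot ranks features by the mean absolute SHAP value over all samples. The sketch below reproduces that computation with NumPy on four rows copied from the class-0 output above; with only four rows the numbers differ from the full plot, but the ordering agrees. The column order follows `X.columns`, written here with the short names used in the text.

```python
import numpy as np

feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Four rows copied from the class-0 SHAP output above
rows = np.array([
    [0.04326835, 0.01389136, 0.2289992, 0.21054109],
    [0.05049107, -0.00558208, 0.24124992, 0.21054109],
    [-0.04189352, 0.00151293, -0.2317868, -0.23113261],
    [0.03065335, -0.021934, -0.23915008, -0.27286927],
])

# Global importance as drawn by the bar plot: mean(|SHAP|) per feature
mean_abs = np.abs(rows).mean(axis=0)
order = np.argsort(mean_abs)[::-1]  # largest first
ranking = [feature_names[i] for i in order]
for i in order:
    print(f"{feature_names[i]:>13}: {mean_abs[i]:.4f}")
```

Taking absolute values before averaging matters: petal_length's SHAP values are positive for some rows and negative for others, so a plain mean would largely cancel out.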


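5 Compare with model feature importance
Computing SHAP values is slow when there are many variables. The model's built-in feature importance also measures how important each feature is to the target y, so if the two rankings turn out to be similar, feature importance can serve as a cheaper substitute for ranking features. Keep the differences in mind, though: a random forest's feature_importances_ is impurity-based, gives only one global, unsigned score per feature, and can be biased toward features with many distinct values, whereas SHAP values are signed and computed for every sample. Plot the feature importance as follows: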
features_import = pd.DataFrame({'feature': X.columns, 'importance': model.feature_importances_})  # feature importance table
# Plot
from matplotlib import pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']  # SimHei font, only needed for Chinese labels
plt.barh(features_import['feature'], features_import['importance'], height=0.7, color='blue', edgecolor='#005344')  # see matplotlib's named colors for more options
plt.xlabel('feature importance')  # x-axis label
plt.ylabel('features')            # y-axis label
plt.title('Feature Importances')  # title
for a, b in zip(features_import['importance'], features_import['feature']):  # add value labels
    print(a, b)
    plt.text(a + 0.001, b, '%.3f' % float(a))  # place each label 0.001 to the right of its bar
plt.show()
Result: [Figure: horizontal bar chart of the random forest's feature importances]

The feature importance ranking agrees with the SHAP ranking: petal_length has the highest importance, followed by petal_width.
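Rather than comparing the two charts by eye, the agreement between the rankings can be quantified, for example with Spearman's rank correlation. The sketch computes it with NumPy and assumes no tied values; the input numbers are placeholders, not the output of the model above. In practice you would pass the mean absolute SHAP values of the chosen class and `model.feature_importances_`, in the same column order.

```python
import numpy as np


def spearman_no_ties(a, b):
    """Spearman rank correlation, assuming no tied values in a or b."""
    rank_a = np.argsort(np.argsort(a))  # rank of each element (0 = smallest)
    rank_b = np.argsort(np.argsort(b))
    n = len(a)
    d = rank_a - rank_b
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))


# Placeholder values for 4 features, same column order in both arrays
mean_abs_shap = np.array([0.04, 0.01, 0.24, 0.22])  # e.g. np.abs(sv).mean(axis=0)
rf_importance = np.array([0.10, 0.03, 0.45, 0.42])  # e.g. model.feature_importances_

rho = spearman_no_ties(mean_abs_shap, rf_importance)
print(rho)  # 1.0 means the two rankings are identical
```

A value near 1 supports using feature importance as a stand-in for the global ranking; it says nothing about the per-sample, signed information that only SHAP provides.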

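This concludes the walkthrough of SHAP value visualization for risk modeling. For more modeling content, see the articles in the "Risk Modeling" (风控建模) section of my WeChat official account.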
