This article organizes the key concepts of decision tree learning.
Here is the Wikipedia definition:
Decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
Decision trees fall into two main categories:
- Classification tree analysis is when the predicted outcome is the class to which the data belongs.
- Regression tree analysis is when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient's length of stay in a hospital).
Decision tree learning is the construction of a decision tree from class-labeled training tuples. A decision tree is a flow-chart-like structure, where each internal (non-leaf) node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node.
There are many decision tree algorithms; here are some classic ones:
- ID3 (Iterative Dichotomiser 3)
- C4.5 (successor of ID3)
- CART (Classification And Regression Tree)
- CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees.
- MARS: extends decision trees to handle numerical data better.
- Conditional Inference Trees. Statistics-based approach that uses non-parametric tests as splitting criteria, corrected for multiple testing to avoid overfitting. This approach results in unbiased predictor selection and does not require pruning.
Entropy
Entropy measures the uncertainty of information and is expressed in bits; a completely certain classification has an entropy of 0 bits. Its formula is

$$H(X) = -\sum_{i=1}^{n} P(x_i)\,\log_b P(x_i)$$

where $n$ is the number of samples, $P(x_i)$ is the probability of the $i$-th sample, and the base $b$ is usually taken to be 2, $e$, or 10; the leading minus sign keeps the entropy non-negative.
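As a concrete illustration, here is a minimal Python sketch of this entropy computation (the function name and the test labels are my own choices, not from any of the referenced posts):

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """Shannon entropy of a list of class labels (in bits when base=2)."""
    total = len(labels)
    counts = Counter(labels)
    # H = -sum_i P(x_i) * log_b P(x_i)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())

print(entropy(["good", "good", "bad", "bad", "bad"]))  # ≈ 0.9710, see the apple example below
print(entropy(["good", "bad"]))                        # 1.0, a 50/50 split is maximally uncertain
```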
Information Gain
Information gain is the difference between the parent node's entropy $H(T)$ and the weighted average of the child nodes' entropies; the explanatory variable with the largest information gain should be chosen to split the node:

$$IG(T, a) = H(T) - \sum_{v \in \mathrm{vals}(a)} \frac{|\{x \in T \mid x_a = v\}|}{|T|}\, H(\{x \in T \mid x_a = v\})$$

where $x_a \in \mathrm{vals}(a)$ denotes the value of explanatory variable $a$ for sample $x$, $|\{x \in T \mid x_a = v\}|$ is the number of samples whose value of $a$ equals $v$, and $H(\{x \in T \mid x_a = v\})$ is the entropy of the samples whose value of $a$ equals $v$.
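Below is a minimal sketch of this gain computation, reusing the `entropy` function from the sketch above (the function name and data layout are my own assumptions):

```python
def information_gain(samples, labels, attribute):
    """IG(T, a) = H(T) - sum over values v of a of |T_v|/|T| * H(T_v).

    samples: list of dicts mapping attribute name -> attribute value
    labels:  class label of each sample, in the same order
    """
    total = len(samples)
    parent_entropy = entropy(labels)
    weighted_child_entropy = 0.0
    for v in {s[attribute] for s in samples}:
        child_labels = [y for s, y in zip(samples, labels) if s[attribute] == v]
        weighted_child_entropy += len(child_labels) / total * entropy(child_labels)
    return parent_entropy - weighted_child_entropy
```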
Gini Impurity
When the CART method is used, each element of a set is assigned a label according to the probability distribution of labels within the set; Gini impurity measures the probability that a randomly drawn element would be assigned the wrong label under that scheme.
Formula: suppose there are $j$ classes and $t$ is the subset of samples at a node, where $P(i \mid t)$ is the probability of drawing a sample of class $i$ from the node's subset; then

$$\mathrm{Gini}(t) = 1 - \sum_{i=1}^{j} P(i \mid t)^2$$
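A short Python sketch of this formula (again, the function name is my own):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(t) = 1 - sum_i P(i|t)^2 over the class labels at a node."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini_impurity(["good", "good", "bad", "bad", "bad"]))  # 0.48
```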
An example:
From the formula above we can see that entropy is essentially the expected value of information. The smaller the entropy, the purer the information, i.e. the less information it carries; in classification terms, the fewer classes it contains. It follows that the larger the difference from the initial entropy, the better the split.
When I buy apples, I usually judge from appearance whether an apple is sweet by two criteria: whether it is red and whether it is round. (The data are shown below.)
First compute the entropy of the good/bad labels of the 5 apples: the result is 0.970954.
Splitting on redness and on roundness, compute the weighted child entropy and the information gain of each:
- Redness: the weighted child entropy is 0.5509775 and the information gain is 0.419973.
- Roundness: the weighted child entropy is 0.8 and the information gain is 0.17095.
Clearly the first split (on redness) yields the larger information gain, and the two resulting partitions can then be inspected separately. The figures above are re-derived in the sketch below.
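The original data table is an image and is not reproduced here, so the rows below are only a reconstruction consistent with the figures quoted above (2 good apples out of 5, with splits that match the reported entropies), not necessarily the author's exact table. The sketch reuses the `entropy` and `information_gain` functions defined earlier:

```python
# Hypothetical reconstruction of the apple data, chosen to reproduce the
# entropy and information-gain values quoted in the text above.
apples = [
    {"red": "yes", "round": "yes"},
    {"red": "yes", "round": "yes"},
    {"red": "yes", "round": "no"},
    {"red": "no",  "round": "yes"},
    {"red": "no",  "round": "yes"},
]
good = ["good", "good", "bad", "bad", "bad"]

print(entropy(good))                            # ≈ 0.9710
print(information_gain(apples, good, "red"))    # ≈ 0.419973
print(information_gain(apples, good, "round"))  # ≈ 0.17095
```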
Basic steps for building a decision tree (a minimal scikit-learn sketch follows after these steps):
1>> Start with all records in a single node;
2>> Enumerate every candidate split of every variable (e.g., the one with the largest information gain or the largest decrease in Gini impurity) and find the best split point;
3>> Split the node into two child nodes $N_1$ and $N_2$;
4>> Repeat steps 2-3 on $N_1$ and $N_2$ until every node is sufficiently pure.
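This greedy recursive procedure is what scikit-learn's DecisionTreeClassifier implements (CART, with Gini impurity by default). Below is a hedged usage sketch on the toy apple data above, with the categorical features hand-encoded as 0/1; the encoding and variable names are my own assumptions:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Features [red, round] encoded as 1/0, matching the toy apple data above.
X = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 1]]
y = ["good", "good", "bad", "bad", "bad"]

# criterion="entropy" makes the splits follow the information-gain rule above;
# the default criterion="gini" uses Gini impurity instead.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X, y)

print(export_text(clf, feature_names=["red", "round"]))  # textual view of the learned tree
print(clf.predict([[1, 1]]))                             # predict for a red, round apple
```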
References:
1) Wikipedia: https://en.wikipedia.org/wiki/Decision_tree_learning
2) <机器学习笔记-05><scikit-learn 05>决策树 & 随机森林: https://blog.csdn.net/qq_25040013/article/details/52565414
3) 机器学习笔记之信息熵、信息增益和决策树(ID3算法): https://blog.csdn.net/zhurui_idea/article/details/54646932
4) Python3《机器学习实战》学习笔记(二):决策树基础篇之让我们从相亲说起: https://blog.csdn.net/c406495762/article/details/75663451
5) 决策树算法及python实现: https://blog.csdn.net/huahuazhu/article/details/73167610?locationNum=2&fps=1
6) Scikit-learn中的决策树: http://python.jobbole.com/86911/
—End—