In this recipe, we'll look at multiclass classification. Depending on your choice of algorithm, you either get multiclass classification for free, or you have to define a scheme for comparing the per-class classifiers.
Getting ready
When working with linear models such as logistic regression, we need to use OneVsRestClassifier. This scheme will create a classifier for each class.
How to do it…
First, we'll walk through a cursory example of a Decision Tree fitting a multiclass dataset. As we discussed earlier, we get multiclass for free with some classifiers, so we'll just fit the example to prove that it works, and move on.
Second, we'll actually incorporate OneVsRestClassifier into our model:
from sklearn import datasets
X, y = datasets.make_classification(n_samples=10000, n_classes=3, n_informative=3)
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(X, y)
dt.predict(X)
array([1, 1, 0, ..., 2, 1, 1])
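As a quick sanity check (not part of the original recipe), we can confirm the tree really learned all three classes, for example by scoring its own training predictions:

import numpy as np
from sklearn.metrics import accuracy_score

# A fully grown tree will usually fit its own training data almost perfectly
print(accuracy_score(y, dt.predict(X)))
# Confirm that all three class labels appear in the predictions
print(np.unique(dt.predict(X)))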
As you can see, we were able to fit a classifier with minimum effort. Now, let's move on to the case of the multiclass classifier. This will require us to import OneVsRestClassifier. We'll also import LogisticRegression while we're at it:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
Now, we'll wrap the LogisticRegression classifier in OneVsRestClassifier. Also, notice that we can parallelize this. If we think about how OneVsRestClassifier works, it's just training separate models and then comparing them. So, we can train the separate models at the same time:
mlr = OneVsRestClassifier(LogisticRegression(), n_jobs=2)
mlr.fit(X, y)
mlr.predict(X)
array([1, 1, 0, ..., 2, 1, 1])
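If we want per-class probabilities rather than hard labels, the fitted OneVsRestClassifier also exposes predict_proba (because the underlying LogisticRegression supports it). A quick check, assuming the mlr model fitted above:

# One probability column per class, normalized across the three one-vs-rest models
print(mlr.predict_proba(X).shape)   # (10000, 3)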
How it works…
If we want to quickly create our own OneVsRestClassifier, how might we do it? First, we need to construct a way to iterate through the classes and train a classifier for each class; then we can ask each of those classifiers for its prediction:
import numpy as np
def train_one_vs_rest(y, class_label):
    # Relabel the targets: 1 for the current class, 0 for everything else
    y_train = (y == class_label).astype(int)
    return y_train

classifiers = []
for class_i in sorted(np.unique(y)):
    l = LogisticRegression()
    y_train = train_one_vs_rest(y, class_i)
    l.fit(X, y_train)
    classifiers.append(l)
OK, so now that we have a one-versus-rest scheme set up, all we need to do is evaluate each data point's likelihood under each classifier. We will then assign the data point to the class whose classifier produces the largest likelihood.
For example, let's predict X[0]:
for classifier in classifiers:
    print(classifier.predict_proba([X[0]]))
[[ 0.90443776 0.09556224]]
[[ 0.03701073 0.96298927]]
[[ 0.98492829 0.01507171]]
As you can see, the second classifier (the one at index 1) has the highest likelihood of being "positive", therefore we'll assign 1 to this point.
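To label every point at once instead of one at a time, here is a minimal sketch (assuming the classifiers list trained above, with classes 0, 1, and 2) that stacks each classifier's positive-class probability into a column and takes the argmax of each row:

import numpy as np

# Column i holds each point's likelihood under the one-vs-rest classifier for class i
probs = np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
# Assign each point the class with the largest likelihood; since the classes
# here are 0, 1, and 2, the column index is also the class label
y_pred = probs.argmax(axis=1)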