Working with QDA – a nonlinear LDA

2020-05-06 11:43:19

QDA generalizes LDA in much the same way that quadratic regression generalizes linear regression. It lets the model fit more complex decision boundaries, though, as with most things, allowing complexity to creep in makes our lives more difficult.

Getting ready

We will expand on the last recipe and look at Quadratic Discriminant Analysis (QDA) via the QDA object. In the last recipe we made an assumption about the covariance of the model, namely that every class shares the same covariance matrix. Here, we will relax that assumption and let each class have its own.
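To make the relaxed assumption concrete, here is a minimal sketch on synthetic data. The arrays X_demo and y_demo are made up for illustration and are not the stock data used in this recipe. With store_covariance=True, LDA exposes one pooled covariance matrix, while QDA exposes one matrix per class.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Synthetic two-class data (an assumption for illustration only):
# class 0 is tightly clustered, class 1 is much more spread out.
rng = np.random.RandomState(0)
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(500, 2)),
    rng.normal(loc=1.0, scale=3.0, size=(500, 2)),
])
y_demo = np.repeat([0, 1], 500)

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X_demo, y_demo)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_demo, y_demo)

# LDA: a single covariance matrix shared by both classes.
shared_cov = lda.covariance_
# QDA: one covariance matrix estimated separately for each class.
per_class_cov = qda.covariance_

print(shared_cov.shape)
print([c.shape for c in per_class_cov])
```

Comparing the diagonal entries of the two QDA matrices shows how different the class spreads are, which is exactly the information LDA averages away.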

How to do it...

QDA lives in scikit-learn's sklearn.discriminant_analysis module under the name QuadraticDiscriminantAnalysis. Use the following commands to fit it and score its predictions:

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.metrics import classification_report

# X is the DataFrame from the previous recipe:
# feature columns first, class label in the last column
features, labels = X.iloc[:, :-1], X.iloc[:, -1]

qda = QDA()
qda.fit(features, labels)
predictions = qda.predict(features)
predictions.sum()
# 2812.0

print(classification_report(labels.values, predictions))

              precision    recall  f1-score   support

         0.0       0.69      0.22      0.33      3083
         1.0       0.41      0.84      0.55      1953

    accuracy                           0.46      5036
   macro avg       0.55      0.53      0.44      5036
weighted avg       0.58      0.46      0.42      5036

As you can see, the overall scores are about the same as for LDA. Looking back at the LDA recipe, the per-class numbers tell a different story: class 0 changes a lot under QDA, while class 1 differs only slightly.
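A side-by-side comparison like this can be sketched by fitting both models on the same data and printing both reports. The data below is synthetic and chosen so that the classes differ only in spread. It is an illustrative assumption, not the stock data from the recipe, and like the recipe it scores the models on the data they were trained on.

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.metrics import accuracy_score, classification_report

# Synthetic data (illustrative assumption): both classes share the same mean
# but have very different spreads, so no straight line separates them well.
rng = np.random.RandomState(42)
X_syn = np.vstack([
    rng.normal(0.0, 1.0, size=(1000, 2)),
    rng.normal(0.0, 3.0, size=(1000, 2)),
])
y_syn = np.repeat([0, 1], 1000)

scores = {}
for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("QDA", QuadraticDiscriminantAnalysis())]:
    # Fit and predict on the same data, mirroring the recipe above.
    pred = model.fit(X_syn, y_syn).predict(X_syn)
    scores[name] = accuracy_score(y_syn, pred)
    print(name)
    print(classification_report(y_syn, pred))
```

On data like this, QDA should score noticeably higher, because only a curved boundary can separate a tight cluster from a wide one around the same center.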

How it works...

As we discussed in the last recipe, we are essentially comparing likelihoods here. So, how do we compare likelihoods? Let's use just the closing price to try to classify is_higher. We'll assume that the closing price is log-normally distributed. To compute the likelihood for each class, we need the subset of training closes that belongs to that class, plus a test set to score. As a sneak peek at the next chapter, we'll use the built-in cross-validation utilities:

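from sklearn.model_selection import train_test_split
import scipy.stats as sp

# train_test_split returns the training portion first, then the test portion
train, test = train_test_split(X)
train_close = train.Close

# Split the training closes by class label
train_0 = train_close[~train.is_higher.astype(bool)]
train_1 = train_close[train.is_higher.astype(bool)]
test_close = test.Close.values

# Normal density of each test close, centered on the class mean
# (scale is left at SciPy's default of 1)
ll_0 = sp.norm.pdf(test_close, train_0.mean())
ll_1 = sp.norm.pdf(test_close, train_1.mean())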

Now that we have likelihoods for both classes, we can compare them and assign classes:

# Fraction of test rows where class 0 is more likely than class 1
(ll_0 > ll_1).mean()
# 0.18374371194069367
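The comparison above works on raw prices and leaves the standard deviation at SciPy's default of 1. As a hedged sketch of how the stated log-normal assumption could be applied more literally, the following fits a normal density to the log of the closing price in each class and assigns each test row to the class with the higher likelihood. The DataFrame demo is synthetic and the helper class_loglik is hypothetical. Neither comes from the original recipe, and class priors are ignored for simplicity.

```python
import numpy as np
import pandas as pd
import scipy.stats as sp
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the stock data (illustrative assumption only).
rng = np.random.RandomState(0)
n = 1000
is_higher = rng.randint(0, 2, size=n)
# Log-normal prices whose log-mean depends on the class.
close = np.exp(rng.normal(loc=3.0 + 0.5 * is_higher, scale=0.4))
demo = pd.DataFrame({"Close": close, "is_higher": is_higher})

train, test = train_test_split(demo, random_state=0)

def class_loglik(train_close, test_close):
    """Normal log-density of log(test prices), with the mean and standard
    deviation estimated from log(train prices). Hypothetical helper."""
    log_train = np.log(train_close)
    return sp.norm.logpdf(np.log(test_close),
                          loc=log_train.mean(), scale=log_train.std())

mask = train.is_higher.astype(bool)
ll_0 = class_loglik(train.Close[~mask], test.Close.values)
ll_1 = class_loglik(train.Close[mask], test.Close.values)

# Assign class 1 wherever its log-likelihood is larger, class 0 otherwise.
assigned = (ll_1 > ll_0).astype(int)
accuracy = (assigned == test.is_higher.values).mean()
print(accuracy)
```

Working with the log of the price rather than the log-normal density of the price itself does not change which class wins, because the extra change-of-variable term is identical for both classes.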
