Truncated Singular Value Decomposition (SVD) is a matrix factorization technique that factors a matrix M into the three matrices U, Σ, and V. This is very similar to PCA, except that the factorization for SVD is done on the data matrix, whereas for PCA, the factorization is done on the covariance matrix. Typically, SVD is used under the hood to find the principal components of a matrix.
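The relationship between the two factorizations can be checked numerically. The following is a minimal sketch on made-up random data (the array `X` and its scaling are assumptions for illustration, not part of the recipe): it compares an eigendecomposition of the covariance matrix with an SVD of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 50 samples, 3 features with clearly different variances.
X = rng.normal(size=(50, 3)) * np.array([3.0, 2.0, 1.0])
Xc = X - X.mean(axis=0)  # PCA is defined on centered data

# Route 1: eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)              # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder to descending

# Route 2: SVD of the centered data matrix itself.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = S ** 2 / (Xc.shape[0] - 1)

# The variances agree, and the directions agree up to sign.
same_variances = np.allclose(eigvals, svd_variances)
same_directions = np.allclose(np.abs(eigvecs.T), np.abs(Vt))
print(same_variances, same_directions)
```

The comparison uses absolute values because each singular vector is only determined up to a sign.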
Getting ready
Truncated SVD is different from regular SVD in that it produces a factorization where the number of columns is equal to the specified truncation. For example, given an n x n matrix, SVD will produce matrices with n columns, whereas truncated SVD will produce matrices with the specified number of columns. This is how the dimensionality is reduced.
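To make the difference in shapes concrete, here is a small sketch using NumPy on an arbitrary 6 x 6 random matrix (the matrix `M` and the truncation `k = 2` are assumptions chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))  # an arbitrary n x n matrix with n = 6

# Regular SVD: every factor keeps all n columns/rows.
U, S, Vt = np.linalg.svd(M)
full_shapes = (U.shape, S.shape, Vt.shape)

# Truncation to k components: keep only the first k columns of U,
# the first k singular values, and the first k rows of Vt.
k = 2
U_k, S_k, Vt_k = U[:, :k], S[:k], Vt[:k, :]
truncated_shapes = (U_k.shape, S_k.shape, Vt_k.shape)

print(full_shapes)
print(truncated_shapes)
```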
Here, we'll again use the iris dataset so that you can compare this outcome against the PCA outcome:
from sklearn.datasets import load_iris
iris = load_iris()
iris_data = iris.data
iris_target = iris.target
How to do it...
This object follows the same form as the other objects we've used. First, we'll import the required object, then we'll fit the model and examine the results:
from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(2)
iris_transformed = svd.fit_transform(iris_data)
iris_data[:5]
array([[ 5.1, 3.5, 1.4, 0.2],
       [ 4.9, 3. , 1.4, 0.2],
       [ 4.7, 3.2, 1.3, 0.2],
       [ 4.6, 3.1, 1.5, 0.2],
       [ 5. , 3.6, 1.4, 0.2]])
iris_transformed[:5]
array([[ 5.91220352, -2.30344211],
       [ 5.57207573, -1.97383104],
       [ 5.4464847 , -2.09653267],
       [ 5.43601924, -1.87168085],
       [ 5.87506555, -2.32934799]])
The output will look like the following:
How it works...
Now that we've walked through how TruncatedSVD is performed in scikit-learn, let's look at how we can do the same using only scipy, and learn a bit in the process. First, we need to use the linalg module of scipy to perform SVD:
from scipy.linalg import svd
import numpy as np
D = np.array([[1, 2], [1, 3], [1, 4]])
D
array([[1, 2],
[1, 3],
[1, 4]])
U, S, V = svd(D, full_matrices=False)
U.shape, S.shape, V.shape
((3, 2), (2,), (2, 2))
We can reconstruct the original matrix D to confirm U, S, and V as a decomposition:
np.dot(U.dot(np.diag(S)), V)  # np.diag(S) builds a diagonal matrix from the 1-D array S
array([[1, 2],
[1, 3],
[1, 4]])
The matrix that is actually returned by TruncatedSVD is the product of the U matrix and the diagonal matrix formed from S.
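This claim can be checked against scikit-learn directly. The sketch below is one way to do it, under two assumptions: it uses the iris data from earlier, and it selects `algorithm="arpack"` instead of the default randomized solver so that the comparison is accurate to near machine precision. Signs are compared through absolute values, since each component is only defined up to a sign.

```python
import numpy as np
from scipy.linalg import svd
from sklearn.datasets import load_iris
from sklearn.decomposition import TruncatedSVD

X = load_iris().data

# scikit-learn's reduced representation (ARPACK solver chosen for accuracy).
tsvd = TruncatedSVD(n_components=2, algorithm="arpack")
X_sklearn = tsvd.fit_transform(X)

# The same quantity built by hand: the first two columns of U,
# each scaled by its singular value.
U, S, Vt = svd(X, full_matrices=False)
X_manual = U[:, :2] * S[:2]

matches_up_to_sign = np.allclose(np.abs(X_sklearn), np.abs(X_manual), atol=1e-6)
print(matches_up_to_sign)
```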
If we want to simulate the truncation, we will drop the smallest singular values and the corresponding column vectors of U. So, if we want a single component here, we do the following:
new_S = S[0]
new_U = U[:, 0]
new_U.dot(new_S)
array([-2.20719466, -3.16170819, -4.11622173])
In general, if we want to truncate to some dimensionality t, we keep the t largest singular values and drop the remaining N-t, where N is the total number of singular values.
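The general rule can be sketched as a small helper. The function name `truncate_svd` is hypothetical and not part of scipy or scikit-learn. The example reuses the matrix D from above with t = 1. It also checks a standard property of the SVD: the Frobenius-norm error of the rank-t approximation equals the square root of the sum of the squared dropped singular values.

```python
import numpy as np

def truncate_svd(D, t):
    """Hypothetical helper: keep the t largest singular triplets of D."""
    U, S, Vt = np.linalg.svd(D, full_matrices=False)
    # Singular values come back in descending order, so slicing keeps the largest.
    return U[:, :t], S[:t], Vt[:t, :]

D = np.array([[1, 2], [1, 3], [1, 4]], dtype=float)
U_t, S_t, Vt_t = truncate_svd(D, t=1)

reduced = U_t * S_t      # reduced representation, shape (3, 1)
approx = reduced @ Vt_t  # rank-1 approximation of D, shape (3, 2)

# Compare the approximation error with the dropped singular values.
all_S = np.linalg.svd(D, compute_uv=False)
error = np.linalg.norm(D - approx)  # Frobenius norm
expected_error = np.sqrt(np.sum(all_S[1:] ** 2))
print(reduced.shape, np.isclose(error, expected_error))
```

Up to sign, the single column of `reduced` matches the one-component result computed by hand above.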
There's more...
A few details of TruncatedSVD are worth noting.
Sign flipping
There's a "gotcha" with truncated SVDs. Depending on the state of the random number generator, successive fittings of TruncatedSVD can flip the signs of the output. In order to avoid this, it's advisable to fit TruncatedSVD once, and then use transforms from then on.
Another good reason for Pipelines!
To carry this out, do the following:
tsvd = TruncatedSVD(2)
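A fuller sketch of the fit-once pattern might look like the following. It assumes the iris data used earlier. The `fit` and `transform` methods are standard scikit-learn estimator methods, while the variable names `first` and `second` are only illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import TruncatedSVD

iris_data = load_iris().data

tsvd = TruncatedSVD(2)
tsvd.fit(iris_data)  # fit once; the components (and their signs) are now fixed

# Every later call projects onto the same stored components,
# so the signs cannot change between calls.
first = tsvd.transform(iris_data)
second = tsvd.transform(iris_data)
print(np.allclose(first, second))
```

Passing a fixed `random_state` to TruncatedSVD is another way to make repeated fits reproducible, although reusing one fitted object avoids refitting altogether.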
Sparse matrices
One advantage of TruncatedSVD over PCA is that TruncatedSVD can operate directly on sparse matrices, whereas PCA has traditionally required dense input. This is because PCA centers the data before factorizing it, and centering a sparse matrix generally makes it dense, while TruncatedSVD does not center the data.
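As a rough illustration, the sketch below passes a SciPy sparse matrix straight to TruncatedSVD. The matrix size, density, and number of components are arbitrary assumptions, and the data is random rather than taken from the recipe.

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# A random 100 x 50 sparse matrix in CSR format with about 5% non-zero entries.
X_sparse = sp.random(100, 50, density=0.05, format="csr", random_state=0)

tsvd = TruncatedSVD(n_components=5, random_state=0)
X_reduced = tsvd.fit_transform(X_sparse)  # no densifying of the input is needed

# The reduced representation comes back as a dense array.
print(sp.issparse(X_sparse), sp.issparse(X_reduced), X_reduced.shape)
```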