This chapter will cover the following topics:
1. Using KMeans to cluster data
2. Optimizing the number of centroids
3. Assessing cluster correctness
4. Using MiniBatch KMeans to handle more data
5. Quantizing an image with KMeans clustering
6. Finding the closest objects in the feature space
7. Probabilistic clustering with Gaussian Mixture Models
8. Using KMeans for outlier detection
9. Using k-NN for regression
Introduction
In this chapter, we'll cover clustering. Clustering is usually grouped with the unsupervised techniques, which assume that we do not know the outcome variable. In practice, this leads to some ambiguity in outcomes and objectives, but clustering can nevertheless be useful.
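As a taste of the first recipe, here is a minimal sketch of clustering with scikit-learn's `KMeans`. The `make_blobs` data and the parameter values are illustrative choices, not taken from the chapter itself:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate toy data with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit KMeans with the (here known) number of clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)

# labels holds one cluster assignment per sample;
# km.cluster_centers_ holds the fitted centroids.
```

Notice that we had to choose `n_clusters` up front; picking that number well is exactly what the recipe on optimizing the number of centroids addresses.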
As we'll see, we can use clustering to "localize" our estimates in a supervised setting. This is perhaps why clustering is so effective: it can handle a wide range of situations, and often the results are, for lack of a better term, "sane".
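To make the "localizing" idea concrete, one simple version is to cluster the inputs and then estimate the target separately within each cluster. This toy sketch (synthetic data; a per-cluster mean rather than any particular recipe from the chapter) shows the pattern:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy supervised data: the target depends on which region x falls in.
X = np.concatenate([rng.normal(0, 1, (100, 1)), rng.normal(10, 1, (100, 1))])
y = np.concatenate([rng.normal(5, 0.5, 100), rng.normal(-5, 0.5, 100)])

# Cluster the inputs, then estimate y locally as the per-cluster mean.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
local_means = np.array([y[km.labels_ == k].mean() for k in range(2)])

# Predict a new point by assigning it to a cluster and using that mean.
x_new = np.array([[9.5]])
y_hat = local_means[km.predict(x_new)[0]]
```

The k-NN regression recipe at the end of the chapter takes this idea further, localizing the estimate to each point's nearest neighbors instead of to fixed clusters.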
We'll walk through a wide variety of applications in this chapter, from image processing to regression and outlier detection. Through these applications, we'll see that clustering can often be viewed through a probabilistic or an optimization lens, and that different interpretations lead to different trade-offs. We'll walk through how to fit the models so that, when faced with a clustering problem, you have the tools to try out many of them.