Now it's time to take the math up a level! Principal component analysis (PCA) is the first somewhat advanced technique discussed in this book. While everything ...
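As a minimal sketch of what PCA looks like in scikit-learn, the snippet below projects data onto its first two principal components. It assumes the iris dataset bundled with scikit-learn as stand-in data; the excerpt does not say which data the book itself uses.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Stand-in data (assumption): 150 samples with 4 numeric features.
X = load_iris().data

# Fit PCA and project the data onto the first two principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
# Fraction of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)
```

The components are ordered by explained variance, so keeping the first few usually retains most of the structure while reducing the number of columns.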
Pipelines are (at least to me) something I don't think about using often, but they are useful. They can be used to tie together many steps into one object. This allow...
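A minimal sketch of tying several preprocessing steps into one object with scikit-learn's `Pipeline`. The particular steps (mean imputation, scaling, PCA) and the synthetic data with randomly injected missing values are assumptions chosen for illustration, not necessarily the steps the recipe uses.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Synthetic data (assumption): 100 rows, 4 columns.
X = rng.normal(size=(100, 4))
# Knock out roughly 10% of the entries to simulate missing values.
X[rng.rand(*X.shape) < 0.1] = np.nan

# Each step's output feeds the next; the whole chain behaves like one estimator.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # fill NaNs with column means
    ("scale", StandardScaler()),                 # zero mean, unit variance
    ("pca", PCA(n_components=2)),                # reduce to two components
])

X_out = pipe.fit_transform(X)
print(X_out.shape)
```

Because the pipeline is a single object, one `fit` call learns every step's parameters, and the same fitted chain can later be applied to new data with `pipe.transform`.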
Categorical variables are a problem. On one hand, they provide valuable information; on the other hand, the values are probably text: either the actual text or integers cor...
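One common way to turn text categories into numbers a model can use is one-hot encoding. The sketch below uses scikit-learn's `OneHotEncoder`; the color values are hypothetical, and the excerpt does not say which encoding the recipe settles on.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single categorical column stored as text (hypothetical values).
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; densify it for display.
onehot = encoder.fit_transform(colors).toarray()

# Learned categories, sorted alphabetically: blue, green, red.
print(encoder.categories_)
# One row per sample, one column per category, a single 1 per row.
print(onehot)
```

Each category becomes its own 0/1 column, so the model does not read a false ordering into integer codes such as `blue=0, green=1, red=2`.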
In the last recipe, we looked at transforming our data into the standard normal distribution. Now, we'll talk about another transformation, one that is quite dif...
A preprocessing step that is almost always recommended is to scale columns to the standard normal. The standard normal is probably the most important distribution of a...
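Scaling a column to the standard normal means computing $z = (x - \mu) / \sigma$ with the column's mean $\mu$ and standard deviation $\sigma$. A minimal sketch with scikit-learn's `StandardScaler`, assuming two synthetic columns on very different scales as stand-in data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Synthetic stand-in data: one column around 50, one tightly around 0.5.
X = np.column_stack([
    rng.normal(50, 10, size=200),
    rng.normal(0.5, 0.01, size=200),
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same result by hand: subtract each column's mean, divide by its
# population standard deviation (ddof=0, which StandardScaler also uses).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

After scaling, both columns live on the same scale, so neither dominates distance- or gradient-based models merely because of its units.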
I will again implore you to use some of your own data for this book, but in the event you cannot, we'll learn how to use scikit-learn to create toy data.
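scikit-learn's `sklearn.datasets` module ships several generators for toy data. A minimal sketch of three of them, one each for regression, classification, and clustering; the sizes and parameters are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Regression: 100 samples, 5 features, continuous target with a little noise.
X_reg, y_reg = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Classification: 100 samples, 5 features (3 informative), 2 classes.
X_clf, y_clf = make_classification(
    n_samples=100, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Clustering: 90 points drawn from 3 Gaussian blobs in 2-D.
X_blob, y_blob = make_blobs(n_samples=90, centers=3, n_features=2, random_state=0)

print(X_reg.shape, X_clf.shape, X_blob.shape)
```

Fixing `random_state` makes each generator return the same data on every run, which keeps experiments reproducible.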
This chapter discusses setting up data, preparing data, and premodel dimensionality reduction. These are not the
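Python is a great language. It is one of the fastest-growing programming languages in the world. Time and again, it has proven its usefulness in developer roles and in data science roles across industries. The entire ecosystem of Python and its libraries makes it a suitable choice for users around the world (beginners and advanced users alike)...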
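In this article, we use the California housing prices dataset and build a model step by step from scratch to predict the median house value in each district. The goal is to implement a complete machine learning workflow.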
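This article briefly introduces the power of topological data analysis in machine learning and shows how to put it into practice with three Python libraries: Gudhi, Scikit-Learn, and TensorFlow.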