The Rich Ecosystem of Machine Learning Tools
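When it comes to training computers to act without being explicitly programmed, the field of machine learning offers a wealth of tools. Academics and industry practitioners use them to build applications ranging from speech recognition to cancer detection in MRI scans, and most of them are freely available online. For those interested, I have compiled a ranking of these tools (see the bottom of this page) together with an overview of some of the important features that distinguish them. Each tool's description is taken from its homepage, and the overview focuses on the particular machine learning paradigm the tool targets and on some notable uses in academia and industry.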
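Because researchers may use many different libraries at once, write their own, or not cite any specific tool, the relative adoption of each library is hard to quantify. Instead, the search rank reflects the relative volume of Google searches for each tool during May. This score does not measure adoption directly, but it gives a good indication of which tools are in use. Note: ambiguous names such as "Caffe" were searched with a qualifier, as the less ambiguous "Caffe machine learning".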
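The article does not say exactly how raw search volumes were turned into scores. Because the most-searched tool scores exactly 100, one plausible reading (an assumption, modeled on how Google Trends scales its figures) is that each tool's May search volume was divided by the largest volume, multiplied by 100 and rounded. The minimal sketch below applies that assumed rule to hypothetical volumes; `search_rank` is an illustrative name, not part of any tool.

```python
def search_rank(volumes: dict[str, float]) -> dict[str, int]:
    """Scale raw search volumes so the most-searched tool scores 100.

    Assumption: scores are relative to the maximum and rounded to integers,
    similar to Google Trends; the article does not document its exact method.
    """
    peak = max(volumes.values())
    if peak <= 0:
        raise ValueError("need at least one positive search volume")
    return {tool: round(100 * v / peak) for tool, v in volumes.items()}


# Hypothetical raw volumes, for illustration only (not the article's data).
raw = {"Theano": 5000, "Torch 7": 3900, "scikit-learn": 1700, "Mallet": 20}
ranks = search_rank(raw)
for tool, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{score:>3}  {tool}")
```

Under this assumed rule, any tool with under 0.5% of the leader's search volume rounds down to 0, which would explain why so many entries near the bottom of the table share a score of 0.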
Overview of Machine Learning Tools
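I have separated two subfields of machine learning, deep learning and shallow learning, a split that has become an important divide over the past few years. Deep learning is responsible for record-setting results in image classification and speech recognition, and its development is therefore being led by large data companies such as Google, Facebook and Baidu. Shallow learning, by contrast, covers a variety of less cutting-edge classification, clustering and boosting techniques, such as support vector machines. Shallow learning methods are still widely used in areas such as natural language processing, brain-computer interfaces and information retrieval.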
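To make "shallow learning" concrete, the sketch below trains a linear support vector machine, which learns a single weight vector and predicts the sign of `w·x + b`. It uses a simplified Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss, written in plain Python for illustration only; it is not how LIBSVM (an SMO-type solver) or the other libraries in the table implement SVMs. The function names, the regularization strength `lam`, the epoch count and the toy data are all assumptions.

```python
import random


def train_linear_svm(xs, ys, lam=0.01, epochs=500, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    xs: feature vectors (sequences of floats); ys: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as the bias
    (the bias is regularized too, a common simplification).
    """
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)  # fixed seed for reproducible shuffling
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: gradient step on the (lam / 2) * ||w||^2 regularizer.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # If the hinge loss is active, step towards classifying x correctly.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data.
X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
Y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, Y)
print([predict(w, x) for x in X])
```

The whole model is one weight vector, which is what makes methods like this cheap to train and easy to distribute compared with deep networks.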
Detailed Comparison of Machine Learning Packages and Libraries
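The table also shows which tools support GPUs. A GPU interface has become an important feature of machine learning tools because it accelerates large-scale matrix operations, and its importance for deep learning is evident. For example, at the 2015 GPU Technology Conference, 39 of the 45 talks in the machine learning track were about GPU-accelerated deep learning applications, presented by 31 major technology companies and 8 universities. This interest reflects the large speed-ups that GPU-assisted training brings to deep networks, which makes GPU support a key feature.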
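Matrix operations suit GPUs because, in C = A·B, each entry C[i][j] is an independent dot product of row i of A with column j of B, so thousands of entries can be computed at once, roughly one per GPU thread. The plain-Python sketch below (hypothetical helper names, no GPU involved) makes that independence explicit by handing each output cell to a separate worker. In CPython the threads give no real speed-up because of the global interpreter lock, so treat it as an illustration of the decomposition, not of the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def output_cell(a, b, i, j):
    """One entry of C = A @ B; it depends only on row i of A and column j of B."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b, workers=4):
    """Compute every output cell as an independent task, as a GPU would."""
    n, m = len(a), len(b[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `cells`.
        values = list(pool.map(lambda ij: output_cell(a, b, *ij), cells))
    # Regroup the flat list of cells into n rows of m columns.
    return [values[r * m:(r + 1) * m] for r in range(n)]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because no cell reads another cell's result, the order and placement of the work do not matter, which is exactly the property GPU libraries such as cuDNN exploit at much larger scale.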
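The table also covers each tool's ability to distribute computation across a cluster through Hadoop or Spark. This has become an important talking point for shallow learning techniques that lend themselves to distributed computing. Distributed computing for deep networks has likewise become a topic of discussion, as new techniques have been developed for distributed training algorithms.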
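Many shallow learning methods fit Hadoop or Spark because of data parallelism: each worker computes a partial result (here, a gradient) over its own partition of the data, and the partial results are summed and applied on a driver. The sketch below simulates that map/reduce loop in a single process for least-squares linear regression. It does not use any Spark or Hadoop API, and the helper names, shard count, learning rate and toy data are assumptions chosen for illustration.

```python
from functools import reduce


def partition(rows, k):
    """Split rows into k shards (stand-ins for the partitions of a cluster)."""
    return [rows[i::k] for i in range(k)]


def shard_gradient(shard, w):
    """'Map' step: squared-error gradient summed over one shard, plus its size."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        g = [gi + err * xi for gi, xi in zip(g, x)]
    return g, len(shard)


def add(p, q):
    """'Reduce' step: combine two partial (gradient, count) results."""
    return [a + b for a, b in zip(p[0], q[0])], p[1] + q[1]


def train(rows, dim, shards=4, lr=0.5, steps=1000):
    """Full-batch gradient descent where each step is one map/reduce round."""
    parts = partition(rows, shards)
    w = [0.0] * dim
    for _ in range(steps):
        g, n = reduce(add, (shard_gradient(s, w) for s in parts))
        w = [wi - lr * gi / n for wi, gi in zip(w, g)]  # driver-side update
    return w


# Hypothetical data from y = 2x + 1; the constant 1.0 feature carries the intercept.
rows = [((x / 10, 1.0), 2 * (x / 10) + 1) for x in range(11)]
print(train(rows, dim=2))
```

Only the small gradient vectors travel between workers and the driver, not the data itself, which is why this pattern scales well across a cluster.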
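Finally, I have added some notes on how these tools are used in academia versus industry. This information was gathered by searching machine learning publications, presentations and publicly released code.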
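The results of this survey show that many tools are currently in use, and it is still unclear which one will win the lion's share of use in either industry or academia.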
Search Rank | Tool | Language | Type | Description (quoted) | Use | GPU Acceleration | Distributed Computing
---|---|---|---|---|---|---|---
100 | Theano | Python | Library | Numerical computation library for working with multi-dimensional arrays efficiently | Deep and Shallow Learning | CUDA and OpenCL | cuDNN Cutorch
78 | Torch 7 | Lua | Framework | Scientific computing framework with wide support for machine learning algorithms | Deep and Shallow Learning | CUDA and OpenCL, cuDNN | Cutorch
64 | R | R | Environment/Language | Functional language and environment for statistics | Shallow Learning | RPUD, HiPLAR | RPUD
52 | LIBSVM | Java and C++ | Library | A Library for Support Vector Machines | Support Vector Machines | CUDA | Not Yet
34 | scikit-learn | Python | Library | Machine Learning in Python | Shallow Learning | Not Yet | Not Yet
28 | Spark MLlib | Scala, with APIs in Java and Python | Library/API | Apache Spark's scalable machine learning library | Shallow Learning | ScalaCL | Spark and Hadoop
24 | Matlab | Matlab | Environment/Language | High-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical analysis | Deep and Shallow Learning | Parallel Computing Toolbox (not free, not open source) | Distributed Computing Package (not free, not open source)
18 | Pylearn2 | Python | Library | Machine Learning | Deep Learning | CUDA and OpenCL, cuDNN | Not Yet
14 | Vowpal Wabbit | C++ | Library | Out-of-core learning system | Shallow Learning | CUDA | Not Yet
13 | Caffe | C++ | Framework | Deep learning framework made with expression, speed, and modularity in mind | Deep Learning | CUDA and OpenCL, cuDNN | Not Yet
11 | LIBLINEAR | Java and C++ | Library | A Library for Large Linear Classification | Support Vector Machines and Logistic Regression | CUDA | Not Yet
6 | Mahout | Java | Environment/Framework | An environment for building scalable algorithms | Shallow Learning | JCUDA | Spark and Hadoop
5 | Accord.NET | .NET | Framework | Machine learning | Deep and Shallow Learning | CUDA.net | Not Yet
5 | NLTK | Python | Library | Programs to work with human language data | Text Classification | scikits.cuda | Not Yet
4 | Deeplearning4j | Java | Framework | Commercial-grade, open-source, distributed deep-learning library | Deep and Shallow Learning | JCublas | Spark and Hadoop
4 | Weka 3 | Java | Library | Collection of machine learning algorithms for data mining tasks | Shallow Learning | Not Yet | distributedWekaSpark
4 | MLPY | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
3 | Pandas | Python | Library | Data analysis and manipulation | Shallow Learning | scikits.cuda | Not Yet
1 | H2O | Java, Python and R | Environment/Language | Open source predictive analytics platform | Deep and Shallow Learning | Not Yet | Spark and Hadoop
0 | cuda-convnet | C++ | Library | Machine learning library for neural-network applications | Deep Neural Networks | CUDA | Coming in cuda-convnet2
0 | Mallet | Java | Library | Package for statistical natural language processing | Shallow Learning | JCUDA | Spark and Hadoop
0 | JSAT | Java | Library | Statistical Analysis Tool | Shallow Learning | JCUDA | Spark and Hadoop
0 | MultiBoost | C++ | Library | Machine Learning | Boosting Algorithms | CUDA | Not Yet
0 | Shogun | C++ | Library | Machine Learning | Shallow Learning | CUDA | Not Yet
0 | MLPACK | C++ | Library | Machine Learning | Shallow Learning | CUDA | Not Yet
0 | DLIB | C++ | Library | Machine Learning | Shallow Learning | CUDA | Not Yet
0 | Ramp | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
0 | Deepnet | Python | Library | GPU-based Machine Learning | Deep Learning | CUDA | Not Yet
0 | CUV | Python | Library | GPU-based Machine Learning | Deep Learning | CUDA | Not Yet
0 | APRIL-ANN | Lua | Library | Machine Learning | Deep Learning | Not Yet | Not Yet
0 | nnForge | C++ | Framework | GPU-based Machine Learning | Convolutional and Fully-Connected Neural Networks | CUDA | Not Yet
0 | PYML | Python | Framework | Object-oriented framework for machine learning | SVMs and Other Kernel Methods | scikits.cuda | Not Yet
0 | Milk | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
0 | MDP | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
0 | Orange | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
0 | PYMVPA | Python | Library | Machine Learning | Classification Only | scikits.cuda | Not Yet
0 | Monte | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
0 | RPY2 | Python to R | API | Low-level interface to R | Shallow Learning | scikits.cuda | Not Yet
0 | NeuroLab | Python | Library | Machine Learning | Feed-Forward Neural Networks | scikits.cuda | Not Yet
0 | PythonXX | Python | Library | Machine Learning | Shallow Learning | scikits.cuda | Not Yet
0 | Hcluster | Python | Library | Machine Learning | Clustering Algorithms | scikits.cuda | Not Yet
0 | FANN | C | Library | Machine Learning | Feed-Forward Neural Networks | Not Yet | Not Yet
0 | PyANN | Python | Library | Machine Learning | Nearest-Neighbour Classification | Not Yet | Not Yet
0 | FFNET | Python | Library | Machine Learning | Feed-Forward Neural Networks | Not Yet | Not Yet