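A Ruby library that implements the ID3 algorithm for decision tree learning. It can currently learn from both continuous and discrete datasets.
GitHub link: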
https://github.com/igrigorik/decisiontree
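Features
- ID3 algorithms for continuous and discrete cases, with support for inconsistent datasets.
- Graphviz component to visualize the learned tree.
- Support for multiple and symbolic outputs, and graphing of continuous trees.
- Returns a default value when no branch is suitable for the input.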
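To make the ID3 feature above more concrete, here is a minimal sketch of how information gain can be computed for one candidate threshold on a continuous attribute. It uses only the Ruby standard library; the helper names `entropy` and `information_gain` and the threshold 36.85 are illustrative assumptions, not part of the decisiontree API, and the library's own split selection may differ in detail.

```ruby
# Shannon entropy (in bits) of an array of class labels.
def entropy(labels)
  total = labels.size.to_f
  counts = labels.group_by(&:itself).values.map(&:size)
  counts.sum { |count| -(count / total) * Math.log2(count / total) }
end

# Information gain from splitting rows of [value, label] at a threshold:
# rows with value < threshold go left, all other rows go right.
def information_gain(rows, threshold)
  left, right = rows.partition { |value, _label| value < threshold }
  before = entropy(rows.map(&:last))
  after = [left, right].sum do |part|
    part.empty? ? 0.0 : (part.size.to_f / rows.size) * entropy(part.map(&:last))
  end
  before - after
end

# Hypothetical sample rows in the same [value, label] shape as the example data.
rows = [[36.6, 'healthy'], [37, 'sick'], [38, 'sick'], [36.7, 'healthy']]
gain = information_gain(rows, 36.85)
puts "Information gain at 36.85: #{gain}"
```

A threshold that separates the two classes perfectly removes all uncertainty, so the gain equals the entropy of the unsplit labels. ID3 picks the attribute (and, for continuous data, the threshold) with the highest gain at each node.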
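Implementation
Ruleset is a class that trains an ID3Tree on 2/3 of the training data, converts the tree into a set of rules, and then prunes those rules with the remaining 1/3 of the data (in the C4.5 manner, https://en.wikipedia.org/wiki/C4.5_algorithm).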
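The 2/3 versus 1/3 split described for Ruleset can be sketched as follows. This is only an illustration in plain Ruby: the helper name `split_for_pruning`, the shuffling step, and the fixed seed are assumptions made for the example, and how the library itself partitions the data is not confirmed here.

```ruby
# Split rows into a training part (two thirds, after shuffling) and a
# pruning part (the remaining third). The seed only makes the run repeatable.
def split_for_pruning(rows, seed: 42)
  shuffled = rows.shuffle(random: Random.new(seed))
  cut = (shuffled.size * 2) / 3
  [shuffled.first(cut), shuffled.drop(cut)]
end

# Hypothetical data: nine rows of [value, label].
rows = (1..9).map { |i| [i, i > 5 ? 'sick' : 'healthy'] }
train_rows, prune_rows = split_for_pruning(rows)

puts "training rows: #{train_rows.size}, pruning rows: #{prune_rows.size}"
```

Holding out data that the tree never saw during training is what lets the pruning step drop rule conditions that do not generalize.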
Bagging is a bagging-based trainer that trains 10 Ruleset trainers and, when predicting, chooses the best output by voting.
For more details, see:
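The voting step can be sketched in a few lines of plain Ruby. The `majority_vote` helper and the lambda predictors below are hypothetical stand-ins for ten trained Ruleset instances; they are not the library's classes, and the thresholds are made up for illustration.

```ruby
# Return the label that appears most often among the predictions.
def majority_vote(predictions)
  predictions.group_by(&:itself).max_by { |_label, votes| votes.size }.first
end

# Hypothetical stand-ins for ten trained predictors: each one maps a
# temperature to a label using its own threshold.
thresholds = [36.8, 36.9, 37.0, 37.1, 37.2, 36.5, 36.7, 36.6, 37.5, 36.95]
predictors = thresholds.map { |t| ->(temp) { temp >= t ? 'sick' : 'healthy' } }

votes = predictors.map { |predictor| predictor.call(37.0) }
decision = majority_vote(votes)
puts "Votes: #{votes.inspect}"
puts "Decision: #{decision}"
```

Because each predictor is trained on a different sample of the data, their individual errors tend to differ, and the vote smooths them out.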
https://www.igvita.com/2007/04/16/decision-tree-learning-in-ruby/
Example
require 'decisiontree'
attributes = ['Temperature']
training = [
[36.6, 'healthy'],
[37, 'sick'],
[38, 'sick'],
[36.7, 'healthy'],
[40, 'sick'],
[50, 'really sick'],
]
# Instantiate the tree and train it on the data (the default value is 'sick')
dec_tree = DecisionTree::ID3Tree.new(attributes, training, 'sick', :continuous)
dec_tree.train
test = [37, 'sick']
decision = dec_tree.predict(test)
puts "Predicted: #{decision} ... True decision: #{test.last}"
# => Predicted: sick ... True decision: sick
# Specify type ("discrete" or "continuous") in the training data
labels = ["hunger", "color"]
training = [
[8, "red", "angry"],
[6, "red", "angry"],
[7, "red", "angry"],
[7, "blue", "not angry"],
[2, "red", "not angry"],
[3, "blue", "not angry"],
[2, "blue", "not angry"],
[1, "red", "not angry"]
]
dec_tree = DecisionTree::ID3Tree.new(labels, training, "not angry", color: :discrete, hunger: :continuous)
dec_tree.train
test = [7, "red", "angry"]
decision = dec_tree.predict(test)
puts "Predicted: #{decision} ... True decision: #{test.last}"
# => Predicted: angry ... True decision: angry