TensorFlow入门 - 使用TensorFlow给鸢尾花分类(线性模型)

2018-04-27 17:16:36 浏览数 (1)

TensorFlow入门 - 使用TensorFlow给鸢尾花分类(线性模型)

本例参考自Plain and Simple Estimators - YouTube,中文字幕以及详细解释参考机器学习 | 更进一步,用评估器给花卉分类,本文着重于其具体实现部分,给代码加了比较详细的注释。 本例是作者毕业设计的一部分,因此保证绝对正确和有效,谢绝一切形式转载,也请勿随意复制粘贴。

环境搭建

本例使用Jupyter Notebook进行具体实现,使用它需要安装Anaconda,这个部分参考博主之前的博文。 Ubuntu16.04使用Anaconda5搭建TensorFlow使用环境 图文详细教程 Jupyter Notebook即此前的Ipython Notebook,是一个web应用程序,可以以文档形式保存所有输入和输出。

鸢尾花数据集

鸢尾花数据集是一个经典的机器学习数据集,非常适合用来入门。它包括5列数据:前4列代表4个特征值即sepal length(萼片长度)、sepal width(萼片宽度)、petal length(花瓣长度)、petal width(花瓣宽度);最后一列为Species,即鸢尾花的种类,是我们训练目标,在机器学习中称作label。这种数据也被称作标记数据(labeled data)。

具体实现

代码语言:javascript复制
在tensorflow虚拟环境中启动jupyter notebook
steve@steve-Lenovo-V2000:~$ source activate tensorflow
(tensorflow) steve@steve-Lenovo-V2000:~$ jupyter notebook
代码语言:javascript复制
In[1]       
import tensorflow as tf
import numpy as np

print(tf.__version__)

1.3.0
代码语言:javascript复制
In[2]       
from tensorflow.contrib.learn.python.learn.datasets import base

#所用的数据集文件
IRIS_TRAINING = "iris_training.csv"
IRIS_TEST = "iris_test.csv"
#加载数据集
training_set = base.load_csv_with_header(filename = IRIS_TRAINING, 
                                         features_dtype = np.float32,
                                         target_dtype = np.int)
test_set = base.load_csv_with_header(filename = IRIS_TEST, 
                                         features_dtype = np.float32,
                                         target_dtype = np.int)

print(training_set.data)
print(training_set.target)

[[ 6.4000001   2.79999995  5.5999999   2.20000005]
 [ 5.          2.29999995  3.29999995  1.        ]
 [ 4.9000001   2.5         4.5         1.70000005]
 [ 4.9000001   3.0999999   1.5         0.1       ]
 [ 5.69999981  3.79999995  1.70000005  0.30000001]
 [ 4.4000001   3.20000005  1.29999995  0.2       ]
 [ 5.4000001   3.4000001   1.5         0.40000001]
 [ 6.9000001   3.0999999   5.0999999   2.29999995]
 [ 6.69999981  3.0999999   4.4000001   1.39999998]
 [ 5.0999999   3.70000005  1.5         0.40000001]
 [ 5.19999981  2.70000005  3.9000001   1.39999998]
 [ 6.9000001   3.0999999   4.9000001   1.5       ]
 [ 5.80000019  4.          1.20000005  0.2       ]
 [ 5.4000001   3.9000001   1.70000005  0.40000001]
 [ 7.69999981  3.79999995  6.69999981  2.20000005]
 [ 6.30000019  3.29999995  4.69999981  1.60000002]
 [ 6.80000019  3.20000005  5.9000001   2.29999995]
 [ 7.5999999   3.          6.5999999   2.0999999 ]
 [ 6.4000001   3.20000005  5.30000019  2.29999995]
 [ 5.69999981  4.4000001   1.5         0.40000001]
 [ 6.69999981  3.29999995  5.69999981  2.0999999 ]
 [ 6.4000001   2.79999995  5.5999999   2.0999999 ]
 [ 5.4000001   3.9000001   1.29999995  0.40000001]
 [ 6.0999999   2.5999999   5.5999999   1.39999998]
 [ 7.19999981  3.          5.80000019  1.60000002]
 [ 5.19999981  3.5         1.5         0.2       ]
 [ 5.80000019  2.5999999   4.          1.20000005]
 [ 5.9000001   3.          5.0999999   1.79999995]
 [ 5.4000001   3.          4.5         1.5       ]
 [ 6.69999981  3.          5.          1.70000005]
 [ 6.30000019  2.29999995  4.4000001   1.29999995]
 [ 5.0999999   2.5         3.          1.10000002]
 [ 6.4000001   3.20000005  4.5         1.5       ]
 [ 6.80000019  3.          5.5         2.0999999 ]
 [ 6.19999981  2.79999995  4.80000019  1.79999995]
 [ 6.9000001   3.20000005  5.69999981  2.29999995]
 [ 6.5         3.20000005  5.0999999   2.        ]
 [ 5.80000019  2.79999995  5.0999999   2.4000001 ]
 [ 5.0999999   3.79999995  1.5         0.30000001]
 [ 4.80000019  3.          1.39999998  0.30000001]
 [ 7.9000001   3.79999995  6.4000001   2.        ]
 [ 5.80000019  2.70000005  5.0999999   1.89999998]
 [ 6.69999981  3.          5.19999981  2.29999995]
 [ 5.0999999   3.79999995  1.89999998  0.40000001]
 [ 4.69999981  3.20000005  1.60000002  0.2       ]
 [ 6.          2.20000005  5.          1.5       ]
 [ 4.80000019  3.4000001   1.60000002  0.2       ]
 [ 7.69999981  2.5999999   6.9000001   2.29999995]
 [ 4.5999999   3.5999999   1.          0.2       ]
 [ 7.19999981  3.20000005  6.          1.79999995]
 [ 5.          3.29999995  1.39999998  0.2       ]
 [ 6.5999999   3.          4.4000001   1.39999998]
 [ 6.0999999   2.79999995  4.          1.29999995]
 [ 5.          3.20000005  1.20000005  0.2       ]
 [ 7.          3.20000005  4.69999981  1.39999998]
 [ 6.          3.          4.80000019  1.79999995]
 [ 7.4000001   2.79999995  6.0999999   1.89999998]
 [ 5.80000019  2.70000005  5.0999999   1.89999998]
 [ 6.19999981  3.4000001   5.4000001   2.29999995]
 [ 5.          2.          3.5         1.        ]
 [ 5.5999999   2.5         3.9000001   1.10000002]
 [ 6.69999981  3.0999999   5.5999999   2.4000001 ]
 [ 6.30000019  2.5         5.          1.89999998]
 [ 6.4000001   3.0999999   5.5         1.79999995]
 [ 6.19999981  2.20000005  4.5         1.5       ]
 [ 7.30000019  2.9000001   6.30000019  1.79999995]
 [ 4.4000001   3.          1.29999995  0.2       ]
 [ 7.19999981  3.5999999   6.0999999   2.5       ]
 [ 6.5         3.          5.5         1.79999995]
 [ 5.          3.4000001   1.5         0.2       ]
 [ 4.69999981  3.20000005  1.29999995  0.2       ]
 [ 6.5999999   2.9000001   4.5999999   1.29999995]
 [ 5.5         3.5         1.29999995  0.2       ]
 [ 7.69999981  3.          6.0999999   2.29999995]
 [ 6.0999999   3.          4.9000001   1.79999995]
 [ 4.9000001   3.0999999   1.5         0.1       ]
 [ 5.5         2.4000001   3.79999995  1.10000002]
 [ 5.69999981  2.9000001   4.19999981  1.29999995]
 [ 6.          2.9000001   4.5         1.5       ]
 [ 6.4000001   2.70000005  5.30000019  1.89999998]
 [ 5.4000001   3.70000005  1.5         0.2       ]
 [ 6.0999999   2.9000001   4.69999981  1.39999998]
 [ 6.5         2.79999995  4.5999999   1.5       ]
 [ 5.5999999   2.70000005  4.19999981  1.29999995]
 [ 6.30000019  3.4000001   5.5999999   2.4000001 ]
 [ 4.9000001   3.0999999   1.5         0.1       ]
 [ 6.80000019  2.79999995  4.80000019  1.39999998]
 [ 5.69999981  2.79999995  4.5         1.29999995]
 [ 6.          2.70000005  5.0999999   1.60000002]
 [ 5.          3.5         1.29999995  0.30000001]
 [ 6.5         3.          5.19999981  2.        ]
 [ 6.0999999   2.79999995  4.69999981  1.20000005]
 [ 5.0999999   3.5         1.39999998  0.30000001]
 [ 4.5999999   3.0999999   1.5         0.2       ]
 [ 6.5         3.          5.80000019  2.20000005]
 [ 4.5999999   3.4000001   1.39999998  0.30000001]
 [ 4.5999999   3.20000005  1.39999998  0.2       ]
 [ 7.69999981  2.79999995  6.69999981  2.        ]
 [ 5.9000001   3.20000005  4.80000019  1.79999995]
 [ 5.0999999   3.79999995  1.60000002  0.2       ]
 [ 4.9000001   3.          1.39999998  0.2       ]
 [ 4.9000001   2.4000001   3.29999995  1.        ]
 [ 4.5         2.29999995  1.29999995  0.30000001]
 [ 5.80000019  2.70000005  4.0999999   1.        ]
 [ 5.          3.4000001   1.60000002  0.40000001]
 [ 5.19999981  3.4000001   1.39999998  0.2       ]
 [ 5.30000019  3.70000005  1.5         0.2       ]
 [ 5.          3.5999999   1.39999998  0.2       ]
 [ 5.5999999   2.9000001   3.5999999   1.29999995]
 [ 4.80000019  3.0999999   1.60000002  0.2       ]
 [ 6.30000019  2.70000005  4.9000001   1.79999995]
 [ 5.69999981  2.79999995  4.0999999   1.29999995]
 [ 5.          3.          1.60000002  0.2       ]
 [ 6.30000019  3.29999995  6.          2.5       ]
 [ 5.          3.5         1.60000002  0.60000002]
 [ 5.5         2.5999999   4.4000001   1.20000005]
 [ 5.69999981  3.          4.19999981  1.20000005]
 [ 4.4000001   2.9000001   1.39999998  0.2       ]
 [ 4.80000019  3.          1.39999998  0.1       ]
 [ 5.5         2.4000001   3.70000005  1.        ]]
[2 1 2 0 0 0 0 2 1 0 1 1 0 0 2 1 2 2 2 0 2 2 0 2 2 0 1 2 1 1 1 1 1 2 2 2 2
2 0 0 2 2 2 0 0 2 0 2 0 2 0 1 1 0 1 2 2 2 2 1 1 2 2 2 1 2 0 2 2 0 0 1 0 2 2 0 1 1 1 2 0 1 1 1 2 0 1 1 1 0 2 1 0 0 2 0 0 2 1 0 0 1 0 1 0 0 0 0 1 0 2 1 0 2 0 1 1 0 0 1]
(第一个list是4个特征值,第二个list是目标结果,即鸢尾的种类,用int的0、1、2表示Iris Setosa(山鸢尾)、Iris Versicolour(变色鸢尾)和Iris Virginica(维吉尼亚鸢尾)。)
代码语言:javascript复制
In[3]   
    #构建模型
    #假定所有的特征都有一个实数值作为数据
    feature_name = "flower_features"
feature_columns = [tf.feature_column.numeric_column(feature_name, shape = [4])]
    classifier = tf.estimator.LinearClassifier(
                 feature_columns = feature_columns,
                     n_classes = 3,
                     model_dir = "/tmp/iris_model")

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/iris_model', '_tf_random_seed': 1, '_save_summary_steps': 100, '_save_checkpoints_secs': 600, '_save_checkpoints_steps': None, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100}
代码语言:javascript复制
In[4]   
# define input function 定义一个输入函数,用于为模型产生数据
def input_fn(dataset):
            def _fn():
                features = {feature_name: tf.constant(dataset.data)}
                label = tf.constant(dataset.target)
                return features, label
            return _fn
print(input_fn(training_set)())

({'flower_features': <tf.Tensor 'Const:0' shape=(120, 4) dtype=float32>}, <tf.Tensor 'Const_1:0' shape=(120,) dtype=int64>)
代码语言:javascript复制
In[5]   
# 数据流向
# raw data -> input_fn -> feature columns -> model
# fit model 训练模型
classifier.train(input_fn = input_fn(training_set), steps = 1000)
print('fit already done.')

INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/iris_model/model.ckpt.
INFO:tensorflow:loss = 131.833, step = 1
INFO:tensorflow:global_step/sec: 1396.3
INFO:tensorflow:loss = 37.1391, step = 101 (0.072 sec)
INFO:tensorflow:global_step/sec: 1279.85
INFO:tensorflow:loss = 27.8594, step = 201 (0.078 sec)
INFO:tensorflow:global_step/sec: 1400.15
INFO:tensorflow:loss = 23.0449, step = 301 (0.071 sec)
INFO:tensorflow:global_step/sec: 1293.92
INFO:tensorflow:loss = 20.058, step = 401 (0.077 sec)
INFO:tensorflow:global_step/sec: 1610.43
INFO:tensorflow:loss = 18.0083, step = 501 (0.062 sec)
INFO:tensorflow:global_step/sec: 1617.19
INFO:tensorflow:loss = 16.505, step = 601 (0.062 sec)
INFO:tensorflow:global_step/sec: 1602.84
INFO:tensorflow:loss = 15.3496, step = 701 (0.062 sec)
INFO:tensorflow:global_step/sec: 1799.5
INFO:tensorflow:loss = 14.43, step = 801 (0.056 sec)
INFO:tensorflow:global_step/sec: 1577.18
INFO:tensorflow:loss = 13.6782, step = 901 (0.063 sec)
INFO:tensorflow:Saving checkpoints for 1000 into /tmp/iris_model/model.ckpt.
INFO:tensorflow:Loss for final step: 13.0562.
fit already done.
代码语言:javascript复制
In[6]   
# Evaluate accuracy 评估模型的准确度
accuracy_score = classifier.evaluate(input_fn = input_fn(test_set),
                                     steps = 100)["accuracy"]
print('nAccuracy: {0:f}'.format(accuracy_score))

INFO:tensorflow:Starting evaluation at 2018-03-03-12:07:04
INFO:tensorflow:Restoring parameters from /tmp/iris_model/model.ckpt-1000
INFO:tensorflow:Evaluation [1/100]
INFO:tensorflow:Evaluation [2/100]
INFO:tensorflow:Evaluation [3/100]
INFO:tensorflow:Evaluation [4/100]
INFO:tensorflow:Evaluation [5/100]
INFO:tensorflow:Evaluation [6/100]
INFO:tensorflow:Evaluation [7/100]
INFO:tensorflow:Evaluation [8/100]
……
INFO:tensorflow:Evaluation [98/100]
INFO:tensorflow:Evaluation [99/100]
INFO:tensorflow:Evaluation [100/100]
INFO:tensorflow:Finished evaluation at 2018-03-03-12:07:05
INFO:tensorflow:Saving dict for global step 1000: accuracy = 0.966667, average_loss = 0.120964, global_step = 1000, loss = 3.62893
Accuracy: 0.966667

总结与说明

本例主要使用了TensorFlow封装的高级API,即Estimator。Estimator已经对训练过程进行了封装,因此我们只需要配置就可以进行使用。

代码语言:javascript复制
    classifier = tf.estimator.LinearClassifier(
                 feature_columns = feature_columns,
                     n_classes = 3,
                     model_dir = "/tmp/iris_model")

这是构建模型所使用的代码,它定义了一个简单的线性模型,并配置了三个参数:feature_columns即特征值,已在前面定义;n_class即分类的总数,本例为3;model_dir即模型的存储路径。 本例所搭建的线性模型的最终准确度达到了96.66667%。这是一个不错的数值,因为这意味着从统计方面来说该模型能从100朵鸢尾中正确区分96朵鸢尾的品种。事实上,如果让一个真实的人来对100朵鸢尾做出品种的区分,他也有可能区分错其中4朵甚至更多。当然这并不意味着我们对此感到满足,因为这是一个示例的简单模型,我们应当追求实际应用模型的准确率超过99%!

0 人点赞