Problem introduction
```python
sess.run([a, b])  # (1) fetch a and b in a single run call

sess.run(a)
sess.run(b)       # (2) fetch a, then b, in two separate run calls
```
- At first glance the two forms look identical, but they behave differently when `a` and `b` are tensors such as `example_batch` and `label_batch` produced by a batched input pipeline (e.g. a `reader.read` op feeding `tf.train.batch` or `tf.train.shuffle_batch`).
- Form (1) runs the input pipeline once, so `example_batch` and `label_batch` come from the same batch. Form (2) runs the input pipeline twice, so `example_batch` comes from one batch while `label_batch` comes from the next.
- This deserves close attention. If we want to print `label_batch` alongside `inference(example_batch)`, i.e. compare the input labels against the model's predictions, then with form (2) the labels and the predictions are not computed from the same batch of data. The sketch below illustrates the underlying mechanism.
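This behaviour follows from the queue semantics: every `sess.run` call whose fetches depend on a `dequeue` op pops fresh elements from the queue. Here is a minimal sketch of that mechanism using a plain `tf.FIFOQueue` (TensorFlow 1.x API) instead of the file-reading pipeline used later; the queue simply holds the integers 0 through 9:

```python
import tensorflow as tf

# A queue preloaded with 0..9; x and y both depend on the same dequeue op.
queue = tf.FIFOQueue(capacity=10, dtypes=[tf.int32])
enqueue = queue.enqueue_many(tf.constant(list(range(10))))
value = queue.dequeue()
x = value + 0  # derived from value
y = value * 1  # derived from the same value

with tf.Session() as sess:
    sess.run(enqueue)
    print(sess.run([x, y]))  # one dequeue -> [0, 0]: same element for both
    print(sess.run(x))       # a second dequeue -> 1
    print(sess.run(y))       # a third dequeue  -> 2: no longer paired with x
```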
Example code
Here we read the data from a csv file in the two different ways and observe how the results differ. Source file download[1], test_tf_train_batch.csv[2].
```python
import tensorflow as tf

BATCH_SIZE = 5    # matches the 5-row batches shown in the results below
NUM_THREADS = 2
MAX_NUM = 5


def read_data(file_queue):
    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(file_queue)
    defaults = [[0], [0.], [0.]]
    NUM, C, Tensile = tf.decode_csv(value, record_defaults=defaults)
    vector_example = tf.stack([C])
    vector_label = tf.stack([Tensile])
    vector_num = tf.stack([NUM])
    return vector_example, vector_label, vector_num


def create_pipeline(filename, batch_size, num_threads):
    file_queue = tf.train.string_input_producer([filename])  # set up the filename queue
    example, label, no = read_data(file_queue)  # read example, label and row number
    example_batch, label_batch, no_batch = tf.train.batch(
        [example, label, no], batch_size=batch_size,
        num_threads=num_threads, capacity=MAX_NUM)
    return example_batch, label_batch, no_batch


x_train_batch, y_train_batch, no_train_batch = create_pipeline(
    'test_tf_train_batch.csv', batch_size=BATCH_SIZE, num_threads=NUM_THREADS)

init_op = tf.global_variables_initializer()
local_init_op = tf.local_variables_initializer()

with tf.Session() as sess:
    sess.run(local_init_op)
    sess.run(init_op)
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # Mode (1): fetch all three tensors in a single run call
    example, label, num = sess.run([x_train_batch, y_train_batch, no_train_batch])
    print('The first mode to load data')
    print('example', example)
    print('label', label)
    print('num', num)

    # Mode (2): fetch each tensor in its own run call
    # example = sess.run(x_train_batch)
    # label = sess.run(y_train_batch)
    # num = sess.run(no_train_batch)
    # print('The second mode to load data')
    # print('example', example)
    # print('label', label)
    # print('num', num)

    coord.request_stop()
    coord.join(threads)
```
Result
Run at the same time
```python
example, label, num = sess.run([x_train_batch, y_train_batch, no_train_batch])
print('The first mode to load data')
print('example', example)
print('label', label)
print('num', num)
```
example | label | num |
---|---|---|
[ 0.294 ] | [ 0.59821427] | [1] |
[ 0.31 ] | [ 0.51785713] | [2] |
[ 0.2 ] | [ 0.79464287] | [3] |
[ 0.30000001] | [ 0.4732143 ] | [4] |
[ 0.36000001] | [ 0.6964286 ] | [5] |
Run separately
```python
example = sess.run(x_train_batch)
label = sess.run(y_train_batch)
num = sess.run(no_train_batch)
print('The second mode to load data')
print('example\n', example)
print('label\n', label)
print('num\n', num)
```
Comparing these results with the original data, we see that with separate run calls the example values come from the first batch (rows 1-5), the label values from the second batch (rows 6-10), and the num values from the third batch (rows 11-15); the within-batch order varies because num_threads=2. In other words, the input pipeline was actually run three separate times. It looks like a small detail, but overlooking it can lead to serious mistakes.
example | label | num |
---|---|---|
[ 0.294 ] | [ 0.5625 ] | [11] |
[ 0.31 ] | [ 0.3482143 ] | [13] |
[ 0.2 ] | [ 0.5535714 ] | [12] |
[ 0.30000001] | [ 0.5714286 ] | [14] |
[ 0.36000001] | [ 0.48214287] | [15] |
- Original data
C | tensile | NUM |
---|---|---|
0.294 | 0.598214286 | 1 |
0.31 | 0.517857143 | 2 |
0.2 | 0.794642857 | 3 |
0.3 | 0.473214286 | 4 |
0.36 | 0.696428571 | 5 |
0.28 | 0.5625 | 6 |
0.2 | 0.348214286 | 7 |
0.284 | 0.553571429 | 8 |
0.38 | 0.482142857 | 9 |
0.44 | 0.571428571 | 10 |
0.214 | 0.660714286 | 11 |
0.72 | 0.589285714 | 12 |
0.38 | 0.616071429 | 13 |
0.266 | 0.5 | 14 |
0.46 | 0.642857143 | 15 |
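Back to the pitfall from the introduction: to print labels next to the model's predictions, both tensors must be fetched in a single `sess.run` call so that they are computed from one and the same dequeued batch. A minimal sketch, reusing `x_train_batch` and `y_train_batch` from the script above with a hypothetical stand-in `inference()` (a plain linear layer, not part of the original source):

```python
def inference(example_batch):
    # Hypothetical stand-in model: a single linear layer.
    w = tf.Variable(tf.random_normal([1, 1]))
    b = tf.Variable(tf.zeros([1]))
    return tf.matmul(example_batch, w) + b

prediction = inference(x_train_batch)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # Correct: labels and predictions are computed from the same batch.
    label_val, pred_val = sess.run([y_train_batch, prediction])
    print(label_val, pred_val)

    # Wrong: two run calls dequeue two different batches, so label_val
    # would no longer correspond to the examples behind pred_val.
    # label_val = sess.run(y_train_batch)
    # pred_val = sess.run(prediction)

    coord.request_stop()
    coord.join(threads)
```

The same rule applies to any fetch that shares a queue-fed subgraph: group everything that should see one batch into one run call.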
References
[1] Source file download: https://github.com/Asurada2015/Python-Data-Analysis-Learning-Notes/blob/master/TensorFlow/demo_00/test_tf_sessrun.py
[2] test_tf_train_batch.csv: https://github.com/Asurada2015/Python-Data-Analysis-Learning-Notes/blob/master/TensorFlow/demo_00/test_tf_train_batch.csv