ISSUE
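When building neural-network inputs from a dataset, you may find that the features along one dimension contain different numbers of elements, for example rows of length 3, 2, and 4 such as [[1,2,3],[4,5],[1,4,6,7]].
A regular TensorFlow tensor cannot hold variable-length features like these. Trying to convert such a variable-length list to a Tensor: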
tf.convert_to_tensor(features)
raises the following error:
ValueError: Can't convert non-rectangular Python sequence to Tensor.
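One way to catch this before calling TensorFlow is to compare the row lengths directly. The following is a minimal sketch in plain Python; `is_rectangular` is a hypothetical helper name, not a TensorFlow API, and it only inspects the first level of nesting.

```python
def is_rectangular(rows):
    """Return True if every row has the same length (hypothetical helper)."""
    # Collect the distinct row lengths; zero or one distinct value means rectangular.
    lengths = {len(row) for row in rows}
    return len(lengths) <= 1


features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
print(is_rectangular(features))          # ragged input from the example above
print(is_rectangular([[1, 2], [3, 4]]))  # rows of equal length
```

If the check fails, the rows need to be padded before conversion.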
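The fix is to pad every row to the same length. How to pad depends on the goal; the two common choices are padding with zeros and repeating the last element.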
Padding With Zeros
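import numpy as np
import tensorflow as tf
# Keep the ragged data as a plain Python list; NumPy cannot build a regular array from rows of unequal length.
x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
max_length = max(len(row) for row in x)
# Append zeros to each row until it reaches max_length.
x_padded = np.array([row + [0] * (max_length - len(row)) for row in x])
print(x_padded)
x_tensor = tf.convert_to_tensor(x_padded)
print(x_tensor)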
Output:
[[1 2 3 0]
[4 5 0 0]
[1 4 6 7]]
tf.Tensor(
[[1 2 3 0]
[4 5 0 0]
[1 4 6 7]], shape=(3, 4), dtype=int64)
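After zero-padding, a padded 0 can no longer be told apart from a genuine 0 in the data. A common companion to zero-padding is therefore a boolean mask that records which positions are real. The sketch below uses only NumPy; the names `lengths` and `mask` are illustrative and do not appear in the original code.

```python
import numpy as np

x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
lengths = np.array([len(row) for row in x])
max_length = lengths.max()

# Preallocate a zero-filled array, then copy each row into its leading positions.
x_padded = np.zeros((len(x), max_length), dtype=np.int64)
for i, row in enumerate(x):
    x_padded[i, :len(row)] = row

# mask[i, j] is True where x_padded[i, j] holds real data and False where it is padding.
mask = np.arange(max_length)[None, :] < lengths[:, None]

print(x_padded)
print(mask)
```

Both `x_padded` and `mask` are rectangular, so each can be passed to `tf.convert_to_tensor` in the same way as above.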
Repeat Last Element
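import tensorflow as tf
import numpy as np
# Keep the ragged data as a plain Python list; NumPy cannot build a regular array from rows of unequal length.
x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
max_length = max(len(row) for row in x)
# Repeat each row's last element until the row reaches max_length.
x_padded = np.array([row + [row[-1]] * (max_length - len(row)) for row in x])
print(x_padded)
x_tensor = tf.convert_to_tensor(x_padded)
print(x_tensor)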
Output:
[[1 2 3 3]
[4 5 5 5]
[1 4 6 7]]
tf.Tensor(
[[1 2 3 3]
[4 5 5 5]
[1 4 6 7]], shape=(3, 4), dtype=int64)
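Note that `row[-1]` raises an IndexError if a row is empty, since there is no last element to repeat. The sketch below wraps the same idea in a function and falls back to a fill value for empty rows; `pad_repeat_last` and its `empty_fill` parameter are hypothetical names introduced here for illustration.

```python
def pad_repeat_last(rows, empty_fill=0):
    """Pad each row to the longest length by repeating its last element.

    `empty_fill` is an assumption of this sketch: it is used for rows
    that have no last element to repeat.
    """
    max_length = max((len(row) for row in rows), default=0)
    padded = []
    for row in rows:
        row = list(row)
        last = row[-1] if row else empty_fill
        padded.append(row + [last] * (max_length - len(row)))
    return padded


print(pad_repeat_last([[1, 2, 3], [4, 5], [1, 4, 6, 7], []]))
```

The returned list of lists is rectangular, so it can be converted with `np.array` or `tf.convert_to_tensor` as in the example above.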