Convert List With Non-fixed Length To Tensor

2022-04-28 19:33:56

ISSUE

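When building neural-network inputs from a dataset (Dataset), you may find that the feature lists along the same dimension contain different numbers of elements, for example: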

features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]

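A regular (dense) TensorFlow tensor must be rectangular, so variable-length feature data like this cannot be converted directly. Trying to convert the variable-length list to a Tensor: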

tf.convert_to_tensor(features)

raises the following error:

ValueError: Can't convert non-rectangular Python sequence to Tensor.

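The fix is to pad every row to the same length. Which padding to use depends on the purpose; the two common choices are padding with zeros and repeating the last element.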
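Both strategies can be sketched in plain Python before any NumPy or TensorFlow is involved. The helper `pad_rows` and its `mode` values `"zero"` and `"last"` are hypothetical names chosen for this illustration, not part of any library; the sketch assumes the input is a list of Python lists and, for `"last"`, that every row is non-empty.

```python
from typing import List


def pad_rows(rows: List[List[int]], mode: str = "zero") -> List[List[int]]:
    """Hypothetical helper: pad every row to the length of the longest row.

    mode="zero": append zeros to short rows.
    mode="last": append copies of each row's last element.
    """
    max_length = max((len(row) for row in rows), default=0)
    padded = []
    for row in rows:
        missing = max_length - len(row)
        if mode == "zero":
            fill = 0
        elif mode == "last":
            fill = row[-1]  # IndexError on an empty row, as with row[-1] in the article
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        # list + list builds a new list, so the input rows are not modified
        padded.append(row + [fill] * missing)
    return padded


features = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
print(pad_rows(features))               # [[1, 2, 3, 0], [4, 5, 0, 0], [1, 4, 6, 7]]
print(pad_rows(features, mode="last"))  # [[1, 2, 3, 3], [4, 5, 5, 5], [1, 4, 6, 7]]
```

The result is a rectangular list of lists, which is the shape `tf.convert_to_tensor` expects.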

Padding With Zeros

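import numpy as np
import tensorflow as tf

# Keep x as a plain Python list: np.array() on ragged nested lists
# raises ValueError in NumPy >= 1.24 unless dtype=object is given
x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
max_length = max(len(row) for row in x)
# Append (max_length - len(row)) zeros to each row, then build a 2-D array
x_padded = np.array([row + [0] * (max_length - len(row)) for row in x])

print(x_padded)

x_tensor = tf.convert_to_tensor(x_padded)

print(x_tensor)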

Output:

[[1 2 3 0]
 [4 5 0 0]
 [1 4 6 7]]

tf.Tensor(
[[1 2 3 0]
 [4 5 0 0]
 [1 4 6 7]], shape=(3, 4), dtype=int64)
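The list comprehension above builds each padded row in Python. A vectorized NumPy variant, sketched below, fills all rows in one boolean-mask assignment and also returns that mask. The function name `pad_with_zeros` is hypothetical, and the sketch assumes a non-empty list of integer rows.

```python
import numpy as np


def pad_with_zeros(rows):
    """Hypothetical helper: zero-pad variable-length rows with NumPy.

    Returns (padded, mask), where mask is True at positions holding real values.
    """
    lengths = np.array([len(row) for row in rows])
    max_length = lengths.max()
    # mask[i, j] is True for the first lengths[i] columns of row i
    mask = np.arange(max_length) < lengths[:, None]
    padded = np.zeros(mask.shape, dtype=np.int64)
    # Boolean assignment fills the True positions in row-major order,
    # which matches the order of the concatenated rows
    padded[mask] = np.concatenate(rows)
    return padded, mask


x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
padded, mask = pad_with_zeros(x)
print(padded)
print(mask)
```

Because 0 can also be a legitimate feature value, the zeros alone do not say which entries are padding; the mask, which has the same shape as the padded array, does.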

Repeating the Last Element

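import numpy as np
import tensorflow as tf

# Keep x as a plain Python list: np.array() on ragged nested lists
# raises ValueError in NumPy >= 1.24 unless dtype=object is given
x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
max_length = max(len(row) for row in x)
# Repeat each row's last element until the row reaches max_length
x_padded = np.array([row + [row[-1]] * (max_length - len(row)) for row in x])

print(x_padded)

x_tensor = tf.convert_to_tensor(x_padded)

print(x_tensor)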

Output:

[[1 2 3 3]
 [4 5 5 5]
 [1 4 6 7]]

tf.Tensor(
[[1 2 3 3]
 [4 5 5 5]
 [1 4 6 7]], shape=(3, 4), dtype=int64)
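Repeating the last element can also be vectorized as a single gather: column `j` of row `i` reads position `min(j, len_i - 1)` of that row, so columns past the end keep pointing at the last element. This is a sketch; the function name `pad_with_last` is hypothetical, and like `row[-1]` in the code above it assumes every row is non-empty.

```python
import numpy as np


def pad_with_last(rows):
    """Hypothetical helper: pad each row by repeating its last element."""
    lengths = np.array([len(row) for row in rows])
    max_length = lengths.max()
    flat = np.concatenate(rows)
    # Start index of each row inside the flattened array
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    # Clamp column indices so positions past a row's end reuse its last element
    cols = np.minimum(np.arange(max_length), lengths[:, None] - 1)
    # Broadcasting (n, 1) + (n, max_length) gives one flat index per output cell
    return flat[offsets[:, None] + cols]


x = [[1, 2, 3], [4, 5], [1, 4, 6, 7]]
print(pad_with_last(x))
```

Clamping the index rather than appending values means the same pattern would work for other "hold the boundary value" paddings by changing only how `cols` is computed.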
