Using a TFDS dataset with the Keras Functional API
I'm trying to train a neural network built with the Keras Functional API on one of the default TFDS datasets, but I keep getting dataset-related errors.
The idea is to build a model for object detection, but for the first draft I'm trying to do plain image classification (img, label). The inputs are 256x256x3 images. The input layer is as follows:
img_inputs = keras.Input(shape=[256, 256, 3], name='image')
Then I load the voc2007 dataset as available in TFDS (a very old and light version, to make things faster):
(train_ds, test_ds), ds_info = tfds.load(
    'voc/2007',
    split=['train', 'test'],
    data_dir="/content/drive/My Drive",
    with_info=True)
and then preprocess the data as follows:
def resize_and_normalize_img(example):
    """Normalizes images: `uint8` -> `float32`."""
    example['image'] = tf.image.resize(example['image'], [256, 256])
    example['image'] = tf.cast(example['image'], tf.float32) / 255.
    return example

def reduce_for_classification(example):
    for key in ['image/filename', 'labels_no_difficult', 'objects']:
        example.pop(key)
    return example

train_ds_class = train_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
train_ds_class = train_ds_class.cache()
train_ds_class = train_ds_class.shuffle(ds_info.splits['train'].num_examples)
train_ds_class = train_ds_class.batch(64)
train_ds_class = train_ds_class.prefetch(tf.data.AUTOTUNE)
test_ds_class = test_ds.map(reduce_for_classification, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_class = test_ds_class.map(resize_and_normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_class = test_ds_class.batch(64)
test_ds_class = test_ds_class.cache()
test_ds_class = test_ds_class.prefetch(tf.data.AUTOTUNE)
And then I fit the model like this:
epochs = 8
history = model.fit(
    x=train_x, y=train_y,
    validation_data=test_ds_class,
    epochs=epochs
)
This is when I get an error saying that my model expects an input of shape [None, 256, 256, 3] but is getting an input of shape [256, 256, 3].
I think it's an issue to do with the labels. Earlier I had problems with the extra keys in the dictionary-like format of the data you get from TFDS, so I tried to remove everything except the label, but I'm still getting this error and don't know how to move forward. I feel like after preparing the dataset with TFDS it should be ready to be fed to a model. After looking through the documentation, tutorials and Stack Overflow I haven't found the answer. I hope someone who comes across this can help.
Update:
To give a bit more information, this is the model I'm using:
TLDR: Image input 256x256x3, a succession of convolutions and residual blocks, ending with average pooling, a fully connected layer, and a softmax that produces a (None, 1280) tensor. I'm using sparse categorical cross-entropy as the loss and accuracy as the metric.
img_inputs = keras.Input(shape=[256, 256, 3], name='image')
# first convolution
conv_first = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), padding='same', name='first_conv')
x = conv_first(img_inputs)
# Second convolution
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), strides=2, padding='same', name='second_conv')(x)
# First residual block
res = tf.keras.layers.Conv2D(32, kernel_size=(1, 1), name='res_block1_conv1')(x)
res = tf.keras.layers.Conv2D(64, kernel_size=(3, 3), padding='same', name='res_block1_conv2')(res)
x = x + res
# Convolution after First residual block
x = tf.keras.layers.Conv2D(128, kernel_size=3, strides=2, padding='same', name='first_post_res_conv')(x)
# Second residual Block
for i in range(2):
    shortcut = x
    res = tf.keras.layers.Conv2D(64, kernel_size=1, name=f'res_block2_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same', name=f'res_block2_conv2_loop{i}')(res)
    x = res + shortcut
# Convolution after Second residual block
x = tf.keras.layers.Conv2D(256, 3, strides=2, padding='same', name='second_post_res_conv')(x)
# Third residual Block
for i in range(8):
    shortcut = x
    res = tf.keras.layers.Conv2D(128, kernel_size=1, name=f'res_block3_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(256, kernel_size=3, padding='same', name=f'res_block3_conv2_loop{i}')(res)
    x = res + shortcut
# Convolution after Third residual block
x = tf.keras.layers.Conv2D(512, 3, strides=2, padding='same', name='third_post_res_conv')(x)
# Fourth residual Block
for i in range(8):
    shortcut = x
    res = tf.keras.layers.Conv2D(256, kernel_size=1, name=f'res_block4_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(512, kernel_size=3, padding='same', name=f'res_block4_conv2_loop{i}')(res)
    x = res + shortcut
# Convolution after Fourth residual block
x = tf.keras.layers.Conv2D(1024, 3, strides=2, padding='same', name='fourth_post_res_conv')(x)
# Fifth residual Block
for i in range(4):
    shortcut = x
    res = tf.keras.layers.Conv2D(512, kernel_size=1, name=f'res_block5_conv1_loop{i}')(x)
    res = tf.keras.layers.Conv2D(1024, kernel_size=3, padding='same', name=f'res_block5_conv2_loop{i}')(res)
    x = res + shortcut
# Global avg pooling
x = tf.keras.layers.GlobalAveragePooling2D(name='average_pooling')(x)
# Fully connected layer
x = tf.keras.layers.Dense(1280, name='fully_connected_layer')(x)
# Softmax
end_result = tf.keras.layers.Softmax(name='softmax')(x)
model = tf.keras.Model(inputs=img_inputs, outputs=end_result, name="darknet53")
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
After trying the solution proposed by AloneTogether I'm getting the following error (I tried changing the axis in the tf.one_hot() function many times, with the same result):
Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [64,1280] and labels shape [1280]
[[{{node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]] [Op:__inference_train_function_20172]
This seems to be related to the batching, but I don't know exactly how to fix it.
The whole issue really seems related to the label encoding, because when I run that line without the tf.reduce_sum() function I get the same kind of error, but with:
First element had shape [2,20] and element 1 had shape [1,20].
And if I run the same without the one-hot encoding line, I get this error:
Node: 'IteratorGetNext'
Cannot batch tensors with different shapes in component 1. First element had shape [4] and element 1 had shape [1].
[[{{node IteratorGetNext}}]] [Op:__inference_train_function_18534]
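The shapes in the last two messages can be reproduced without the dataset. The following is a minimal sketch, assuming that each example's `labels` entry is a 1-D vector of class ids whose length varies from image to image, and that there are 20 classes. The two label vectors are made up for illustration.

```python
import tensorflow as tf

NUM_CLASSES = 20  # assumption: number of VOC 2007 object classes

# Hypothetical 'labels' entries: one image with 4 classes, one with 1.
labels_per_image = [
    tf.constant([1, 6, 14, 19], tf.int64),
    tf.constant([7], tf.int64),
]

shapes = []
for labels in labels_per_image:
    one_hot = tf.one_hot(labels, depth=NUM_CLASSES)  # one row per class id
    multi_hot = tf.reduce_sum(one_hot, axis=0)       # rows collapsed into one
    shapes.append((labels.shape.as_list(),
                   one_hot.shape.as_list(),
                   multi_hot.shape.as_list()))
    print(shapes[-1])
```

In this sketch the raw ids and the unreduced one-hot matrices have a first dimension that depends on the image, so they cannot be stacked into a batch. Only the reduced vectors have the same shape for every image.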
Comments (1)
I think the problem is that each image can belong to multiple classes, so I would recommend one-hot encoding the labels. It should then work. Here is an example:
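The answer's own example is not included on this page, so what follows is only a sketch of the idea, not the answerer's code. It assumes that `labels` is a variable-length vector of class ids with at least one entry per image, and that VOC 2007 has 20 classes. The generator `fake_voc_examples`, the helper `to_image_and_multi_hot` and the tiny placeholder network are hypothetical. They stand in for `tfds.load('voc/2007')` and for the model in the question so that the snippet runs without downloading anything.

```python
import numpy as np
import tensorflow as tf

NUM_CLASSES = 20  # assumption: the 20 object classes of VOC 2007


def fake_voc_examples():
    """Hypothetical stand-in for the TFDS dicts: an image of arbitrary
    size plus a variable-length vector of class ids."""
    yield {'image': np.zeros((300, 400, 3), np.uint8),
           'labels': np.array([3, 11], np.int64)}
    yield {'image': np.zeros((200, 250, 3), np.uint8),
           'labels': np.array([7], np.int64)}
    yield {'image': np.zeros((500, 375, 3), np.uint8),
           'labels': np.array([0, 5, 14, 19], np.int64)}


def to_image_and_multi_hot(example):
    """Map a feature dict to an (inputs, targets) pair for model.fit."""
    image = tf.image.resize(example['image'], [256, 256])
    image = tf.cast(image, tf.float32) / 255.
    one_hot = tf.one_hot(example['labels'], depth=NUM_CLASSES)  # [k, 20]
    # reduce_max tolerates repeated ids; reduce_sum gives the same result
    # when the ids within one image are distinct.
    multi_hot = tf.reduce_max(one_hot, axis=0)                  # [20]
    return image, multi_hot


train_ds = tf.data.Dataset.from_generator(
    fake_voc_examples,
    output_signature={
        'image': tf.TensorSpec([None, None, 3], tf.uint8),
        'labels': tf.TensorSpec([None], tf.int64),
    })
train_batches = train_ds.map(to_image_and_multi_hot).batch(2)

images, labels = next(iter(train_batches))
print(images.shape, labels.shape)

# Tiny placeholder network; only the output size and the loss matter here.
inputs = tf.keras.Input(shape=[256, 256, 3])
x = tf.keras.layers.Conv2D(8, 3, strides=4, padding='same')(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
logits = tf.keras.layers.Dense(NUM_CLASSES)(x)  # raw scores, no softmax
model = tf.keras.Model(inputs, logits)
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))

# The dataset already yields (inputs, targets) batches, so no y= is passed.
history = model.fit(train_batches, epochs=1, verbose=0)
```

With multi-hot targets the loss has to change as well, because sparse categorical cross-entropy expects one integer class id per example. As far as I can tell, feeding it a [64, 20] batch of targets would be consistent with the `labels shape [1280]` message in the question, since 64 x 20 = 1280. The sketch therefore uses binary cross-entropy over 20 outputs, and because `from_logits=True` assumes raw scores, it has no final Softmax layer.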