TensorFlow error: Unable to serialize message (multi-modal dataset)
I am trying to train a model on a Colab TPU that takes two np.ndarray
inputs: one for an image of shape (150, 150, 3), and the other for an audio spectrogram image of shape (259, 128, 1). I have created my dataset from NumPy arrays as follows:
trainX = [train_image_array, train_spect_array]
trainY = labels_array
The shape of each array is as follows:
train_image_array.shape = (86802, 150, 150, 3)
train_spect_array.shape = (86802, 259, 128, 1)
labels_array.shape = (86802,)
I also have a similar dataset for testing, except that it has 9K samples instead of 86K.
When I evaluate my model on the testing data, it works, but when I try to train or evaluate it on the training data, it shows:
<ipython-input-20-9240f9fc84df> in runModel(model, trainX, trainY, testX, testY, patience, resetWeights, checkpointPath, epochs, save_checkpoint, batch_size, generator, save_weights, save_weights_path, metrics)
76 model.evaluate(testX, testY, batch_size=batch_size)
77 # model.evaluate(trainX, trainY, batch_size=batch_size)
---> 78 history = model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, validation_data=(testX, testY), shuffle=True, callbacks=callbacks)
79 # model.evaluate(trainX, trainY, batch_size=batch_size)
80 model.evaluate(testX, testY, batch_size=batch_size)
/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
100 dtype = dtypes.as_dtype(dtype).as_datatype_enum
101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)
103
104
Here, runModel(...) is my function, which just consists of model.evaluate, model.fit, plotting of graphs, etc. The main problem is at model.fit(trainX, trainY, ...).
The same error arises on model.evaluate(trainX, trainY, ...). I thought it might occur only on model.evaluate, so I commented that out, but I was wrong.
Can anyone help me?
1 Answer
The only solution I found for this problem was that, for very large datasets like this, we should write the data to .tfrecord files and feed the model with a TensorFlow dataset built from them. Also, when using a TPU, the .tfrecord files need to be saved to Google Cloud Storage, using a bucket.
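A minimal sketch of that approach is below, assuming float32 image/spectrogram arrays and integer labels; the feature names, the gs://your-bucket/... path, and the batch size are placeholder assumptions, not details from the original post.

import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_tfrecord(path, images, spects, labels):
    # Serialize each (image, spectrogram, label) triple into one tf.train.Example.
    with tf.io.TFRecordWriter(path) as writer:
        for img, spect, label in zip(images, spects, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "image": _bytes_feature(tf.io.serialize_tensor(img.astype(np.float32)).numpy()),
                "spect": _bytes_feature(tf.io.serialize_tensor(spect.astype(np.float32)).numpy()),
                "label": _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())

def parse_example(serialized):
    # Inverse of write_tfrecord: recover the two model inputs and the label.
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "spect": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.reshape(tf.io.parse_tensor(parsed["image"], tf.float32), (150, 150, 3))
    spect = tf.reshape(tf.io.parse_tensor(parsed["spect"], tf.float32), (259, 128, 1))
    return (image, spect), parsed["label"]

# write_tfrecord("train.tfrecord", train_image_array, train_spect_array, labels_array)
# On a TPU the file must then live in a GCS bucket; the path below is a placeholder.
batch_size = 128  # placeholder
train_ds = (tf.data.TFRecordDataset("gs://your-bucket/train.tfrecord")
            .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(2048)
            .batch(batch_size, drop_remainder=True)  # TPUs need a fixed batch size
            .prefetch(tf.data.AUTOTUNE))

# model.fit(train_ds, epochs=epochs, ...) then streams batches from the bucket
# instead of trying to serialize the full 86K-sample NumPy arrays at once.

The same pattern applies to the 9K-sample test set, which can be written to its own .tfrecord file and passed to model.evaluate or as validation_data.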