How to adapt a TextVectorization layer on a tf.data dataset
I load my dataset like this:
self.train_ds = tf.data.experimental.make_csv_dataset(
    self.config["input_paths"]["data"]["train"],
    batch_size=self.params["batch_size"],
    shuffle=False,
    label_name="tags",
    num_epochs=1,
)
My TextVectorization layer looks like this:
vectorizer = tf.keras.layers.TextVectorization(
    standardize=code_standaridization,
    split="whitespace",
    output_mode="int",
    output_sequence_length=params["input_dim"],
    max_tokens=100_000,
)
I thought this would be enough:
vectorizer.adapt(data_provider.train_ds)
But it's not; I get this error:
TypeError: Expected string, but got Tensor("IteratorGetNext:0", shape=(None, None), dtype=string) of type 'Tensor'.
Can I somehow adapt my vectorizer on a TensorFlow dataset?
Comments (1)
Most probably the issue is that your train_ds is batched (you pass batch_size to make_csv_dataset) and you try to adapt on it without calling .unbatch(). Before adapting, unbatch the dataset and map it so that it yields only the strings.
The .unbatch() solves the error that you are currently seeing, and the .map() is needed because the TextVectorization layer operates on batches of strings, so you need to extract them from your dataset.
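The steps described above could look roughly like the sketch below. It is not the answerer's exact code, which did not survive on this page. Several things here are assumptions made only for illustration: the toy CSV file, the feature column name "text" (the question does not name its feature columns), the small batch size and sequence length, and the final .batch(2) call, which is added so that the layer receives batches of strings. The question's custom standardize callable is left out, so the layer's default standardization is used.

```python
import csv
import os
import tempfile

import tensorflow as tf

# Hypothetical toy CSV: a "text" feature column plus the "tags" label column.
rows = [
    ("print hello world", "python"),
    ("select name from users", "sql"),
    ("print goodbye world", "python"),
    ("select id from orders", "sql"),
]
csv_path = os.path.join(tempfile.mkdtemp(), "train.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "tags"])
    writer.writerows(rows)

# Same loading call as in the question, with illustrative values.
train_ds = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=2,
    shuffle=False,
    label_name="tags",
    num_epochs=1,
)

vectorizer = tf.keras.layers.TextVectorization(
    split="whitespace",
    output_mode="int",
    output_sequence_length=8,
    max_tokens=100_000,
)

# Each element of train_ds is a (features_dict, label) pair.
# Unbatch, keep only the string column, then re-batch before adapting.
text_ds = (
    train_ds
    .unbatch()
    .map(lambda features, label: features["text"])
    .batch(2)
)
vectorizer.adapt(text_ds)

vocab = vectorizer.get_vocabulary()
print(vocab[:5])
```

After adapt() the layer can be called directly on a batch of strings, for example vectorizer(tf.constant(["print hello world"])), which returns integer token ids padded to output_sequence_length.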