How to adapt a TextVectorization layer on a tf.data.Dataset

Posted 2025-02-08 19:47:32


I load my dataset like this:

self.train_ds = tf.data.experimental.make_csv_dataset(
            self.config["input_paths"]["data"]["train"],
            batch_size=self.params["batch_size"],
            shuffle=False,
            label_name="tags",
            num_epochs=1,
        )

My TextVectorization layer looks like this:

vectorizer = tf.keras.layers.TextVectorization(
            standardize=code_standaridization,
            split="whitespace",
            output_mode="int",
            output_sequence_length=params["input_dim"],
            max_tokens=100_000,
        )

And I thought this would be enough:

vectorizer.adapt(data_provider.train_ds)

But it's not; I get this error:

TypeError: Expected string, but got Tensor("IteratorGetNext:0", shape=(None, None), dtype=string) of type 'Tensor'.

Can I somehow adapt my vectorizer on TensorFlow dataset?


1 comment

哆啦不做梦 2025-02-15 19:47:32


Most probably the issue is that your train_ds is batched (you passed batch_size) and you don't call .unbatch() before you try to adapt.

You have to do:

vectorizer.adapt(train_ds.unbatch().map(lambda x, y: x).batch(BATCH_SIZE))

The .unbatch() call solves the error you are currently seeing, and the .map() is needed because the TextVectorization layer operates on batches of strings, so you need to extract them from the (features, label) pairs in your dataset.
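A minimal self-contained sketch of this pattern (the toy texts, labels, and batch size here are illustrative stand-ins for the make_csv_dataset pipeline and "tags" label column in the question):

```python
import tensorflow as tf

# Toy in-memory (text, label) dataset standing in for the batched CSV pipeline.
texts = ["foo bar baz", "bar qux", "foo foo bar"]
labels = [0, 1, 0]
train_ds = tf.data.Dataset.from_tensor_slices((texts, labels)).batch(2)

vectorizer = tf.keras.layers.TextVectorization(output_mode="int")

# Unbatch, keep only the text component, then re-batch so adapt() runs on
# batches of strings rather than (text, label) tuples.
vectorizer.adapt(train_ds.unbatch().map(lambda x, y: x).batch(2))

vocab = vectorizer.get_vocabulary()
print(vocab)  # tokens sorted by descending frequency, after '' and '[UNK]'
```

Adapting directly on train_ds fails because each element is a (text, label) pair, not a plain string tensor; after the map, the dataset yields only strings, which is what TextVectorization.adapt expects.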
