How to concatenate, tokenize, and pad strings in TFX preprocessing?

Posted on 2025-01-12 11:33:17


I'd like to perform the usual text preprocessing steps in the Transform component of a TensorFlow Extended (TFX) pipeline. My data looks like this (one string in each of three independent feature columns, and a 0/1 integer in the label column):

field1 field2 field3 label
--------------------------
aa     bb     cc     0
ab     gfdg   ssdg   1

```python
import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow_text import UnicodeCharTokenizer  # for the later char-level step

def preprocessing_fn(inputs):
    outputs = {}
    # Intended: combine field1, field2 and field3 into a single feature
    outputs['features_xf'] = tf.sparse.concat(axis=0, sp_inputs=[inputs["field1"], inputs["field2"], inputs["field3"]])
    outputs['label_xf'] = tf.convert_to_tensor(inputs["label"], dtype=tf.float32)

    return outputs
```

But this doesn't work; the pipeline fails with:

```
ValueError: Arrays were not all the same length: 3 vs 1 [while running 'Transform[TransformIndex0]/ConvertToRecordBatch']
```
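The error is consistent with `tf.sparse.concat(axis=0, ...)` stacking the three fields along the batch dimension: each input has one row per example, so the result has three times as many rows as `label_xf`, while tf.Transform expects every output to have the same number of rows. The minimal sketch below reproduces this with a hypothetical one-example batch, assuming each field arrives as a `[batch, 1]` `SparseTensor` (the question's call to `tf.sparse.concat` suggests they do). Concatenating along `axis=1` instead keeps one row per example.

```python
import tensorflow as tf

# Hypothetical one-example batch: each field as a [batch, 1] SparseTensor.
field1 = tf.SparseTensor(indices=[[0, 0]], values=["aa"], dense_shape=[1, 1])
field2 = tf.SparseTensor(indices=[[0, 0]], values=["bb"], dense_shape=[1, 1])
field3 = tf.SparseTensor(indices=[[0, 0]], values=["cc"], dense_shape=[1, 1])

# axis=0 is the batch axis: one example becomes three rows,
# which no longer lines up with the one-row label tensor.
stacked = tf.sparse.concat(axis=0, sp_inputs=[field1, field2, field3])
print(stacked.dense_shape.numpy().tolist())  # [3, 1]

# axis=1 keeps one row per example and places the fields side by side.
side_by_side = tf.sparse.concat(axis=1, sp_inputs=[field1, field2, field3])
print(side_by_side.dense_shape.numpy().tolist())  # [1, 3]
```

With a batch of one example this gives exactly the "3 vs 1" mismatch from the error message.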

(Later on, I also want to apply character-level tokenization and padding to MAX_LEN.)
Any ideas?
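One possible way to chain concatenation, character-level tokenization and padding, sketched under assumptions: each `fieldN` is a single-valued string feature delivered as a `[batch, 1]` `SparseTensor`, `MAX_LEN`, `FIELDS` and `as_sparse` are hypothetical names, characters are represented by their Unicode code points via core `tf.strings.unicode_decode` (tensorflow_text's `UnicodeCharTokenizer().tokenize` should yield the same code points), and 0 is used as the padding value. `RaggedTensor.to_tensor(shape=[None, MAX_LEN])` pads short rows and truncates long ones. The label uses `tf.cast`, because `tf.convert_to_tensor` does not cast an existing tensor to a different dtype.

```python
import tensorflow as tf

MAX_LEN = 10  # hypothetical maximum sequence length; tune it for your data
FIELDS = ["field1", "field2", "field3"]  # hypothetical list of text features


def preprocessing_fn(inputs):
    # Densify each [batch, 1] SparseTensor; missing values become "".
    columns = [tf.sparse.to_dense(inputs[name], default_value="") for name in FIELDS]
    # Concatenate along axis=1 (features), not axis=0 (batch): shape [batch, 3].
    fields = tf.concat(columns, axis=1)
    # Join the three strings of each row with a space: shape [batch].
    text = tf.strings.reduce_join(fields, axis=1, separator=" ")
    # Character-level tokenization: one Unicode code point per character,
    # as a RaggedTensor of shape [batch, None].
    chars = tf.strings.unicode_decode(text, input_encoding="UTF-8")
    # Pad with 0 (or truncate) every row to exactly MAX_LEN code points.
    features = chars.to_tensor(default_value=0, shape=[None, MAX_LEN])
    return {
        "features_xf": features,
        # tf.cast actually converts the dtype (e.g. int64 -> float32).
        "label_xf": tf.cast(inputs["label"], tf.float32),
    }


def as_sparse(values):
    # Hypothetical helper: wrap one string per example as a [batch, 1]
    # SparseTensor, mimicking a single-valued variable-length feature.
    return tf.SparseTensor(
        indices=[[i, 0] for i in range(len(values))],
        values=values,
        dense_shape=[len(values), 1],
    )


# Eager smoke test on the two sample rows from the question.
batch = {
    "field1": as_sparse(["aa", "ab"]),
    "field2": as_sparse(["bb", "gfdg"]),
    "field3": as_sparse(["cc", "ssdg"]),
    "label": tf.constant([[0], [1]], dtype=tf.int64),
}
out = preprocessing_fn(batch)
print(out["features_xf"].numpy())
```

Here "aa bb cc" (8 characters) is padded with two zeros, while "ab gfdg ssdg" (12 characters) is truncated to "ab gfdg ss". The smoke test only runs the function eagerly; in an actual pipeline, tf.Transform traces `preprocessing_fn` itself.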


Author: 抹茶夏天i‖
