Python TensorFlow Keras data-preprocessing

Invalid pattern in vectorizer standardization

Posted 2025-02-09 11:10:28

I am trying to separate special characters from words so I can treat them as separate tokens.

This is my vectorizer:

vectorizer = tf.keras.layers.TextVectorization(
    standardize=code_standaridization,
    split="whitespace",
    output_mode="int",
    output_sequence_length=params["input_dim"],
)

And this is my function:

def code_standaridization(input_data):
    """
    Helps with finding correct embedding.
    """
    input_data = tf.strings.regex_replace(input_data, "-", " - ")
    input_data = tf.strings.regex_replace(input_data, "_", " _ ")
    input_data = tf.strings.regex_replace(input_data, "(", " ( ")
    input_data = tf.strings.regex_replace(input_data, ")", " ) ")
    input_data = tf.strings.regex_replace(input_data, "{", " { ")
    input_data = tf.strings.regex_replace(input_data, "}", " } ")
    input_data = tf.strings.regex_replace(input_data, "[", " [ ")
    input_data = tf.strings.regex_replace(input_data, "]", " ] ")
    input_data = tf.strings.regex_replace(input_data, '"', ' " ')
    input_data = tf.strings.regex_replace(input_data, "'", " ' ")
    input_data = tf.strings.regex_replace(input_data, ".", " . ")
    input_data = tf.strings.regex_replace(input_data, ",", " , ")

    return input_data

But I am getting this error:

Node: 'StaticRegexReplace_2'
Invalid pattern: (, error: missing ): (
     [[{{node StaticRegexReplace_2}}]] [Op:__inference_adapt_step_133]

Comments (1)

终止放荡 2025-02-16 11:10:29

tf.strings.regex_replace() uses RE2 regular expression syntax, whose documentation notes:

The simplest regular expression is a single literal character. Except for the metacharacters like *+?()|, characters match themselves. To match a metacharacter, escape it with a backslash: \+ matches a literal plus character.

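So you need to escape the parentheses with a backslash in the pattern. The replacement string is not a pattern, so it needs no escaping:

input_data = tf.strings.regex_replace(input_data, r"\(", " ( ")
input_data = tf.strings.regex_replace(input_data, r"\)", " ) ")

The same applies to two other patterns in the function: "[" is a metacharacter and fails with a similar "missing ]" error, and an unescaped "." matches any character rather than a literal dot.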
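Because several of the characters in the question's function are regex metacharacters, one way to avoid escaping each of them by hand is to build a single escaped character class. The sketch below is an illustration under stated assumptions, not the asker's code: it uses Python's standard re module as a stand-in to check the pattern, and the names SPECIAL_CHARS, PATTERN and pad_special_chars are hypothetical.

```python
import re

# Characters the question wants to split off as their own tokens.
SPECIAL_CHARS = "-_(){}[]\"'.,"

# re.escape backslash-escapes regex metacharacters such as ( ) [ ] { } .
# so that each one matches only itself inside the character class.
PATTERN = "[" + "".join(re.escape(c) for c in SPECIAL_CHARS) + "]"


def pad_special_chars(text: str) -> str:
    """Surround every special character with spaces (pure-Python check)."""
    return re.sub(PATTERN, lambda m: f" {m.group(0)} ", text)


# Splitting on whitespace afterwards yields each special character
# as a separate token.
print(pad_special_chars("foo(bar[0].baz)").split())
```

Assuming RE2 accepts the same escaped character class, this pattern could replace the twelve separate calls in the original function with one: tf.strings.regex_replace(input_data, PATTERN, r" \0 "), where \0 in an RE2 rewrite string stands for the whole match. That TensorFlow call is not exercised here and should be verified before use.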
