Filtering emoticons with sed

Posted on 2024-09-02 13:30:21

I have a grep expression that I run with Cygwin grep on Windows.

grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt > rockon_fbs.txt

Once I have identified the emoticon class, I want to strip the emoticons out of the data. However, the same regexp used inside a sed command results in a syntax error. (Yes, I realize I could use /d instead of //g, but that makes no difference; I still get the error.)

sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g"

The full line is:

grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt | sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g" | sed "s/^/ROCKON\t/" > rockon_fbs.txt

The result is:

sed: -e expression #1, char 14: unknown option to `s'

I know the error comes from the sed regexp I'm asking about, because if I remove that portion of the full line I get no error (but, of course, the emoticons are not filtered out).

Thanks in advance,

Steve

Comments (1)

粉红×色少女 2024-09-09 13:30:21

You need to escape each / inside the pattern; otherwise sed treats it as the delimiter of the s command and the expression ends prematurely.

s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g
        ^     ^     ^      ^   ^
          These need escaping.
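
As an alternative to escaping every slash, the s command accepts a different delimiter character. The following is a minimal sketch rather than the answer's own method: it uses # as the delimiter (chosen only because # does not occur in the pattern), keeps the original alternation unchanged, and assumes GNU sed, since \| in a basic regular expression is a GNU extension. The sample line and the variable names are made up for illustration.

```shell
# Sketch: with '#' as the s-command delimiter, '/' inside the pattern is an ordinary character.
# Assumes GNU sed (\| alternation in a basic regex is a GNU extension).
sample='foo \m/ bar \,,/ baz'   # hypothetical test line

# Single quotes keep the shell from touching the backslashes.
cleaned=$(printf '%s\n' "$sample" | sed 's#\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*##g')

printf '%s\n' "$cleaned"
```

The pattern is the one from the question with no slashes escaped; only the three delimiters changed.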

You should also use single-quoted strings instead of double-quoted strings, so that the shell does not interpret the backslashes before sed sees them:

$ echo "\\,"
\,
$ echo '\\,'
\\,
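
To see the effect on the actual sed script from the question, the sketch below prints the script as sed would receive it under each quoting style. The variable names are illustrative only. Inside double quotes the shell collapses each \\ to a single \, so sed gets a different regexp than the one that was typed. In the double-quoted form the third unescaped / is character 13, which appears consistent with the reported error at char 14: sed has already read a complete pattern and replacement and tries to parse what follows as flags.

```shell
# Capture the question's sed script exactly as sed would receive it.
# Double quotes: the shell turns every \\ into a single \ before sed runs.
double_quoted=$(printf '%s\n' "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g")

# Single quotes: the script is passed through untouched.
single_quoted=$(printf '%s\n' 's/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g')

printf 'double: %s\nsingle: %s\n' "$double_quoted" "$single_quoted"
```

Note that quoting alone does not remove the error; the unescaped slashes still end the pattern early in both forms.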

So try this:

$ echo 'foo \m/ bar \,,/ baz' | sed 's/\(\\,,\/\|\\m\/\|\\m\/\\>\.<\/\\m\/\|:u\)*//g'
foo  bar  baz
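
Applying the same two fixes to the full pipeline from the question might look like the sketch below. It is an illustration under assumptions, not a tested replacement for the asker's data: the temporary directory and the three sample lines are hypothetical stand-ins for all_fbs.txt, and GNU grep and GNU sed are assumed (for \| alternation and for \t in the replacement). The grep pattern is also moved to single quotes, so it now matches a literal backslash before ,,/ and m/, which may select different lines than the double-quoted original did.

```shell
# Hypothetical sample input standing in for all_fbs.txt.
tmpdir=$(mktemp -d)
printf '%s\n' 'great show \m/' 'no emoticon here' 'rock on \,,/ tonight' > "$tmpdir/all_fbs.txt"

# Select lines containing an emoticon, strip the emoticons, then prefix a label.
# Both patterns are single-quoted; the sed pattern escapes each '/'.
grep -a '\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u' "$tmpdir/all_fbs.txt" \
  | sed 's/\(\\,,\/\|\\m\/\|\\m\/\\>\.<\/\\m\/\|:u\)*//g' \
  | sed 's/^/ROCKON\t/' > "$tmpdir/rockon_fbs.txt"

cat "$tmpdir/rockon_fbs.txt"
```

Only the two lines that contain an emoticon should reach rockon_fbs.txt, each prefixed with ROCKON and a tab.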