How can I remove all tokens with non-word characters in Perl?

Posted 2024-07-15 20:25:15

I am trying to come up with a regex for removing all words that contain non-word characters.

So if it contains a colon, comma, number, bracket etc then remove it from the line, not just the character but the word. I have this so far.

$wordline = s/\s.*\W.*?\s//g;

Does not have to be perfect so removing strings with dash and apostrophe is ok.

友谊不毕业 2024-07-22 20:25:15

$wordline = join(" ", grep(/^\w+$/, split(/\s+/, $wordline)));
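
A minimal sketch of this split/grep/join approach, run on a made-up sample line. One thing to watch: \w also matches digits and underscores, so a token such as abc123 survives the /^\w+$/ filter. The second variant below is an assumption about what the question is after (letters only), not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the question.
my $wordline = "foo bar: baz, abc123 (qux) don't end";

# As in the answer: keep only tokens made up entirely of \w characters.
my $kept = join(" ", grep(/^\w+$/, split(/\s+/, $wordline)));

# Stricter variant (an assumption): keep only tokens made up of ASCII letters.
my $letters = join(" ", grep(/^[A-Za-z]+$/, split(/\s+/, $wordline)));

print "$kept\n";     # prints "foo abc123 end"
print "$letters\n";  # prints "foo end"
```

Because the line is rebuilt with join(" ", ...), the result is always single-spaced, so no separate whitespace cleanup is needed.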


我的痛♀有谁懂 2024-07-22 20:25:15

s/\w*([^\w\s]|\d)+\w* ?//g;
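
A small sketch of how this substitution behaves on a made-up sample line. The variable names are hypothetical; only the regex comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the question.
my $line = "foo bar: baz, abc123 (qux) don't end";

# Apply the substitution from the answer to a copy of the line.
(my $cleaned = $line) =~ s/\w*([^\w\s]|\d)+\w* ?//g;

# A flagged token at the end of the line: the space before it is not consumed.
(my $tail = "foo bar:") =~ s/\w*([^\w\s]|\d)+\w* ?//g;

print "[$cleaned]\n";  # prints "[foo end]"
print "[$tail]\n";     # prints "[foo ]"
```

The optional trailing space in the pattern removes the separator after each deleted token, but a deleted token at the end of the line leaves the preceding space behind, so a final trim may still be wanted.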


此岸叶落 2024-07-22 20:25:15

s/(?<!\S)(?![A-Za-z]+(?:\s|$))\S+(?!\S)//g

In regex-land, a "word character" is a letter, a digit, or an underscore ([A-Za-z0-9_]). It sounds like you're using it to mean just letters, so \w and \W won't do you any good. My regex matches:

a bunch of non-whitespace characters: \S+
neither preceded: (?<!\S) nor followed: (?!\S) by a non-whitespace character
unless all the characters are letters: (?![A-Za-z]+(?:\s|$))

This will leave behind all the spaces surrounding the words that it deletes. Dealing with those correctly is a little trickier than you might expect; it's much easier to do in a separate step, e.g.:

s/^ +| +(?= |$)//g
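
The two-step idea (delete the tokens, then tidy the spaces) can be sketched as follows. The sample line and variable names are hypothetical; the two regexes are the deletion pattern from this answer and a space-cleanup substitution.

```perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the question.
my $line = "foo bar: baz, abc123 (qux) don't end";

# Step 1: delete every whitespace-delimited token that is not all letters.
# The surrounding spaces are left in place.
(my $stripped = $line) =~ s/(?<!\S)(?![A-Za-z]+(?:\s|$))\S+(?!\S)//g;

# Step 2: remove leading spaces, and any run of spaces that is followed
# by another space or by the end of the line.
(my $tidy = $stripped) =~ s/^ +| +(?= |$)//g;

print "[$stripped]\n";
print "[$tidy]\n";  # prints "[foo end]"
```

After step 1 the two surviving words are separated by all six of the original spaces; step 2 collapses that run to a single space.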