How do I remove all tokens containing non-word characters in Perl?
I am trying to come up with a regex for removing all words that contain non-word characters.
So if it contains a colon, comma, number, bracket etc then remove it from the line, not just the character but the word. I have this so far.
$wordline = s/\s.*\W.*?\s//g;
Does not have to be perfect so removing strings with dash and apostrophe is ok.
In regex-land, a "word character" is a letter, a digit, or an underscore ([A-Za-z0-9_]). It sounds like you're using it to mean just letters, so \w and \W won't do you any good. My regex matches:
- a bunch of non-whitespace characters: \S+
- not preceded: (?<!\S) or followed: (?!\S) by non-whitespace characters
- unless all the characters are letters: (?![A-Za-z]+(?:\s|$))
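Putting those pieces together gives a single substitution. What follows is only a minimal sketch: it assumes the pieces are combined in the order lookbehind, letters-only check, \S+, trailing lookahead, and the sub name remove_non_letter_tokens and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every whitespace-delimited token that is not
# made up entirely of ASCII letters. Surrounding whitespace is left in place.
sub remove_non_letter_tokens {
    my ($line) = @_;
    $line =~ s{
        (?<!\S)                   # token start, nothing but whitespace before it
        (?![A-Za-z]+(?:\s|$))     # skip tokens that consist of letters only
        \S+                       # the token itself
        (?!\S)                    # token end, nothing but whitespace after it
    }{}gx;
    return $line;
}

my $wordline = "foo bar1 baz, (qux) it's ok";
print remove_non_letter_tokens($wordline), "\n";
```

For the sample line only foo and ok survive, and the five spaces that separated the original six tokens all remain between them.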
This will leave behind all the spaces surrounding the words that it deletes. Dealing with those correctly is a little trickier than you might expect; it's much easier to do in a separate step, e.g.:
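One possible separate cleanup step, sketched under the assumption that trimming the ends and collapsing each run of whitespace to a single space is acceptable (the sub name tidy_whitespace is hypothetical, and this is not necessarily the code the answer had in mind):

```perl
use strict;
use warnings;

# Hypothetical helper: normalize the whitespace left behind after tokens
# have been deleted from a line. Note that it also strips a trailing newline.
sub tidy_whitespace {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    $line =~ s/\s{2,}/ /g;     # collapse runs of two or more whitespace chars
    return $line;
}

print tidy_whitespace("  foo     ok \n"), "\n";
```

Doing this after the deletion keeps the main regex simple, since it no longer has to decide which of the neighbouring spaces to consume.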