Filter non-alphanumeric "repeating" characters
What's the best way to filter non-alphanumeric "repeating" characters?
I would rather not build a list of characters to check for. Is there a good regex for this that I can use in PHP?
Examples:
...........
*****************
!!!!!!!!
###########
------------------
~~~~~~~~~~~~~
Special case patterns:
=*=*=*=*=*=
->->->->
Based on @sln's answer:
The pattern could be something like this:
s/([\W_]|=\*|->)\1+//g
or, if you want to replace each run with just a single instance:
s/([\W_]|=\*|->)\1+/$1/g
Edit: any special sequence should probably come first in the alternation. If you need to treat something like ==> specially, listing it first means it won't be grabbed by [\W_] instead. So use something like:
s/(==>|=\*|->|[\W_])\1+/$1/g
with the special cases listed first.
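The substitutions above are written in Perl s///g syntax. In PHP the same patterns go through preg_replace(); here is a minimal sketch assuming PHP 7 or later, where collapseRepeats() and stripRepeats() are hypothetical helper names, not part of the answer:

```php
<?php
// Sketch only: PHP (PCRE) equivalents of the Perl-style s///g commands.
// collapseRepeats() and stripRepeats() are hypothetical helper names.

// Special sequences are listed first so that "==>" is tried before [\W_]
// gets a chance to match a single "=".
const REPEAT_PATTERN = '/(==>|=\*|->|[\W_])\1+/';

// Like s/(...)\1+/$1/g: keep one copy of each repeated unit.
function collapseRepeats(string $s): string
{
    return preg_replace(REPEAT_PATTERN, '$1', $s);
}

// Like s/(...)\1+//g: drop the whole run.
function stripRepeats(string $s): string
{
    return preg_replace(REPEAT_PATTERN, '', $s);
}

echo collapseRepeats('Hello!!!!!! =*=*=*=*=*= world ->->->->'), "\n"; // Hello! =*= world ->
```

Because [\W_] also matches whitespace, a run of spaces or tabs would be collapsed or stripped as well.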
sin's solution is pretty good, but the \W "non-word" class includes whitespace. I don't think you want to be removing sequences of tabs or spaces! A negated class (something like [^A-Za-z0-9\s]) would work better.
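A minimal sketch of that suggestion, reusing the alternation from the answer above but swapping [\W_] for [^A-Za-z0-9\s]. The helper name collapseSymbolRuns() is hypothetical:

```php
<?php
// Sketch: same alternation as before, but the catch-all is the negated
// class [^A-Za-z0-9\s] instead of [\W_], so whitespace runs are kept.
// collapseSymbolRuns() is a hypothetical helper name.
function collapseSymbolRuns(string $s): string
{
    // \s inside the negated class keeps spaces, tabs and newlines out of
    // the match; "_" is still treated as a symbol.
    return preg_replace('/(==>|=\*|->|[^A-Za-z0-9\s])\1+/', '$1', $s);
}

echo collapseSymbolRuns("a\t\tb    ###########  ->->->"), "\n";
```

The two tabs and the four spaces survive, while the run of # and the repeated -> each shrink to a single copy.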
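This will filter out all symbols, i.e. every character that is not an ASCII letter, digit or space, whether it repeats or not:
[code]
// ereg_replace() was deprecated in PHP 5.3 and removed in PHP 7.0; preg_replace() is the PCRE replacement.
$q = preg_replace('/[^A-Za-z0-9 ]/', '', $q);
[/code]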
will remove repeated patterns of non-alphanumeric, non-whitespace strings.
However, this is bad practice, because you will also be removing non-ASCII characters, such as accented European letters and other international scripts in Unicode, which an ASCII-only class treats as non-alphanumeric.
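One way to keep international text intact, sketched here under the assumption that the input is UTF-8, is to define "alphanumeric" with Unicode properties instead of ASCII ranges. The helper name collapseSymbolRunsUtf8() is hypothetical:

```php
<?php
// Sketch, assuming UTF-8 input. The /u modifier makes PCRE work on whole
// characters instead of bytes, and \p{L} (letters) plus \p{N} (numbers)
// cover non-ASCII text such as "é" or "ß", so those are never collapsed.
// collapseSymbolRunsUtf8() is a hypothetical helper name.
function collapseSymbolRunsUtf8(string $s): string
{
    // preg_replace() returns null for invalid UTF-8 under /u; fall back
    // to the unchanged input in that case.
    return preg_replace('/([^\p{L}\p{N}\s])\1+/u', '$1', $s) ?? $s;
}

echo collapseSymbolRunsUtf8('Café ----- Straße!!!'), "\n"; // Café - Straße!
```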
The only place where you really won't ever care about internationalization is when processing source code, but even then you would mishandle text quoted in strings, and you might accidentally de-comment a block.
You may want to be more restrictive in what you try to remove by giving a list of characters to replace instead of the catch-all.
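A minimal sketch of that more restrictive approach, where only an explicit list of characters can be collapsed. collapseListedRuns() and its default list are hypothetical:

```php
<?php
// Sketch of the restrictive approach: only characters named in $symbols can
// form a removable run; everything else is left alone. The default list is
// just an example taken from the question's samples.
// collapseListedRuns() is a hypothetical helper name.
function collapseListedRuns(string $s, string $symbols = '.*!#-~'): string
{
    // preg_quote() escapes regex metacharacters (and the "/" delimiter),
    // so the list can be placed inside a character class safely.
    $class = '[' . preg_quote($symbols, '/') . ']';
    return preg_replace('/(' . $class . ')\1+/', '$1', $s);
}

echo collapseListedRuns('wait..... what?????'), "\n"; // wait. what?????
```

The "?" run is untouched because "?" is not in the list.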
Edit: I have done similar things before when trying to process early-version ShoutCAST radio names. At that time, stations tried to call attention to themselves by having obnoxious names like:
<<!!!!--- GREAT MUSIC STATION ---!!!!>>
I used similar code to get rid of repeated symbols, but then learned (the hard way) to be careful about what I eventually remove.
This works for me:
preg_replace('/(.)\1{3,}/i', '', $sourceStr);
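It removes every character that repeats 4 or more times in a row: (.) captures one character and \1{3,} requires at least three more copies of it. Note that . matches letters, digits and spaces too, not just symbols.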
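To make the run length and character set explicit, here is a sketch comparing that one-liner with a hypothetical variant that only targets symbols and removes runs of three or more:

```php
<?php
$s = 'ok!! no!!! yes..... zzzz';

// The one-liner from the answer: (.) is any character (letters included)
// and \1{3,} needs three MORE copies, so only runs of 4+ are removed.
$original = preg_replace('/(.)\1{3,}/i', '', $s);

// Hypothetical variant: only non-alphanumeric, non-whitespace characters,
// and \1{2,} so that runs of 3 or more are removed.
$symbolsOnly = preg_replace('/([^A-Za-z0-9\s])\1{2,}/', '', $s);

var_dump($original);    // string(15) "ok!! no!!! yes "
var_dump($symbolsOnly); // string(16) "ok!! no yes zzzz"
```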