Filtering non-alphanumeric "repeating" characters

Posted on 2024-10-21 02:26:46

What's the best way to filter non-alphanumeric "repeating" characters?

I would rather not build a list of characters to check for. Is there a good regex for this that I can use in PHP?

Examples:

...........

*****************

!!!!!!!! 

########### 

------------------

~~~~~~~~~~~~~

Special case patterns:

=*=*=*=*=*=

->->->->

深爱不及久伴 2024-10-28 02:26:46

Based on @sln's answer:

$str = preg_replace('~([^0-9a-zA-Z])\1+|(?:=[*])+|(?:->)+~', '', $str);
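For illustration, here is a minimal sketch that runs this pattern against a few of the question's examples. The helper name `filter_repeats` and the words surrounding the symbols in the sample strings are assumptions added for the demo; they are not part of the answer.

```php
<?php
// Hypothetical wrapper around the pattern quoted in the answer (pattern unchanged).
function filter_repeats(string $str): string
{
    // 1st branch: any non-alphanumeric character repeated at least twice in a row.
    // 2nd/3rd branch: the "=*" and "->" special cases from the question.
    return preg_replace('~([^0-9a-zA-Z])\1+|(?:=[*])+|(?:->)+~', '', $str);
}

// Symbol runs come from the question; the words around them are made up.
$samples = [
    'Hello........... world',
    'Title *****************',
    'a =*=*=*=*=*= b',
    'go ->->->-> there',
];

foreach ($samples as $s) {
    // Brackets make leading/trailing spaces visible in the output.
    echo '[', filter_repeats($s), ']', PHP_EOL;
}
```

Note that the `=*` branch consumes whole `=*` pairs, so the final `=` of `=*=*=*=*=*=` is left behind, and that the spaces on either side of a removed run are kept. Whether that matters depends on what the filtered text is used for.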

去了角落 2024-10-28 02:26:46

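The pattern could be something like this: s/([\W_]|=\*|->)\1+//g
Or, if you only want to replace each run with a single instance: s/([\W_]|=\*|->)\1+/$1/g

Edit ... any special sequences should probably come first in the alternation. If you need to make something like == special, that keeps it from being grabbed by [\W_].

So something like s/(==>|=\*|->|[\W_])\1+/$1/g, where the special cases come first.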
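The substitutions above are written in Perl-style `s///` notation. Since the question asks about PHP, a rough equivalent with `preg_replace` might look like the sketch below. The function name `collapse_repeats` and the sample strings are assumptions made for this example.

```php
<?php
// Hypothetical helper: a PHP rendering of s/(==>|=\*|->|[\W_])\1+/$1/g.
function collapse_repeats(string $text): string
{
    // Multi-character special sequences are listed before the catch-all
    // class [\W_], so the alternation tries them first.
    // '$1' keeps a single instance of whatever was repeated.
    return preg_replace('~(==>|=\*|->|[\W_])\1+~', '$1', $text);
}

echo collapse_repeats('wow!!!!!!!!'), PHP_EOL;      // wow!
echo collapse_repeats('a ->->->-> b'), PHP_EOL;     // a -> b
echo collapse_repeats('x =*=*=*=*=*= y'), PHP_EOL;  // x =*= y
```

`preg_replace` replaces every match by default, so no counterpart to the `g` flag is needed. Passing `''` instead of `'$1'` gives the "remove entirely" variant.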

染火枫林 2024-10-28 02:26:46

$str = preg_replace('~\W+~', '', $str);

猥琐帝 2024-10-28 02:26:46

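sln's solution is quite good, but the \W "non-word" class includes whitespace. I don't think you want to remove runs of tabs or spaces! A negated class (e.g. "[^A-Za-z0-9\s]") would work better.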
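A small sketch of the difference, assuming a made-up input string that mixes whitespace runs with a symbol run:

```php
<?php
// Made-up input: two tabs, two double-space gaps, and a run of "!".
$text = "Name:\t\tvalue  !!!!!!  end";

// [\W_] also matches tabs and spaces, so whitespace runs are removed too.
$withW = preg_replace('~([\W_])\1+~', '', $text);

// The negated class excludes whitespace (\s), so only the "!" run is removed.
$withNegated = preg_replace('~([^A-Za-z0-9\s])\1+~', '', $text);

var_dump($withW);        // "Name:valueend"
var_dump($withNegated);  // tabs and the four remaining spaces are preserved
```

With the `[\W_]` version the words run together, which is usually not what is wanted for text meant to stay readable.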

爱给你人给你 2024-10-28 02:26:46

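This will filter out all symbols:

[code]
// ereg_replace() was removed in PHP 7.0; preg_replace() is the PCRE equivalent.
$q = preg_replace('/[^A-Za-z0-9 ]/', '', $q);
[/code]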

凡间太子 2024-10-28 02:26:46

replace(/([^A-Za-z0-9\s]+)\1+/, "")

will remove repeated patterns of non-alphanumeric, non-whitespace strings.

However, this is bad practice, because the character class also matches every non-ASCII character, so repeated accented European letters and other international characters in Unicode get removed too.

The only place where you really won't ever care about internationalization is in processing source code, but then you are not handling text quoted in strings, and you may also accidentally uncomment a block.

You may want to be more restrictive in what you try to remove by giving a list of characters to replace instead of the catch-all.

Edit: I have done similar things before when trying to process early-version ShoutCAST radio names. At that time, stations tried to call attention to themselves by having obnoxious names like: <<!!!!--- GREAT MUSIC STATION ---!!!!>>. I used similar code to get rid of repeated symbols, but then learned (the hard way) to be careful about what I eventually remove.
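One possible way to soften the internationalization problem in PHP, offered as a sketch and not as what this answer proposes, is to use PCRE's Unicode properties so that letters and digits from any script are left alone. The function name `strip_symbol_runs` is hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: remove runs of a repeated symbol, keeping letters and
// digits of any script as well as whitespace.
function strip_symbol_runs(string $text): string
{
    // \p{L} = any letter, \p{N} = any number. The "u" modifier makes PCRE
    // treat pattern and subject as UTF-8. preg_replace() returns null on
    // error (e.g. invalid UTF-8), in which case the input is returned as-is.
    return preg_replace('~([^\p{L}\p{N}\s])\1+~u', '', $text) ?? $text;
}

echo strip_symbol_runs('<<!!!!--- GREAT MUSIC STATION ---!!!!>>'), PHP_EOL;
echo strip_symbol_runs('Crème brûlée!!!'), PHP_EOL;  // Crème brûlée
```

This still removes any repeated punctuation or symbol, so the answer's advice to restrict the set of characters being removed applies here as well.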
