Mask all but the first letter of bad words
I'm attempting to create a bad word filter in PHP that will search a piece of text, match it against an array of known bad words, and then replace each character of a bad word (except the first letter) with an asterisk.
Example:
fook would become f***
shoot would become s****
The only part I don't know is how to keep the first letter in the string, and how to replace the remaining letters with something else while keeping the same string length.
My code is unsuitable because it always replaces the whole word with exactly 3 asterisks.
$string = preg_replace("/\b(". $word .")\b/i", "***", $string);
This can be done in many ways, some involving very weird auto-generated regexps, but I believe using preg_replace_callback() would end up being more robust.
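A minimal sketch of the preg_replace_callback() approach could look like the following. The function name maskBadWordsCallback() is hypothetical, and the sketch assumes the bad words are single-byte (ASCII) strings, because strlen() and the $m[1][0] offset count bytes rather than characters.

```php
<?php
// Hypothetical helper: masks every whole-word, case-insensitive match of a
// blacklisted word, keeping its first character. Assumes ASCII words, since
// strlen() and string offsets count bytes, not multibyte characters.
function maskBadWordsCallback(string $text, array $badWords): string
{
    if ($badWords === []) {
        return $text; // an empty alternation would match at every word boundary
    }

    // Escape each word so regex metacharacters in the blacklist are literal.
    $escaped = array_map(fn(string $word): string => preg_quote($word, '/'), $badWords);
    $pattern = '/\b(' . implode('|', $escaped) . ')\b/i';

    return preg_replace_callback(
        $pattern,
        // Keep the first character; replace each remaining one with '*'.
        fn(array $m): string => $m[1][0] . str_repeat('*', strlen($m[1]) - 1),
        $text
    );
}

echo maskBadWordsCallback('Fook that shoot!', ['fook', 'shoot']), PHP_EOL;
// F*** that s****!
```

Because the callback sees each matched word, the number of asterisks always follows the word's length. For multibyte words, mb_substr() and mb_strlen() from the mbstring extension would be the natural replacements for the byte-based calls.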
Assuming the blacklist of bad words to be masked consists entirely of letters, or at least of word characters (letters, digits, and underscores), you won't need to call preg_quote() before imploding the words and inserting them into the regex pattern.
Use the \G metacharacter to continue matching after the first letter of a qualifying word has been matched. Every subsequently matched letter in the bad word is replaced one-for-one with an asterisk. \K is used to forget/release the first letter of the bad word, so that only the letters after it are replaced.
This approach removes the need to call preg_replace_callback() to measure every matched string and write N asterisks after the first letter of every matched bad word in a block of text.
Breakdown:
Code:
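A minimal sketch of the \G / \K technique described above could look like the following. It is a reconstruction under the answer's stated assumption (bad words contain only word characters, so preg_quote() is skipped), not necessarily the author's exact pattern; the function name maskBadWords() is hypothetical.

```php
<?php
// Hypothetical helper: masks all but the first character of each whole-word,
// case-insensitive blacklist match using a single preg_replace() call.
// Assumes every bad word consists only of word characters (\w), so the
// words are not passed through preg_quote().
function maskBadWords(string $text, array $badWords): string
{
    if ($badWords === []) {
        return $text;
    }

    $pattern = '~(?:'
        // Branch 1: at a word boundary, look ahead for a whole bad word,
        // then consume its first character.
        . '\b(?=(?:' . implode('|', $badWords) . ')\b)\w'
        // Branch 2: \G continues where the previous match ended; (?!^)
        // stops it from matching at the very start of the string.
        . '|\G(?!^)'
        // \K drops everything consumed so far from the match, so only the
        // following \w is replaced by one asterisk.
        . ')\K\w~i';

    return preg_replace($pattern, '*', $text);
}

echo maskBadWords('Fook that shoot!', ['fook', 'shoot']), PHP_EOL;
// F*** that s****!
```

Each match is a single letter, so the replacement string is just '*'. The \G chain breaks as soon as \w fails at the end of the bad word, because \G can only match where the previous match ended.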
Here is a Unicode-friendly regular expression for PHP.
The regular expression should give you an idea of the approach.
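As a rough illustration only, a Unicode-aware variant of the \G / \K pattern could replace \b and \w with explicit \p{L} (any Unicode letter) checks and add the u modifier, so that PCRE treats the pattern and subject as UTF-8 and each multibyte letter becomes one asterisk. This is not necessarily the expression the author posted; the function name maskBadWordsUnicode() is hypothetical, and the sketch assumes the bad words consist of letters only.

```php
<?php
// Hypothetical Unicode-aware variant: "letters" are \p{L} code points, and the
// u modifier makes PCRE treat the pattern and subject as UTF-8, so each
// multibyte letter becomes exactly one asterisk.
function maskBadWordsUnicode(string $text, array $badWords): string
{
    if ($badWords === []) {
        return $text;
    }

    // Quote the words (including the ~ delimiter) so they match literally.
    $alternation = implode('|', array_map(fn(string $w): string => preg_quote($w, '~'), $badWords));

    $pattern = '~(?:'
        // Not preceded by a letter, followed by a whole bad word that is not
        // followed by a letter; consume the first letter.
        . '(?<!\p{L})(?=(?:' . $alternation . ')(?!\p{L}))\p{L}'
        // Or continue exactly where the previous match ended.
        . '|\G(?!^)'
        // Forget what was consumed and replace the next letter only.
        . ')\K\p{L}~iu';

    return preg_replace($pattern, '*', $text);
}

echo maskBadWordsUnicode('Schöne Grüße', ['grüße']), PHP_EOL;
// Schöne G****
```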