PHP preg_match with a regular expression - word filter
Hello All,
I am trying to use preg_match to identify whether a single word is found within a string of text. The word needs to be picked up even if each of its characters is repeated several times (in the correct order). To make life hard for myself, I also want to pick up the word even if the client has tried to 'fool' the preg_match by inserting certain characters into the word I wish to match.
It is for use in a swearword filter: if 'dave' is found, I will replace it with something else. I have tried to come up with the perfect regular expression, but I'm not having much luck. Please see the following examples and the issues I have found so far (I have used 3 as an example of a character the client could use to 'fool' the check):
Using: ~\b(?:3+)?d+(?:3+)?a+(?:3+)?v+(?:3+)?e+(?:3+)?\b~i
Okay
- Input: dave = pass
- Input: 3d3a3v3e3 = pass
- Input: ddddaaaavvvveeee = pass
- Input: 3ave = fail
Not Okay
- Input: dd3ddaa3aa3vv3vvee3ee = fail (I want this to pass)
Using: ~\b[d3]+[a3]+[v3]+[e3]+\b~i
Okay
- Input: dave = pass
- Input: 3d3a3v3e3 = pass
- Input: ddddaaaavvvveeee = pass
- Input: dd3ddaa3aa3vv3vvee3ee = pass
Not Okay
- Input: 3ave = pass (I want this to fail)
Thank you for any help on the regular expression, it's much appreciated.
Comments (2)
Without discussing if it's a good profanity filter (probably not!), the following regex will fulfill your spec:
If '3' is the only 'special' character, then try this:
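The patterns from this answer are missing in this copy of the thread. As a stand-in, here is a minimal sketch of one pattern that appears to satisfy the cases listed in the question, assuming '3' is the only filler character. It is not necessarily what the answerer posted. The idea is to require each letter at least once, in order, and to allow repeats of that letter or '3' after it. The variable names and the '****' replacement are placeholders chosen for illustration.

```php
<?php
// Hypothetical pattern, not the one from the original answer.
// Each of d, a, v, e must occur at least once and in order;
// '3' may appear before the word, between letters, or after it.
$pattern = '~\b3*d[d3]*a[a3]*v[v3]*e[e3]*\b~i';

// The five inputs from the question.
$inputs = [
    'dave',
    '3d3a3v3e3',
    'ddddaaaavvvveeee',
    'dd3ddaa3aa3vv3vvee3ee',
    '3ave',
];

$results = [];
foreach ($inputs as $input) {
    // preg_match returns 1 on a match, 0 on no match.
    $results[$input] = preg_match($pattern, $input) === 1;
    echo $input, ' = ', $results[$input] ? 'pass' : 'fail', PHP_EOL;
}

// Replacing a match inside a longer string; '****' is an arbitrary placeholder.
$clean = preg_replace($pattern, '****', 'hello d3aavve, how are you');
echo $clean, PHP_EOL;
```

With this sketch the first four inputs pass and '3ave' fails, because the literal 'd' is mandatory rather than being interchangeable with '3'. If more filler characters are in play, each '3' could be swapped for a character class, but note that \b only behaves as shown while the fillers are word characters (letters, digits, underscore).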
This won't work.
For instance, your filter is going to block "firetruck" ;)
Someone could also just substitute a 'u' for a 'v', or a 'c' for a '<'.
I don't know if there is a good way to build a profanity filter, other than to have a large white-list of known words and their misspellings.
Perhaps you should rethink why you want the profanity filter. If your 'customer' wants it, have them supply a list of the words they want blocked; it's not your problem.