PHP RegEx, replace all junk symbols
I can't get my head around a solid RegEx for doing this, still very new at all this RegEx magic. I had some limited success, but I feel like there is a simpler, more efficient way.
I would like to purify a string of all non-alphanumeric characters, and turn all those invalid subsets into one single underscore, but trim them at the edges. For example, the string <<+ćThis?//String_..!
should be converted to This_String
Any thoughts on doing this all in one RegEx? I did it with regular str_replace, and then regexed the multi-underscores out of the way, and then trimmed the last underscores from the edges, but it seems like overkill and like something RegEx could do in one go. Kind of going for max speed/efficiency here, even if it is milliseconds I'm dealing with.
Comments (3)
The uppercase \W escape here matches "non-word" characters, meaning everything but letters, digits, and the underscore. To remove the leftover outer underscores I would still use trim.
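As a rough illustration of the \W approach described above, here is a minimal sketch. The function name slugify_nonword is hypothetical, and the pattern is only assumed to match what the answer had in mind; the only calls used are PHP's preg_replace and trim.

```php
<?php
// Hypothetical helper name, chosen for illustration only.
function slugify_nonword(string $input): string
{
    // \W+ matches a run of characters that are not ASCII letters, digits,
    // or underscores. Without the /u modifier the bytes of a multibyte
    // character such as "ć" are treated as non-word bytes as well.
    $collapsed = preg_replace('/\W+/', '_', $input);

    // Strip any underscores left at either edge.
    return trim($collapsed, '_');
}

echo slugify_nonword('<<+ćThis?//String_..!'), "\n"; // This_String
```

Because the underscore itself counts as a word character, an input like a_-_b comes out as a___b: underscores already in the string are not merged into the run. If that matters, an explicit character class such as [^A-Za-z0-9] is the safer choice.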
Yes, you could do this with a single regex replacement that turns each run of invalid characters into one underscore.
Then you would trim leading and trailing underscores.
It's not one regex, but it's cleaner than str_replace and a bunch of regex.
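One way the two steps might look is sketched below. This is an assumption about the shape of the solution, not the answerer's original code; to_identifier and to_identifier_one_call are hypothetical names, and the character class assumes that only ASCII letters and digits count as valid.

```php
<?php
// Hypothetical helper: collapse invalid runs, then trim the edges.
function to_identifier(string $input): string
{
    // Every run of characters outside A-Z, a-z, 0-9 becomes one underscore.
    // Existing underscores are part of the run, so "a_-_b" becomes "a_b".
    $collapsed = preg_replace('/[^A-Za-z0-9]+/', '_', $input);

    // Remove the underscores left at the start and end.
    return trim($collapsed, '_');
}

// Variant using a single preg_replace call with two patterns, which are
// applied in order: first collapse the runs, then strip edge underscores.
function to_identifier_one_call(string $input): string
{
    return preg_replace(
        ['/[^A-Za-z0-9]+/', '/^_+|_+$/'],
        ['_', ''],
        $input
    );
}

echo to_identifier('<<+ćThis?//String_..!'), "\n";          // This_String
echo to_identifier_one_call('<<+ćThis?//String_..!'), "\n"; // This_String
```

The second variant is a single function call but still two regex passes internally, so it is unlikely to be meaningfully faster than preg_replace followed by trim; measure before preferring one for speed.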