How can a regular expression ignore escaped quotes when matching a string?
I'm trying to write a regex that will match everything BUT an apostrophe that has not been escaped. Consider the following:
<?php $s = 'Hi everyone, we\'re ready now.'; ?>
My goal is to write a regular expression that will essentially match the string portion of that. I'm thinking of something such as
/.*'([^']).*/
in order to match a simple string, but I've been trying to figure out how to get a negative lookbehind working on that apostrophe to ensure that it is not preceded by a backslash...
Any ideas?
- JMT
Here's my solution with test cases:
And my proof (in Perl, though I don't think I use any Perl-specific features):
Running this shows:
Note that the initial wildcard at the start needs to be non-greedy. Then I use non-backtracking matches to gobble up \\ and \' and then anything else that is not a standalone quote character.
I think this one probably mimics the compiler's built-in approach, which should make it pretty bullet-proof.
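The code for this answer did not survive on this page, so here is only a rough JavaScript sketch of the idea described above, not the author's pattern. JavaScript has no non-backtracking (atomic) groups, so as a substitute this sketch keeps the backslash out of the catch-all character class; that way a backslash can only be consumed as part of \\ or \'. The helper name firstSingleQuoted is made up for illustration.

```javascript
// Hypothetical helper: returns the body of the first terminated
// single-quoted string in `source`, or null if there is none.
function firstSingleQuoted(source) {
  // ^[\s\S]*?   lazy prefix, so the earliest workable opening quote wins
  // \\\\        an escaped backslash, consumed as one unit
  // \\'         an escaped quote, consumed as one unit
  // [^'\\]      anything else except a bare quote or a bare backslash
  const pattern = /^[\s\S]*?'((?:\\\\|\\'|[^'\\])*)'/;
  const match = pattern.exec(source);
  return match ? match[1] : null;
}

// The string from the question; "\\'" in JS source is backslash + quote.
const php = "<?php $s = 'Hi everyone, we\\'re ready now.'; ?>";
console.log(firstSingleQuoted(php));
```

Note that this sketch only accepts \\ and \' as escapes, so a string containing some other backslash sequence would not match; widening the second alternative to a backslash followed by any character is one way to relax that.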
prints
The parenthesized portion looks for non-apostrophes/backslashes and backslash-escaped characters. If only certain characters can be escaped, change the \\. to \\['\\a-z], or whatever.
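The pattern this answer refers to is not shown on this page, so the following is a guess at its shape, written in JavaScript: a quote, then any run of characters that are neither an apostrophe nor a backslash, or a backslash followed by one character, then the closing quote. The second pattern shows the suggested restriction to specific escapes. The names anyEscape, limitedEscape and quotedBodies are illustrative only.

```javascript
// Any backslash + character pair counts as an escape.
const anyEscape = /'((?:[^'\\]|\\.)*)'/g;

// Variant: only \', \\ and backslash + lowercase letter count as escapes.
const limitedEscape = /'((?:[^'\\]|\\['\\a-z])*)'/g;

// Hypothetical helper: collect the bodies of all single-quoted strings.
function quotedBodies(text, pattern) {
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

// Actual text: a = 'we\'re'; b = 'x\\'; c = 'plain'
const sample = "a = 'we\\'re'; b = 'x\\\\'; c = 'plain'";
console.log(quotedBodies(sample, anyEscape));
```

Because the backslash is excluded from the first alternative, an escaped quote can never be mistaken for the closing quote, even when the engine backtracks.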
This is for JavaScript:
/('|")(?:\\\\|\\\1|[\s\S])*?\1/
it...
\n, \t, etc.)
Only the first quote is captured. You can capture the unquoted string in $2 with:
/('|")((?:\\\\|\\\1|[\s\S])*?)\1/
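As a quick check of the capturing version, here is a small sketch that runs it on a double-quoted and a single-quoted input. The wrapper parseQuoted and the sample strings are made up for illustration; only the regex comes from the answer.

```javascript
// Group 1 is the opening quote, group 2 is the text between the quotes.
const quoted = /('|")((?:\\\\|\\\1|[\s\S])*?)\1/;

// Hypothetical helper returning the quote character and the raw body.
function parseQuoted(text) {
  const m = quoted.exec(text);
  return m ? { quote: m[1], body: m[2] } : null;
}

// Actual text: say "it's \"fine\"" now
console.log(parseQuoted("say \"it's \\\"fine\\\"\" now"));
// Actual text: 'we\'re'
console.log(parseQuoted("'we\\'re'"));
```

One caveat with this pattern: [\s\S] can also match a lone backslash, so an unterminated string that ends in an escaped quote, such as "abc\", still matches after backtracking, with abc\ as the body. Excluding the backslash from that last alternative should avoid this if it matters for your input.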