How can I write a regular expression to match string literals where the escape is a doubled quote character?
I am writing a parser using ply that needs to identify FORTRAN string literals. These are delimited by single quotes, and a single quote inside the literal is escaped by doubling it, e.g.
'I don''t understand what you mean'
is a valid escaped FORTRAN string.
Ply takes its token definitions as regular expressions. My attempt so far does not work, and I don't understand why.
t_STRING_LITERAL = r"'[^('')]*'"
Any ideas?
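For reference, this is what the attempt does on the example above when run through Python's standard `re` module (ply token rules are ordinary Python regex strings, so plain `re` should show the same behaviour): the match stops at the first embedded quote instead of the closing one.

```python
import re

# The attempt from the question.
attempt = re.compile(r"'[^('')]*'")

text = "'I don''t understand what you mean'"

m = attempt.match(text)
# The match ends at the first quote of the doubled '' pair.
print(m.group())  # prints 'I don'

# The pattern cannot match the whole literal at all.
print(attempt.fullmatch(text))  # prints None
```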
A string literal is:
Thus, our regex is:
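A sketch of such a regex, assuming a string literal is defined as: an opening single quote, then any number of either doubled single quotes ('') or non-quote characters, then a closing single quote. The samples are illustrative only.

```python
import re

# Opening quote, then any run of either a doubled quote ('') or a single
# non-quote character, then the closing quote.
STRING_LITERAL = re.compile(r"'(?:''|[^'])*'")

samples = [
    "'I don''t understand what you mean'",  # escaped quote inside
    "''",                                   # empty literal
    "''''",                                 # literal containing one quote
    "'unterminated",                        # no closing quote
    "'a' 'b'",                              # two separate literals
]

for s in samples:
    print(repr(s), bool(STRING_LITERAL.fullmatch(s)))
```

The `(?:...)` group is non-capturing simply because the group's contents are never needed; a plain `(...)` group matches the same text.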
You want something like this:
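A sketch of the ply token rule in that style, keeping the rule name from the question. The `re.finditer` loop is only a stdlib stand-in for ply's lexer so the snippet runs without ply installed, and the FORTRAN source line is made up for illustration.

```python
import re

# Token rule in the form ply expects: a module-level string named t_<TOKEN>.
t_STRING_LITERAL = r"'(?:[^']|'')*'"

# Hypothetical FORTRAN statement containing two literals.
source = "PRINT *, 'I don''t understand', 'what you mean'"

# Stand-in for the lexer: collect every string literal in the line.
literals = [m.group() for m in re.finditer(t_STRING_LITERAL, source)]
print(literals)
```

Note that the two literals are not merged into one: after `understand` the next characters are `',`, which is neither a doubled quote nor a non-quote character, so the match has to end there.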
This says that inside the single quotes you can have any number of either doubled single quotes ('') or non-quote characters.
The brackets define a character class, in which you list individual characters that may (or, with a leading ^, may not) match. It doesn't allow anything more complicated than that, so trying to use parentheses inside it to match a multiple-character sequence such as ('') doesn't work. Instead, your [^('')] character class is equivalent to [^'()], i.e. it matches any single character that is not a single quote, a left parenthesis, or a right parenthesis.
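To check that equivalence concretely, the sketch below compares the two classes on every ASCII character with Python's `re` module (restricting the check to ASCII is just to keep the loop short), then shows that the question's pattern also rejects ordinary text containing parentheses.

```python
import re

original = re.compile(r"[^('')]")   # class from the question
simplified = re.compile(r"[^'()]")  # what it actually means

# Both classes accept exactly the same single characters.
for code in range(128):
    ch = chr(code)
    assert bool(original.fullmatch(ch)) == bool(simplified.fullmatch(ch)), ch

# So the question's regex also fails on a literal containing parentheses.
print(re.fullmatch(r"'[^('')]*'", "'f(x)'"))  # prints None
```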
It's usually easy to get something quick-and-dirty for parsing particular string literals that are giving you problems, but for a general solution you can get a very powerful and complete regex for string literals from the pyparsing module:
I'm not sure about significant differences between FORTRAN's string literals and Python's, but it's a handy reference if nothing else.