How do I write a regular expression to match a string literal where the escape is a doubled quote character?

Posted 2024-08-19 15:07:48

I am writing a parser with ply that needs to recognize FORTRAN string literals. These are delimited by single quotes, and a single quote inside the literal is escaped by doubling it. For example,

'I don''t understand what you mean'

is a valid escaped FORTRAN string.

Ply takes its token definitions as regular expressions. My attempt so far does not work, and I don't understand why.

t_STRING_LITERAL = r"'[^('')]*'"

Any ideas?

Comments (4)

左耳近心 2024-08-26 15:07:48

A string literal is:

  1. An open single-quote, followed by:
  2. Any number of doubled-single-quotes and non-single-quotes, then
  3. A close single quote.

Thus, our regex is:

r"'(''|[^'])*'"
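
As a quick check of the pattern above, here is a minimal sketch using only Python's `re` module (ply is not needed to try it). The names `STRING_LITERAL`, `is_string_literal` and `unescape` are made up for illustration; in the question's lexer the same pattern would be assigned to `t_STRING_LITERAL`.

```python
import re

# Pattern from the answer: an opening quote, then any run of doubled quotes
# or non-quote characters, then the closing quote.
STRING_LITERAL = r"'(''|[^'])*'"


def is_string_literal(text):
    """Return True if the whole of `text` is one FORTRAN-style string literal."""
    return re.fullmatch(STRING_LITERAL, text) is not None


def unescape(literal):
    """Hypothetical helper: drop the outer quotes and collapse '' into '."""
    return literal[1:-1].replace("''", "'")


print(is_string_literal("'I don''t understand what you mean'"))  # True
print(is_string_literal("'I don't'"))                            # False
print(unescape("'I don''t understand what you mean'"))
```

Putting `''` first in the alternation is not required for correctness here, since `[^']` can never consume a quote, so either order accepts the same strings.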

梦归所梦 2024-08-26 15:07:48

You want something like this:

r"'([^']|'')*'"

This says that inside the single quotes you can have either a doubled single quote ('') or a non-quote character, repeated any number of times.

Square brackets define a character class: a list of individual characters, any one of which may match (or, with a leading ^, none of which may match). A character class cannot express anything more complicated than that, so using parentheses inside it to match a multi-character sequence ('') doesn't work. Instead, your [^('')] character class is equivalent to [^'()], i.e. it matches any single character that is not a single quote, a left parenthesis, or a right parenthesis.
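
The equivalence described above can be checked with a short sketch. The variable names and sample strings are made up for illustration; only Python's standard `re` module is used.

```python
import re

ORIGINAL = r"'[^('')]*'"  # the pattern from the question
EXPLICIT = r"'[^'()]*'"   # the same character class written without duplicates


def compare(sample):
    """Return whether each pattern matches the whole sample string."""
    return (
        re.fullmatch(ORIGINAL, sample) is not None,
        re.fullmatch(EXPLICIT, sample) is not None,
    )


samples = [
    "'plain text'",           # no quotes or parentheses inside
    "'I don''t understand'",  # doubled quote inside
    "'f(x)'",                 # parentheses inside
]

for s in samples:
    print(s, compare(s))
```

Both patterns agree on every sample: they accept the plain literal and reject the other two, which shows that the parentheses in the class only exclude literal `(` and `)` characters rather than grouping `''`.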

镜花水月 2024-08-26 15:07:48

It's usually easy to get something quick-and-dirty for parsing particular string literals that are giving you problems, but for a general solution you can get a very powerful and complete regex for string literals from the pyparsing module:

>>> import pyparsing
>>> pyparsing.quotedString.reString
'(?:"(?:[^"\\n\\r\\\\]|(?:"")|(?:\\\\x[0-9a-fA-F]+)|(?:\\\\.))*")|(?:\'(?:[^\'\\n\\r\\\\]|(?:\'\')|(?:\\\\x[0-9a-fA-F]+)|(?:\\\\.))*\')'

I'm not sure about significant differences between FORTRAN's string literals and Python's, but it's a handy reference if nothing else.
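
One difference can be illustrated with a small sketch. It assumes standard Fortran behaviour, where a backslash inside a literal is an ordinary character (some compilers offer backslash escapes as an extension). `PYPARSING_SINGLE` is the single-quote branch of the pattern printed above, copied by hand into a raw string, and `DOUBLING_ONLY` is the pattern from the other answers; both names are made up for illustration.

```python
import re

# Single-quote branch of the pyparsing pattern shown above (hand-copied).
PYPARSING_SINGLE = r"'(?:[^'\n\r\\]|(?:'')|(?:\\x[0-9a-fA-F]+)|(?:\\.))*'"
# Pattern that only knows about doubled quotes.
DOUBLING_ONLY = r"'(''|[^'])*'"


def accepts(pattern, text):
    """Return True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


doubled = "'I don''t'"
backslash = r"'C:\'"  # the five characters ' C : \ '

print(accepts(PYPARSING_SINGLE, doubled), accepts(DOUBLING_ONLY, doubled))
print(accepts(PYPARSING_SINGLE, backslash), accepts(DOUBLING_ONLY, backslash))
```

Both patterns accept the doubled-quote literal, but the pyparsing branch treats the backslash as an escape for the following quote and so rejects the second sample, which the doubling-only pattern accepts.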

无需解释 2024-08-26 15:07:48
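import re

ch = "'I don''t understand what you mean' and you' ?"

# Lazy match: stops at the first quote after the opening one
print(re.search(r"'.*?'", ch).group())
# The closing quote must be neither preceded nor followed by another quote
print(re.search(r"'.*?(?<!')'(?!')", ch).group())

Result:

'I don'
'I don''t understand what you mean'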
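
The lookaround pattern works for the sample above, but it appears to miss two edge cases: an empty literal and a literal that ends with an escaped quote. The sketch below compares it with the alternation pattern from the earlier answers; the names `LOOKAROUND`, `ALTERNATION` and `first_match` and the sample strings are made up for illustration.

```python
import re

LOOKAROUND = r"'.*?(?<!')'(?!')"  # pattern from this answer
ALTERNATION = r"'(''|[^'])*'"     # pattern from the earlier answers


def first_match(pattern, text):
    """Return the first match of `pattern` in `text`, or None if there is none."""
    m = re.search(pattern, text)
    return m.group() if m else None


cases = [
    "x = 'I don''t'",  # escaped quote in the middle
    "x = ''",          # empty literal
    "x = 'it'''",      # literal ending with an escaped quote
]

for text in cases:
    print(repr(text), first_match(LOOKAROUND, text), first_match(ALTERNATION, text))
```

In the last two cases every candidate closing quote is adjacent to another quote, so the lookbehind or lookahead rejects it and the lookaround pattern finds nothing, while the alternation pattern still returns the full literal.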