How to match nested LaTeX macros with re in Python?
I want to match LaTeX macros correctly, including nested ones. See the following:
s = r'''
firstline
\lr{secondline\rl{ right-to-left
\lr{nested left-to-right} end RTL }
other text
}
\rl{ last \lr{end line
} end RTL }
'''
For instance, in the above, I want to match each \lr macro together with its content. I have tried the following, but neither attempt works correctly:
re.findall(r'(?:\\lr\{.*\})', s, re.DOTALL)
['\\lr{secondline\\rl{ right-to-left\n \\lr{nested left-to-right} end RTL }\n other text\n}\n\\rl{ last \\lr{end line \n} end RTL }']
Even the non-greedy version does not work in this case:
re.findall(r'(?:\\lr\{.*?\})', s, re.DOTALL)
['\\lr{secondline\\rl{ right-to-left\n \\lr{nested left-to-right}',
'\\lr{end line \n}']
I need a regular expression that matches this correctly, similar to matching nested parentheses. Here the nesting comes from the curly brackets of the LaTeX macros.
Edit:
I'd like to get the following matches:
['\\lr{secondline\\rl{ right-to-left\n \\lr{nested left-to-right} end RTL }\n other text\n}',
'\\lr{nested left-to-right}',
'\\lr{end line \n}']
It would be perfect if I also knew the level of nesting, something like the below:
[('\\lr{secondline\\rl{ right-to-left\n \\lr{nested left-to-right} end RTL }\n other text\n}', 1),
('\\lr{nested left-to-right}', 2),
('\\lr{end line \n}', 1)]
Comments (1)
With the PyPI regex module (after installing it with pip install regex) you can use the recursive pattern \\lr(\{(?:[^{}]++|(?1))*}). See the Python demo and the regex demo.
Note also the overlapped=True option used with regex.finditer, which allows matching nested occurrences.
Details:
\\lr - the \lr string
(\{(?:[^{}]++|(?1))*}) - Group 1 (defined so that it can be referred to while recursing):
  \{ - a { char
  (?:[^{}]++|(?1))* - zero or more repetitions of:
    [^{}]++ - one or more chars other than { and }, matched possessively (the text cannot be re-matched if backtracking is triggered)
    | - or
    (?1) - the Group 1 pattern, recursed
  } - a } char.
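The code from the answer did not survive on this page, so the following is only a sketch of how the pattern and the overlapped=True option described above might be put together. It assumes the third-party regex package is installed. The function name find_lr_macros and the way the level is computed (one plus the number of other \lr matches whose span encloses the current one) are illustrative assumptions, not part of the original answer.

```python
import regex  # third-party package: pip install regex (not the stdlib re module)

# Pattern taken from the answer: \lr followed by one balanced {...} group.
# (?1) recurses into group 1; ++ is a possessive quantifier.
LR_PATTERN = r'\\lr(\{(?:[^{}]++|(?1))*})'


def find_lr_macros(text):
    """Return (macro_text, level) pairs for every lr macro, nested ones included.

    The helper name and the level computation are assumptions for illustration.
    """
    # overlapped=True lets the search restart inside an earlier match,
    # so macros nested in an outer match are reported as well.
    matches = list(regex.finditer(LR_PATTERN, text, overlapped=True))
    result = []
    for m in matches:
        # Level 1 means "not inside any other lr macro"; each enclosing
        # lr match adds one. Enclosing rl macros are not counted.
        enclosing = sum(
            1 for other in matches
            if other is not m
            and other.start() < m.start()
            and m.end() <= other.end()
        )
        result.append((m.group(), 1 + enclosing))
    return result


if __name__ == "__main__":
    sample = r'\lr{a\rl{b \lr{c} d} e} \rl{f \lr{g} h}'
    for macro, level in find_lr_macros(sample):
        print(level, macro)
```

Counting only enclosing \lr matches means a \lr inside a top-level \rl is reported at level 1, which is how the desired output in the question treats it. Braces that are escaped or sit inside LaTeX comments are not handled by this sketch.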