How to match nested LaTeX macros with re in Python?

Posted on 2025-01-19 14:34:22

I want to match LaTeX macros correctly, including nested ones. Consider the following:

s = r'''
firstline
\lr{secondline\rl{ right-to-left
        \lr{nested left-to-right} end RTL }
        other text
}
\rl{ last \lr{end line 
} end RTL }
'''

For instance, in the above, I want to match each \lr macro together with its content. I have tried the following, but neither attempt works correctly:

re.findall(r'(?:\\lr\{.*\})', s, re.DOTALL)
['\\lr{secondline\\rl{ right-to-left\n        \\lr{nested left-to-right} end RTL }\n        other text\n}\n\\rl{ last \\lr{end line \n} end RTL }']

Even the non-greedy version does not work in this case:

re.findall(r'(?:\\lr\{.*?\})', s, re.DOTALL)
['\\lr{secondline\\rl{ right-to-left\n        \\lr{nested left-to-right}',
 '\\lr{end line \n}']

I need a regular expression that matches this correctly, similar to matching nested parentheses; here the nesting is in the curly braces of the LaTeX macros.

Edit:

I'd like to get the following matches:

['\\lr{secondline\\rl{ right-to-left\n        \\lr{nested left-to-right} end RTL }\n        other text\n}', 
'\\lr{nested left-to-right}',
'\\lr{end line \n}']

It would be perfect if I also knew the nesting level, something like this:

[('\\lr{secondline\\rl{ right-to-left\n        \\lr{nested left-to-right} end RTL }\n        other text\n}', 1),
 ('\\lr{nested left-to-right}', 2),
 ('\\lr{end line \n}', 1)]

Comments (1)

压抑⊿情绪 2025-01-26 14:34:23

With the PyPI regex module (install it with pip install regex), you can use:

import regex

s = r'''
firstline
\lr{secondline\rl{ right-to-left
        \lr{nested left-to-right} end RTL }
        other text
}
\rl{ last \lr{end line 
} end RTL }
'''

print( [x.group() for x in regex.finditer(r'\\lr(\{(?:[^{}]++|(?1))*})', s, overlapped=True)] )
# => ['\\lr{secondline\\rl{ right-to-left\n        \\lr{nested left-to-right} end RTL }\n        other text\n}', '\\lr{nested left-to-right}', '\\lr{end line \n}']

See the Python demo and the regex demo.

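Note that the overlapped=True option passed to regex.finditer is what allows the nested occurrences to be matched; without it, the search resumes after the end of each match, so an \lr that sits inside an already matched \lr is skipped.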

Details:

  • \\lr - the literal string \lr
  • (\{(?:[^{}]++|(?1))*}) - Group 1 (captured so that the pattern can recurse into it):
    • \{ - a { char
    • (?:[^{}]++|(?1))* - zero or more repetitions of:
      • [^{}]++ - one or more chars other than { and }, matched possessively (the engine cannot give these chars back if backtracking is triggered)
      • | - or
      • (?1) - a recursion into the Group 1 pattern
    • } - a } char.
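
The regex above returns the matches but not the nesting level the question also asks for. One possible way to get both is a plain brace-counting scanner instead of a regex. The sketch below uses only the standard library; find_macros is a hypothetical helper name, not part of the regex module or of the answer above. It assumes the level counts enclosing \lr macros only (so \rl does not add a level, as in the question's expected output), and like the regex it does not treat escaped braces such as \{ specially.

```python
def find_macros(text, name="lr"):
    """Return (macro_text, level) pairs for every \\name{...} in text,
    ordered by where each macro starts. Hypothetical helper, for illustration."""
    opener = "\\" + name + "{"
    stack = []   # one entry per open '{': start index if it opens \name, else None
    found = []   # (start, end, level) for every closed \name{...}
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            stack.append(i)              # remember where this macro begins
            i += len(opener)
            continue
        ch = text[i]
        if ch == "{":
            stack.append(None)           # a brace that belongs to something else
        elif ch == "}" and stack:        # stray closing braces are ignored
            start = stack.pop()
            if start is not None:
                # level = this macro plus every enclosing \name still open
                level = 1 + sum(1 for s in stack if s is not None)
                found.append((start, i + 1, level))
        i += 1
    found.sort()                         # inner macros close first; restore source order
    return [(text[a:b], lvl) for a, b, lvl in found]


sample = r"\lr{a \rl{b \lr{c} d} e} \rl{f \lr{g}}"
for macro, level in find_macros(sample):
    print(level, macro)
```

Calling find_macros(s) on the string s from the question should give the three \lr matches paired with levels 1, 2 and 1. Macros that are never closed are silently dropped, which may or may not be what you want for malformed input.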