Capturing a repeated group in a regular expression

Posted on 2025-01-15 13:00:40

Similar to this question, I want to capture a group that repeats more than once. However, I don't want to use findall, because I'm relying on the regex's evaluation order.

My issue: I want to parse arguments that look like this:

"(a, {b, c, d}, e)"       # arguments are 1: "a", 2: "b, c, d", 3: "e"
"({a, b}, c, {d, e}, f)"  # arguments are 1: "a, b", 2: "c", 3: "d, e", 4: "f"

etc.
The arguments are separated by commas, but the contents of a pair of curly braces count as a single argument.
This is the regex I tried to write:

import re

# Try a whole {...} argument first, then fall back to a plain argument.
_SingleArg = r"(?:(\{.+?\})|(.+?))"

ArgsParse = re.compile(f"(?:{_SingleArg}, )*{_SingleArg}?$")

The _SingleArg pattern first tries to match a complete argument enclosed in braces; if that fails, it tries to match a plain argument.
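Why this attempt cannot return every argument: in Python's re module, a capturing group inside a repeated construct such as (?:...)* keeps only the value from its last repetition. A minimal illustration with a simplified pattern (not the question's own):

```python
import re

# (\w) sits inside a repeated non-capturing group, so it matches once per
# repetition, but the match object remembers only the last capture.
pattern = re.compile(r"(?:(\w), )*(\w)")

m = pattern.match("a, b, c")
print(m.group(1))  # b  ("a" was overwritten by the second repetition)
print(m.group(2))  # c
```

The third-party regex package records every repetition and exposes it through Match.captures(), but with the standard library a common approach is to match one argument at a time with finditer or findall.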

I can't think of a way to do this with findall. I can do it by running several regular expressions: first find the arguments inside braces, then replace them with the empty string, and finally split on commas. But this is a very inelegant solution, especially since I also want to know the order of the arguments.
Is there a better way to do this with regular expressions?
Thanks,
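For reference, findall reports non-overlapping matches from left to right, so a single pattern that matches one argument at a time keeps the arguments in order. With two groups, findall returns (braced, plain) tuples, and the group that did not take part in a match is reported as an empty string. Alternation still tries the braced form first, as in _SingleArg. A minimal sketch, assuming braces are never nested and plain arguments contain no commas, braces or parentheses; ARG_RE and parse_args are hypothetical names:

```python
import re

# Hypothetical pattern, assuming braces are never nested:
#   \{([^{}]*)\}           group 1: contents of a braced argument
#   ([^\s,{}()][^,{}()]*)  group 2: a plain argument (no commas, braces or
#                          parentheses; must not start with whitespace)
ARG_RE = re.compile(r"\{([^{}]*)\}|([^\s,{}()][^,{}()]*)")


def parse_args(text):
    """Return the arguments of text in order (hypothetical helper)."""
    # findall scans left to right, so the list follows argument order.
    # Each item is a (braced, plain) tuple; the unused group is "".
    return [braced or plain.strip() for braced, plain in ARG_RE.findall(text)]


print(parse_args("(a, {b, c, d}, e)"))       # ['a', 'b, c, d', 'e']
print(parse_args("({a, b}, c, {d, e}, f)"))  # ['a, b', 'c', 'd, e', 'f']
```

Because plain arguments are any run of non-delimiter characters rather than \w+, values such as x.y or -3 stay intact; wrapping the result in enumerate(..., start=1) numbers the arguments if needed.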

Comments (1)

只为一人 2025-01-22 13:00:40

You can use this pattern with re.finditer to preserve the order of the arguments:

Pattern: \w+|\{([\w, ]+)\}

Code:

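import re

pattern = r"\w+|\{([\w, ]+)\}"  # a word, or a {...} group of words
test_string = "({a, b}, c, {d, e}, f)"

# finditer yields matches left to right, so enumerate numbers them in order.
result = [(i, m.group().strip('{}')) for i, m in enumerate(re.finditer(pattern, test_string), start=1)]
print(result)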

Output:

[(1, 'a, b'), (2, 'c'), (3, 'd, e'), (4, 'f')]
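The \w+ alternative above only matches word characters, so an argument such as x.y is split into x and y, and -3 loses its sign. Python's built-in re module also has no recursion, so no pattern in it can match arbitrarily nested braces. If nesting can occur, one option (not a regex) is to scan the string and split only on commas at brace depth zero. A minimal sketch, assuming braces are balanced and the list is wrapped in one pair of parentheses; split_args is a hypothetical name:

```python
def split_args(text):
    """Split an argument list on commas at brace depth 0 (hypothetical helper).

    Assumes braces are balanced. Braced arguments may nest; only their
    outermost pair of braces is removed.
    """
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]  # drop the surrounding parentheses
    if not inner.strip():
        return []

    args, current, depth = [], [], 0
    for ch in inner:
        if ch == "," and depth == 0:
            # A top-level comma ends the current argument.
            args.append("".join(current).strip())
            current = []
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        current.append(ch)
    args.append("".join(current).strip())  # last argument has no trailing comma

    # Remove the outer braces from braced arguments.
    return [a[1:-1] if a.startswith("{") and a.endswith("}") else a for a in args]


print(split_args("(a, {b, {c, d}}, e)"))  # ['a', 'b, {c, d}', 'e']
```

This trades the brevity of a one-line pattern for explicit depth tracking, which is what makes nested braces work.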