Using explicitly numbered repetition instead of question mark, star and plus

Posted 2024-09-05 16:09:50

I've seen regex patterns that use explicitly numbered repetition instead of ?, * and +, i.e.:

Explicit            Shorthand
(something){0,1}    (something)?
(something){1}      (something)
(something){0,}     (something)*
(something){1,}     (something)+

The questions are:

Are these two forms identical? What if you add possessive/reluctant modifiers?
If they are identical, which one is more idiomatic? More readable? Simply "better"?

烟花肆意 2024-09-12 16:09:50

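To my knowledge they are identical. I think there may be a few engines out there that don't support the numbered syntax, but I'm not sure which ones. I vaguely recall a question on SO a few days ago where the explicit notation wouldn't work in Notepad++.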
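As a rough check of the "identical" claim, here is a minimal sketch in Python's re module (one engine among many, so it says nothing about Notepad++ or others). It compares each explicit form with its shorthand, greedy and reluctant, on a few sample strings. The sample strings, the trailing x? in the pattern, and the helper names spans and forms_agree are all made up for illustration. Possessive modifiers are left out because Python's re only accepts them from version 3.11 onward.

```python
import re

# (explicit, shorthand) quantifier pairs, greedy first, then reluctant.
PAIRS = [
    ("{0,1}", "?"),
    ("{1}", ""),
    ("{0,}", "*"),
    ("{1,}", "+"),
    ("{0,1}?", "??"),
    ("{0,}?", "*?"),
    ("{1,}?", "+?"),
]

# Arbitrary sample inputs; they are not taken from the question.
SAMPLES = ["", "ab", "abab", "ababab", "xabab", "abx"]


def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


def forms_agree():
    """True if every explicit form matches the same spans as its shorthand."""
    for explicit, shorthand in PAIRS:
        for text in SAMPLES:
            a = spans("(ab)" + explicit + "x?", text)
            b = spans("(ab)" + shorthand + "x?", text)
            if a != b:
                return False
    return True


print(forms_agree())
```

Agreement on a handful of inputs is only evidence, not proof, and other engines would need their own check.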

The only time I would use explicitly numbered repetition is when the repetition is greater than 1:

Exactly two: {2}
Two or more: {2,}
Two to four: {2,4}

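I tend to prefer these, especially when the repeated pattern is more than a few characters long. If you have to match 3 digits, some people like to write \d\d\d, but I would rather write \d{3}, since it makes the number of repetitions explicit. Furthermore, if that number ever needs to change, I only need to change {3} to {n} rather than re-parse the regex in my head or worry about messing it up; it takes less mental effort.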
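To illustrate the point about changing the count, here is a small sketch in Python. The helper digits is hypothetical, not part of any library; it only shows that with the counted form the repetition count becomes a single value to edit.

```python
import re


def digits(n):
    """Build a pattern for exactly n digits (hypothetical helper)."""
    return re.compile(r"\d{%d}" % n)


spelled_out = re.compile(r"\d\d\d")
counted = digits(3)

# Both forms should accept and reject the same inputs.
for text in ["123", "12", "1234", "12a"]:
    assert bool(spelled_out.fullmatch(text)) == bool(counted.fullmatch(text))

# Changing the count is a one-argument change, not a rewrite of the pattern.
print(bool(digits(5).fullmatch("12345")))
```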

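If that criterion isn't met, I prefer the shorthand. Using the "explicit" notation quickly clutters the pattern and makes it hard to read. I worked on a project where some developers didn't know regex very well (it's not exactly everyone's favorite topic), and I saw a lot of {1} and {0,1} occurrences. A few people would ask me to code review their patterns, and that's when I would suggest changing those occurrences to the shorthand notation, which saves space and, in my opinion, improves readability.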


冷心人i 2024-09-12 16:09:50

I can see how, if you have a regex that does a lot of bounded repetition, you might want to use the {n,m} form consistently for readability's sake. For example:

/^
 abc{2,5}
 xyz{0,1}
 foo{3,12}
 bar{1,}
 $/x

But I can't recall ever seeing such a case in real life. When I see {0,1}, {0,} or {1,} being used in a question, it's virtually always being done out of ignorance. And in the process of answering such a question, we should also suggest that they use ?, * or + instead.

And of course, {1} is pure clutter. Some people seem to have a vague notion that it means "one and only one"; after all, it must mean something, right? Why would such a pathologically terse language support a construct that takes up a whole three characters and does nothing at all? Its only legitimate use that I know of is to isolate a backreference that's followed by a literal digit (e.g. \1{1}0), but there are other ways to do that.
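The backreference case can be sketched in Python's re module; other engines may resolve \10 differently, so treat this as one engine's behavior. The single-group pattern (a) and the input "aa0" are made up for illustration, and the non-capturing group is just one possible reading of "other ways".

```python
import re

# With only one group defined, \10 is read as a reference to group 10,
# so Python's re refuses to compile the pattern.
try:
    re.compile(r"(a)\10")
    ambiguous_compiles = True
except re.error:
    ambiguous_compiles = False

# {1} ends the backreference, so the 0 that follows is a literal digit.
with_quantifier = re.compile(r"(a)\1{1}0")

# One alternative: wrap the backreference in a non-capturing group.
with_group = re.compile(r"(a)(?:\1)0")

print(ambiguous_compiles)
print(bool(with_quantifier.fullmatch("aa0")), bool(with_group.fullmatch("aa0")))
```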
