Using explicitly numbered repetition instead of question mark, star and plus
I've seen regex patterns that use explicitly numbered repetition instead of `?`, `*` and `+`, i.e.:

    Explicit            Shorthand
    (something){0,1}    (something)?
    (something){1}      (something)
    (something){0,}     (something)*
    (something){1,}     (something)+

The questions are:
- Are these two forms identical? What if you add possessive/reluctant modifiers?
- If they are identical, which one is more idiomatic? More readable? Simply "better"?
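The first question can be checked empirically, at least for one engine, by running both spellings over the same inputs. Below is a minimal sketch using Python's `re` module. That choice is an assumption, since the question names no engine, and the stand-in pattern `ab` and the sample strings are made up for illustration. It covers the greedy and reluctant forms only; possessive quantifiers are left out because not every Python version accepts them.

```python
import re

# (explicit, shorthand) pairs from the table, with "ab" standing in for "something".
PAIRS = [
    (r"(ab){0,1}", r"(ab)?"),
    (r"(ab){1}", r"(ab)"),
    (r"(ab){0,}", r"(ab)*"),
    (r"(ab){1,}", r"(ab)+"),
]

# Illustrative inputs only: agreement here is evidence, not a proof of equivalence.
SAMPLES = ["", "ab", "abab", "xab", "ababx", "ba"]


def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


def same_matches(explicit, shorthand, suffix=""):
    """Compare two quantifier spellings, optionally with a modifier such as "?" appended."""
    return all(
        spans(explicit + suffix, s) == spans(shorthand + suffix, s)
        for s in SAMPLES
    )


for explicit, shorthand in PAIRS:
    print(explicit, shorthand, same_matches(explicit, shorthand))
    # The bare "(ab)" has no quantifier to modify, so skip the reluctant check for it.
    if shorthand != r"(ab)":
        print(explicit + "?", shorthand + "?", same_matches(explicit, shorthand, "?"))
```

A result of `True` for a pair only says that the two spellings agreed on these samples in this engine; other engines would need the same kind of check.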
Answers (4)
To my knowledge they are identical. There may be a few engines out there that don't support the numbered syntax, but I'm not sure which. I vaguely recall a question on SO a few days ago where the explicit notation wouldn't work in Notepad++.

The only time I would use explicitly numbered repetition is when the repetition is greater than 1:

    {2}
    {2,}
    {2,4}

I tend to prefer these, especially when the repeated pattern is more than a few characters. If you have to match 3 digits, some people like to write `\d\d\d`, but I would rather write `\d{3}`, since it emphasizes the number of repetitions involved. Furthermore, if that number ever needs to change down the road, I only need to change `{3}` to `{n}` rather than re-parse the regex in my head or worry about messing it up; it requires less mental effort.

If that criterion isn't met, I prefer the shorthand. Using the "explicit" notation quickly clutters up the pattern and makes it hard to read. I've worked on a project where some developers didn't know regex too well (it's not exactly everyone's favorite topic), and I saw a lot of `{1}` and `{0,1}` occurrences. A few people would ask me to code review their patterns, and that's when I would suggest changing those occurrences to the shorthand notation to save space and, IMO, improve readability.
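To make the `\d\d\d` versus `\d{3}` point concrete, here is a small sketch in Python's `re` (my choice of engine, not something the answer specifies). The `digits` helper and the sample strings are hypothetical and exist only for illustration.

```python
import re

# Two spellings of "exactly three digits"; they accept the same strings.
spelled_out = re.compile(r"\d\d\d")
counted = re.compile(r"\d{3}")


def digits(n):
    """Build a pattern for exactly n digits (hypothetical helper, not from the answer)."""
    return re.compile(r"\d{%d}" % n)


for text in ["123", "12", "1234", "12a"]:
    print(text, bool(spelled_out.fullmatch(text)), bool(counted.fullmatch(text)))

# Changing the count touches one number instead of adding or removing \d tokens.
print(bool(digits(5).fullmatch("12345")))
```

With the counted form the repetition is a single visible number, which is the readability argument being made above.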
I can see how, if you have a regex that does a lot of bounded repetition, you might want to use the `{n,m}` form consistently for readability's sake. For example:

But I can't recall ever seeing such a case in real life. When I see `{0,1}`, `{0,}` or `{1,}` being used in a question, it's virtually always being done out of ignorance. And in the process of answering such a question, we should also suggest that they use `?`, `*` or `+` instead.

And of course, `{1}` is pure clutter. Some people seem to have a vague notion that it means "one and only one"; after all, it must mean something, right? Why would such a pathologically terse language support a construct that takes up a whole three characters and does nothing at all? Its only legitimate use that I know of is to isolate a backreference that's followed by a literal digit (e.g. `\1{1}0`), but there are other ways to do that.
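The backreference case is easy to try out. The sketch below uses Python's `re`, which is an assumption on my part, and other engines may parse `\10` differently. The non-capturing group is just one of the "other ways", picked here for illustration rather than taken from the answer.

```python
import re

# One capture group, a backreference to it, then a literal "0".
with_quantifier = re.compile(r"(\w)\1{1}0")  # {1} only separates \1 from the 0
with_group = re.compile(r"(\w)(?:\1)0")      # a non-capturing group does the same job


def compiles(pattern):
    """Return True if the engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


for text in ["aa0", "ab0", "aa1"]:
    print(text, bool(with_quantifier.fullmatch(text)), bool(with_group.fullmatch(text)))

# Without a separator, Python reads \10 as a reference to group 10,
# which does not exist in this pattern, so compilation fails.
print(compiles(r"(\w)\10"))
```

Both working spellings accept a repeated character followed by `0`; the grouped version arguably states the intent more directly than a do-nothing quantifier.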
They're all identical unless you're using an exceptional regex engine. However, not all regex engines support numbered repetition, `?` or `+`.

If all of them are available, I'd use the characters rather than the numbers, simply because it's more intuitive for me.
They're equivalent (and you'll find out whether they're available by testing in your context).

The problem I'd anticipate is that you may not be the only person who ever needs to work with your code.

Regexes are difficult enough for most people. Anytime someone uses an unusual syntax, the question arises: "Why didn't they do it the standard way? What were they thinking that I'm missing?"
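"Testing your context" can be as simple as asking the engine whether it accepts each spelling. Here is a rough sketch for Python's `re`; the `supported` helper and the candidate list are hypothetical, and another engine would need its own equivalent of the compile-and-catch step.

```python
import re


def supported(pattern):
    """Return True if this engine compiles the pattern without complaint."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Candidate spellings to probe. The last one is a possessive quantifier,
# which only some engines (and only newer Python versions) accept.
CANDIDATES = [r"a?", r"a{0,1}", r"a*", r"a{0,}", r"a+", r"a{1,}", r"a{1,}+"]

for pattern in CANDIDATES:
    print(pattern, supported(pattern))
```

Note that compiling cleanly is only a first check: some engines treat braces they don't understand as literal text, so it is worth also matching a known input before relying on a spelling.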