Grouped regex to match a line that *sometimes* begins with a space?
RegEx flavor: wxRegEx.
I am trying to create a "grouped" regex that matches a string that sometimes begins with a whitespace. When it doesn't begin with a whitespace, it begins with the target group (second parenthesized expression in the following sample). It is a relatively simple line made of a few predictable tokens and one portion of arbitrary text, e.g.
"good: Sed ut perspiciatis unde omnis iste natus error "
"better: Sit voluptatem accusantium doloremque laudantium "
"best: Nemo enim ipsam voluptatem quia voluptas "
" ok: Sit voluptatem accusantium doloremque laudantium "
Note: The quoted characters are not part of my input. By introducing the quotes in my posting I am trying to make the boundaries of each line/string clearer.
The regex that I came up with to match the above in a "grouped" manner (i.e. that I can address each group separately for further processing) is:
(^\s*)(good|better|best|ok)(: )(.*)( $)
Note: \s is wxRegEx's class-shorthand escape for [[:space:]].
The problem is that this regex works only when the line actually begins with a space. Why? Doesn't the '*' right after '\s' mean "0 or more occurrences of \s"?
I know I am missing something fundamental here, but what is it?
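For reference, here is a minimal sketch that runs the pattern exactly as posted against the four sample lines. It uses Python's `re` module, which is an assumption made purely for illustration: it is not wxRegEx, and the two engines may differ in how they treat `\s`. The helper name `split_line` is hypothetical.

```python
import re

# The pattern exactly as posted in the question.
PATTERN = re.compile(r"(^\s*)(good|better|best|ok)(: )(.*)( $)")

# The four sample lines, without the surrounding quotes.
LINES = [
    "good: Sed ut perspiciatis unde omnis iste natus error ",
    "better: Sit voluptatem accusantium doloremque laudantium ",
    "best: Nemo enim ipsam voluptatem quia voluptas ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]

def split_line(line):
    """Return the five captured groups, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

for line in LINES:
    print(repr(line), "->", split_line(line))
```

In this engine all four lines match, with an empty first group for the lines that have no leading space. That is consistent with the reading of `*` as "0 or more", so the cause is more likely something specific to the wxRegEx setup or to the exact pattern string that was passed in.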
Have you tried this with (^ *) instead of (^\s*)? Is it possible you're wrong about the \s syntax? I don't know wxRegEx myself.
I'm not familiar with wxRegEx, but if it is PCRE, I think you may want (^\s*)?(good|...
The '?' modifies the entire zero-or-more capture to make it zero-or-one.
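A quick way to see what the extra '?' changes is to compare both forms side by side. The sketch below does that with Python's `re` module, which is an assumption for illustration only (it is neither wxRegEx nor PCRE); the names `ORIGINAL`, `OPTIONAL` and `keyword` are hypothetical.

```python
import re

TAIL = r"(good|better|best|ok)(: )(.*)( $)"

# The first group as posted, and the same group made optional with '?'.
ORIGINAL = re.compile(r"(^\s*)" + TAIL)
OPTIONAL = re.compile(r"(^\s*)?" + TAIL)

LINES = [
    "good: Sed ut perspiciatis unde omnis iste natus error ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]

def keyword(pattern, line):
    """Return the keyword group (group 2), or None if the line does not match."""
    m = pattern.match(line)
    return m.group(2) if m else None

for line in LINES:
    print(repr(line), keyword(ORIGINAL, line), keyword(OPTIONAL, line))
```

In this engine both forms accept both lines, because `\s*` can already match an empty string. If wxRegEx behaves differently here, that would point at the engine or its flags rather than at the quantifier.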
That's weird... You are right that * should match 0 or more occurrences. Does moving the caret (^) outside the group make any difference?
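The caret question can be tested directly by running both placements on the same input. This is a sketch in Python's `re` module, assumed here only as a stand-in since it is not wxRegEx; `CARET_INSIDE`, `CARET_OUTSIDE` and `groups_for` are hypothetical names.

```python
import re

TAIL = r"(good|better|best|ok)(: )(.*)( $)"

CARET_INSIDE = re.compile(r"(^\s*)" + TAIL)   # as posted in the question
CARET_OUTSIDE = re.compile(r"^(\s*)" + TAIL)  # caret moved out of the group

LINES = [
    "best: Nemo enim ipsam voluptatem quia voluptas ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]

def groups_for(pattern, line):
    """Return the captured groups as a tuple, or None if there is no match."""
    m = pattern.match(line)
    return m.groups() if m else None

for line in LINES:
    print(groups_for(CARET_INSIDE, line) == groups_for(CARET_OUTSIDE, line))
```

In this engine the two placements capture identical groups, since `^` is a zero-width anchor and contributes no characters to the group. Trying the same swap in wxRegEx would show whether its handling of an anchor inside a group is the culprit.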
I see no obvious error in your regex. Your interpretation of the * is also correct, of course. Do you maybe have some actual spaces in your expression? The space (like -> <-) has no special meaning in regex and the engine will try to match it. If your first capturing group looked like (^ \s*) this would have the effect you describe.
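The stray-space theory is easy to reproduce. The sketch below uses Python's `re` module as a stand-in (an assumption, since it is not wxRegEx) and a hypothetical pattern `STRAY_SPACE` that differs from the posted one only by a literal space after the caret.

```python
import re

TAIL = r"(good|better|best|ok)(: )(.*)( $)"

# Hypothetical variant: note the literal space between '^' and '\s*'.
STRAY_SPACE = re.compile(r"(^ \s*)" + TAIL)

LINES = [
    "good: Sed ut perspiciatis unde omnis iste natus error ",
    "better: Sit voluptatem accusantium doloremque laudantium ",
    "best: Nemo enim ipsam voluptatem quia voluptas ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]

# True where the line matches, False where it does not.
matched = [STRAY_SPACE.match(line) is not None for line in LINES]
print(matched)
```

Only the line that begins with a space matches, which is the symptom described in the question. Printing the pattern string with its delimiters made visible, just before it is compiled, is one way to check whether such a space crept in.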