Grouped regex to match a line that sometimes begins with a space?

Posted on 2024-10-13 08:25:58

RegEx flavor: wxRegEx.

I am trying to create a "grouped" regex that matches a string that sometimes begins with whitespace. When it doesn't begin with whitespace, it begins with the target group (the second parenthesized expression in the regex below). Each line is relatively simple: a few predictable tokens plus one portion of arbitrary text, for example:

"good: Sed ut perspiciatis unde omnis iste natus error "
"better: Sit voluptatem accusantium doloremque laudantium "
"best: Nemo enim ipsam voluptatem quia voluptas "
" ok: Sit voluptatem accusantium doloremque laudantium "

Note: The double quotes are not part of my input. I added them only to make the boundaries of each line clearer, including the trailing space at the end of each line.

The regex I came up with to match these lines in a "grouped" manner (i.e., so that I can address each group separately for further processing) is:

(^\s*)(good|better|best|ok)(: )(.*)( $)

Note: \s is wxRegEx's class-shorthand escape for [[:space:]].
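As a sanity check of the pattern's logic only, here is a minimal sketch that runs the same expression through Python's `re` module against the four sample lines. Python's engine is not wxRegEx, so this does not reproduce wxRegEx's behavior; it only shows what the pattern means under standard semantics for `^`, `\s` and `*`. The names `PATTERN`, `SAMPLES` and `split_line` are hypothetical, introduced for illustration.

```python
import re

# The pattern from the question, unchanged; a raw string keeps "\s" intact.
PATTERN = re.compile(r"(^\s*)(good|better|best|ok)(: )(.*)( $)")

# The four sample lines, without the display quotes.
SAMPLES = [
    "good: Sed ut perspiciatis unde omnis iste natus error ",
    "better: Sit voluptatem accusantium doloremque laudantium ",
    "best: Nemo enim ipsam voluptatem quia voluptas ",
    " ok: Sit voluptatem accusantium doloremque laudantium ",
]


def split_line(line):
    """Return the five captured groups, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None


if __name__ == "__main__":
    for line in SAMPLES:
        # Group 1 captures "" when the line has no leading whitespace.
        print(repr(line), "->", split_line(line))
```

Under Python's engine every sample line matches; for the three lines that start with a letter, group 1 simply captures the empty string.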

The problem is that this regex matches only when the line actually begins with a space. Why? Doesn't the '*' right after '\s' mean "zero or more occurrences of \s"?

I know I am missing something fundamental here, but what is it?
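Under standard regex semantics the reading of '*' in the question is correct: `\s*` may match zero characters, so `(^\s*)` succeeds with an empty capture rather than failing. The minimal sketch below, again using Python's `re` as a stand-in rather than wxRegEx, contrasts it with the one-or-more quantifier `\s+`, which is the form that would match only lines beginning with whitespace. `LINE`, `star` and `plus` are hypothetical names for illustration.

```python
import re

LINE = "good: Sed ut perspiciatis unde omnis iste natus error "

# Zero-or-more: succeeds at position 0 with an empty group 1.
star = re.match(r"(^\s*)good", LINE)

# One-or-more: requires at least one whitespace character, so it fails here.
plus = re.match(r"(^\s+)good", LINE)

if __name__ == "__main__":
    print("\\s* ->", repr(star.group(1)) if star else None)
    print("\\s+ ->", repr(plus.group(1)) if plus else None)
```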
