Using a regex to skip forward over all characters until a specific letter sequence is found, using negative lookahead

Posted 2024-09-10 21:14:43

I'm alright with basic regular expressions, but I get a bit lost around positive/negative lookaheads and lookbehinds.

I'm trying to pull the id # from this:

[keyword stuff=otherstuff id=123 morestuff=stuff]

There could be unlimited amounts of "stuff" before or after.
I've been using The Regex Coach to help debug what I've tried, but I'm not moving forward anymore...

So far I have this:

\[keyword (?:id=([0-9]+))?[^\]]*\]

Which takes care of any extra attributes after the id, but I can't figure out how to ignore everything between keyword and id.
I know I can't go [^id]*
I believe I need to use a negative lookahead like this (?!id)* but I guess since it's zero-width, it doesn't move forward from there.
This doesn't work either:

\[keyword[A-z0-9 =]*(?!id)(?:id=([0-9]+))?[^\]]*\]

I've been looking all over for examples, but haven't found any. Or perhaps I have, but they went so far over my head I didn't even realize what they were.

Help!
Thanks.

EDIT:
It has to match [keyword stuff=otherstuff] as well, where id= doesn't exist at all, so the id # group has to be optional (0 or 1 occurrences). There are also other tags like [otherkeywords id=32] which I do not want to match. The pattern needs to match multiple [keyword id=3] tags throughout the document using preg_match_all.

不美如何 2024-09-17 21:14:43

No lookahead/behind required:

/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/

Added the trailing [^\]]*?\] to check for a real tag end; it could be unnecessary.

Edit: added the \b before id, as otherwise it could match the id= inside guid= in [keyword you-dont-want-this-guid=123123-132123-123 id=123]

$ php -r 'preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff morestuff=stuff]",$matches);var_dump($matches);'
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(42) "[keyword stuff=otherstuff morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}
$ php -r 'var_dump(preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff id=123 morestuff=stuff]",$matches),$matches);'
int(1)
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(49) "[keyword stuff=otherstuff id=123 morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(3) "123"
  }
}
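
As a rough illustration of what the \b buys, the sketch below runs the pattern above, plus a variant with the \b removed, over a made-up document. The sample string and the variable names are assumptions for the example, not part of the question.

```php
<?php
// Pattern from the answer: the optional group lazily skips ahead to "id=".
$pattern = '/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/';
// Same pattern without \b, for comparison only.
$loosePattern = '/\[keyword(?:[^\]]*?id=([0-9]+))?[^\]]*?\]/';

// Hypothetical document: with id, without id, another keyword, and a guid.
$doc = 'a [keyword stuff=x id=123 more=y] b [keyword stuff=x] '
     . 'c [otherkeywords id=32] d [keyword guid=999 id=7]';

$count = preg_match_all($pattern, $doc, $matches);
preg_match_all($loosePattern, $doc, $loose);

// $matches[1] holds one entry per matched tag; "" means the tag had no id.
var_dump($count, $matches[1], $loose[1]);
```

With the \b in place the third tag yields 7; without it, the lazy quantifier stops at the id= inside guid= and yields 999 instead. [otherkeywords id=32] is never matched because the pattern requires keyword directly after the opening bracket.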

意犹 2024-09-17 21:14:43

You do not need look ahead / behind.

Since the question is tagged PHP, use preg_match_all() and store the match in $matches.

Here's how:

<?php

  // Store the string. I single quote, in case there are backslashes I
  // didn't see.
$string = 'blah blah[keyword stuff=otherstuff id=123 morestuff=stuff]
           blah blah[otherkeyword stuff=otherstuff id=555 morestuff=stuff]
           blah blah[keyword stuff=otherstuff id=444 morestuff=stuff]';

  // The pattern is '[keyword' followed by any run of characters other than ']',
  // then a space and 'id='.
  // The space before id is important, so you don't catch 'guid', etc.
  // If '[keyword'  is always at the beginning of a line, you can use
  // '^\[keyword'
$pattern = '/\[keyword[^\]]* id=([0-9]+)/';

  // Find every single $pattern in $string and store it in $matches
preg_match_all($pattern, $string, $matches);

  // The only tricky part is that each entire match is stored in
  // $matches[0][x], while the part of the match in the parentheses, which is
  // what you want, is stored in $matches[1][x]. The braces are optional here,
  // since the loop body is a single statement.
foreach($matches[1] as $value)
{
    echo $value . "<br/>";
}
?>

Output:

123
444

(555 is skipped, as it should be)

You can also use \b instead of a literal space if a tab could appear there instead. \b represents a word boundary, in this case the beginning of a word.

$pattern = '/\[keyword[^\]]*\bid=([0-9]+)/';
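
To make the difference between the two patterns concrete, here is a small sketch that tries both against a tab-separated tag and a hyphenated attribute. Both sample strings are made up for illustration.

```php
<?php
$withSpace    = '/\[keyword[^\]]* id=([0-9]+)/';   // literal space before id
$withBoundary = '/\[keyword[^\]]*\bid=([0-9]+)/';  // word boundary before id

// Hypothetical tag that separates attributes with a tab instead of a space.
$tabbed = "[keyword stuff=a\tid=5]";

$spaceCount    = preg_match_all($withSpace, $tabbed, $spaceMatches);
$boundaryCount = preg_match_all($withBoundary, $tabbed, $boundaryMatches);

// Caveat: \b also sits between '-' and 'i', so a hyphenated attribute
// name ending in "id" is picked up by the boundary version.
$hyphenCount = preg_match_all($withBoundary, '[keyword data-id=9]', $hyphenMatches);

var_dump($spaceCount, $boundaryCount, $boundaryMatches[1], $hyphenMatches[1]);
```

The space version finds nothing in the tabbed tag, while the \b version captures 5. The trade-off is that \b is looser than a space: it accepts any non-word character before id, which is why data-id=9 is captured as well.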

天涯沦落人 2024-09-17 21:14:43

I think this is what you're getting at:

\[keyword(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)*(?:\s+id=([0-9]+))?[^\]]*\]

(I'm assuming attribute names can only contain ASCII letters, while the values can contain any non-whitespace character except ].)

(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)* matches any number of attribute=value pairs (and the whitespace preceding them), as long as the attribute name isn't id. The \b (word boundary) is there just in case there are attribute names that start with id, like idiocy. There's no need to put a \b in front of the attribute name this time, because you know any name it matches will be preceded by whitespace. But, as you've learned, the lookahead approach is overkill in this case.
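
A quick way to see the lookahead version at work is to run it through preg_match_all on a few tags. This is only a sketch: the sample document and variable names are invented, and it relies on the same assumption as above about what attribute names may contain.

```php
<?php
// Lookahead pattern from this answer: step over attribute=value pairs
// whose name is not exactly "id", then optionally capture the id.
$pattern = '/\[keyword(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)*(?:\s+id=([0-9]+))?[^\]]*\]/';

// Hypothetical document: normal tag, a name starting with "id",
// a tag without id, and a different keyword that must not match.
$doc = '[keyword stuff=otherstuff id=123 morestuff=stuff] '
     . '[keyword idiocy=1 id=9] [keyword stuff=x] [otherkeywords id=32]';

$count = preg_match_all($pattern, $doc, $matches);

// Limit of the letters-only assumption: "data-x" cannot be stepped over,
// so the optional id group is skipped and the id is silently missed.
preg_match_all($pattern, '[keyword data-x=1 id=2]', $hyphen);

var_dump($count, $matches[1], $hyphen[1]);
```

Here idiocy=1 is stepped over because (?!id\b) only rejects the exact name id, and the tag without an id still matches with an empty capture. The last call shows why the simpler patterns in the other answers may be preferable: any attribute name outside [A-Za-z] stops the attribute loop early, and the tag then matches without its id.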

Now, about this: