PCRE: (+) and (-) lookahead/lookbehind (regex)

Posted on 2024-11-09 17:50:10

I have the following string:

<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>

And I want to extract:

CAR123
RED
The Car is Red - Its Fast

What I have so far is:

(?<=<A href="CarPage\.asp\?parent=)[A-Za-z0-9]*(\+\+\+&Color=)[A-Za-z0-9]{3}(\">)[A-Za-z0-9\- ]*(?=</a>)

But I'm not sure how to set up positive and negative lookaheads and lookbehinds when they are not at the string boundaries.

I know, it's HTML... I've heard it before: "Don't parse HTML with regex." I don't need anything more elaborate than this.

Help is appreciated.

Thanks!

独自唱情﹋歌 2024-11-16 17:50:10

You don't need anything that complicated, you can probably get away with this:

/parent=(\w+).*Color=(\w+).*>(.*)</

And then pull the parts out of $1, $2, and $3. You might have to tighten up the .* parts a bit depending on how variable your real input is.

For example, this bit of Perl:

use strict; use warnings;
my $s = '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>';
# Captures: $1 = parent value, $2 = Color value, $3 = link text
if ($s =~ /parent=(\w+).*Color=(\w+).*>(.*)</) {
    print join("\n", $1, $2, $3), "\n";
}

Outputs:

CAR123
RED
The Car is Red - Its Fast
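
To illustrate what "tightening up the .* parts" could look like, here is a minimal sketch in Python. Python is an assumption, since the question doesn't name a host language, and Python's re module is not PCRE, though the constructs used here behave the same way in both. The function name extract_link_parts and the negated character classes are illustrative choices, not part of the answer above.

```python
import re

sample = '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>'

# Negated character classes stand in for the open-ended .* parts:
#   [^"&]*  cannot run past the current query parameter or the closing quote
#   [^>]*   cannot run past the end of the opening tag
#   [^<]*   cannot run past the start of the closing tag
LINK_RE = re.compile(r'parent=(\w+)[^"&]*&Color=(\w+)"[^>]*>([^<]*)<')


def extract_link_parts(link):
    """Return (parent, color, text) for a matching link, or None if nothing matches."""
    m = LINK_RE.search(link)
    return m.groups() if m else None


print(extract_link_parts(sample))
# ('CAR123', 'RED', 'The Car is Red - Its Fast')
```

The capture groups do the extracting, so no lookarounds are needed; the surrounding text is matched and simply left out of the groups.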

作妖 2024-11-16 17:50:10

Better to use a parser, but if your link is always formatted in exactly the same way (no ids, classes, extra params, params in a different order, etc.), try:

parent=(\w+?)\+*&Color=(\w+?)">(.*?)<

The difference from Mu's suggestion is greediness: the quantifiers here are lazy (+? and *?), so each group stops at the earliest point where the rest of the pattern can match.
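
The greediness difference between the two suggested patterns only shows up when more than one link sits on the same line. Here is a rough sketch in Python (an assumption; the thread doesn't name a host language, and Python's re module is not PCRE, but greedy and lazy quantifiers work the same way in both). The second link (CAR456 / BLUE) is made-up test data, not from the question.

```python
import re

# Two links on one line; the second link is hypothetical test data.
two_links = (
    '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a> '
    '<A href="CarPage.asp?parent=CAR456+++&Color=BLUE">The Car is Blue - Its Slow</a>'
)

# Greedy pattern from the first answer: each .* grabs as much as it can.
greedy = re.compile(r'parent=(\w+).*Color=(\w+).*>(.*)<')

# Lazy pattern from the second answer: each quantifier stops as early as it can.
lazy = re.compile(r'parent=(\w+?)\+*&Color=(\w+?)">(.*?)<')

print(greedy.findall(two_links))
# [('CAR123', 'BLUE', 'The Car is Blue - Its Slow')]

print(lazy.findall(two_links))
# [('CAR123', 'RED', 'The Car is Red - Its Fast'), ('CAR456', 'BLUE', 'The Car is Blue - Its Slow')]
```

With a single link per input string, as in the question, both patterns return the same three values; the greedy one only mixes up fields when its .* can reach into a later link.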