PCRE: (+) and (-) lookahead/lookbehind (regex)
I have the following string:
<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>
And I want to extract:
- CAR123
- RED
- The Car is Red - Its Fast
What I have so far is:
(?<=<A href="CarPage\.asp\?parent=)[A-Za-z0-9]*(\+\+\+&Color=)[A-Za-z0-9]{3}(\">)[A-Za-z0-9\- ]*(?=</a>)
But I'm not sure how to set up positive and negative lookaheads and lookbehinds when they are not at the string boundaries.
I know, it's HTML...I've heard it before... "Don't parse html with regex..."
I don't need anything more elaborate than this.
Help is appreciated.
Thanks!
Comments (2)
You don't need anything that complicated; you can probably get away with this:
And then pull the parts out of $1, $2, and $3. You might have to tighten up the .* parts a bit depending on how variable your real input is. For example, this bit of Perl:
Outputs:
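As a rough illustration of the capture-group approach described in this answer, here is a minimal sketch in Python rather than Perl. The pattern is an assumption written for this example, not the answer's own snippet; `match.group(1)` to `match.group(3)` play the role of $1, $2, and $3.

```python
import re

html = '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>'

# Assumed pattern: three capture groups (parent id, color, link text).
# No lookarounds are needed; the literal text between the groups anchors them.
pattern = re.compile(r'parent=(.*)\+\+\+&Color=(.*)">(.*)</a>')

match = pattern.search(html)
if match is None:
    raise SystemExit("no match")

# groups() returns the three captures in order, like $1, $2, $3 in Perl.
parent, color, text = match.groups()
print(parent)  # CAR123
print(color)   # RED
print(text)    # The Car is Red - Its Fast
```

If the real input varies more than this one line, the `.*` parts could be replaced with narrower classes such as `[A-Za-z0-9]*`, as in the pattern from the question.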
Better to use a parser, but if your link is always formatted in the exact same way (no ids, classes, extra params, params in a different order, etc.), try:
The difference from Mu's suggestion is the greediness.
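To make the greediness point concrete, here is a small Python sketch. Both patterns and the second link are assumptions invented for the illustration (neither answer's exact pattern is reproduced here), and it assumes "greediness" refers to `.*` versus `.*?`. With two links on one line, the greedy version runs across both, while the lazy version stops at the first possible match.

```python
import re

# Hypothetical input: two links back to back on a single line.
two_links = (
    '<A href="CarPage.asp?parent=CAR123+++&Color=RED">The Car is Red - Its Fast</a>'
    '<A href="CarPage.asp?parent=CAR456+++&Color=BLU">The Car is Blue - Its Slow</a>'
)

# Greedy quantifiers: each .* grabs as much as it can, then backtracks.
greedy = re.compile(r'parent=(.*)\+\+\+&Color=(.*)">(.*)</a>')
# Lazy quantifiers: each .*? grabs as little as it can.
lazy = re.compile(r'parent=(.*?)\+\+\+&Color=(.*?)">(.*?)</a>')

greedy_groups = greedy.search(two_links).groups()
lazy_groups = lazy.findall(two_links)

# The first greedy group swallows everything up to the last "+++&Color=".
print(greedy_groups[0])
# The lazy pattern yields one clean tuple per link.
for groups in lazy_groups:
    print(groups)
```

On a single, fixed-format link both variants give the same three captures; the difference only shows up when the delimiters can occur more than once in the input.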