RegEx advanced: Positive lookbehind
This is my test-string:
<img rel="{objectid:498,newobject:1,fileid:338}" width="80" height="60" align="left" src="../../../../files/jpg1/Desert1.jpg" alt="" />
I want to get each of the JSON-like elements inside the rel attribute.
It's working for the first element (objectid).
Here is my regex, which works fine:
(?<=(rel="\{objectid:))\d+(?=[,|\}])
But I want to do something like this, which doesn't work:
(?<=(rel="\{.*objectid:))\d+(?=[,|\}])
That way I could parse every element of the search string.
I'm using Java regex.
Comments (3)
Java (like nearly all regex flavors except .NET and JGSoft) doesn't support infinite repetition inside lookbehinds.
You could use a capturing group instead. Also, better use [^{]* instead of .*, and ensure word boundaries with \b. A pattern along the lines of
rel="\{[^{]*\bobjectid:(\d+)
should be sufficient (then look at capturing group 1 for the value of the attribute).
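A minimal Java sketch of the capturing-group approach described above. The pattern is assembled from the pieces the answer names ([^{]* in place of .*, a \b before the key, a group around \d+). The class name RelAttribute and the helper valueOf are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RelAttribute {
    // Hypothetical helper: returns the digits stored under `key` inside
    // rel="{...}", or null when the key does not occur as a whole word.
    static String valueOf(String tag, String key) {
        Pattern p = Pattern.compile(
            "rel=\"\\{[^{]*\\b" + Pattern.quote(key) + ":(\\d+)");
        Matcher m = p.matcher(tag);
        // Group 1 holds the value; no lookbehind or lookahead is needed.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<img rel=\"{objectid:498,newobject:1,fileid:338}\" width=\"80\" />";
        System.out.println(valueOf(tag, "objectid"));
        System.out.println(valueOf(tag, "fileid"));
        // "object" is only part of "newobject" and "objectid", so \b and
        // the trailing colon keep it from matching.
        System.out.println(valueOf(tag, "object"));
    }
}
```

The word boundary matters here because one key (newobject) contains another word (object) as a suffix; without \b a shorter key could match in the middle of a longer one.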
Do you want to iterate through all the key/value pairs? You don't need lookbehind for that; a regex of the form
(?:rel="\{|\G,)(\w+):(\w+)
applied with repeated find() calls will do.
The first time find() is called, the first part of the regex matches rel="{. On subsequent calls, the second alternative (\G,) takes over to match a comma, but only if it immediately follows the previous match. In either case it leaves you lined up for (\w+):(\w+) to match the next key/value pair, and it can never match anywhere outside the rel attribute.
I'm assuming you're applying the regex to an isolated IMG tag, as you posted it, not to a whole HTML file. Also, the regex may need a little tweaking to match your actual data. For example, you might want the more general ([^:]+):([^,}]+) instead of (\w+):(\w+).
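The find() loop described above could be sketched in Java roughly as follows. The pattern is reconstructed from the description (an rel="{ alternative, a \G, alternative, then (\w+):(\w+)). The class name RelPairs, the method parse, and the choice of a LinkedHashMap are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RelPairs {
    // First alternative anchors the first match at rel="{ ; the second
    // alternative (\G,) continues exactly where the previous match ended
    // and consumes the separating comma.
    private static final Pattern PAIR =
        Pattern.compile("(?:rel=\"\\{|\\G,)(\\w+):(\\w+)");

    // Hypothetical helper: collects every key/value pair in document order.
    static Map<String, String> parse(String tag) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(tag);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2)); // group 1 = key, group 2 = value
        }
        return pairs;
    }

    public static void main(String[] args) {
        String tag = "<img rel=\"{objectid:498,newobject:1,fileid:338}\" width=\"80\" />";
        parse(tag).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Once the closing brace is reached, neither alternative can match again, so the loop stops without touching the other attributes of the tag.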
Lookbehinds may not contain arbitrary regular expressions in general: most engines (Java's included) need to know a maximum length for them, so you can't use unbounded quantifiers like * inside them (lookaheads have no such restriction).
Why are you using lookaheads and lookbehinds here, anyway? Just use capture groups instead; that's much simpler.
With a capture group around \d+, the first capture group will contain the ID.