Regular expression - greedy - matching HTML tags, content and attributes
I am trying to match specific span tags in an HTML source.
The lang attribute and the inner HTML of the tag are used as parameters for a function which returns a new string.
I want to replace the old tags, attributes and content with the result of the called function.
The subject would be something like this:
<p>Some codesnippet:</p>
<span lang="fsharp">// PE001
let p001 = [0..999]
|> List.filter (fun n -> n % 3 = 0 || n % 5 = 0)
|> List.sum
</span>
<p>Another code snippet:</p>
<span lang="C#">//C# testclass
class MyClass {
}
</span>
In order to extract the value of the lang attribute and the content, I group those values with the following expression:
/(<span lang="(.*)">(.*)</span>)/is
Since regex tends to be greedy, this expression matches the complete subject, not just one span tag and its content.
How do I manage to match just one span tag?
We'll never repeat it enough: do not use regular expressions to work with HTML!
Instead, use **`DOMDocument::loadHTML`**.
It'll allow you to manipulate your HTML data using the DOM, which is much more powerful and easier. You'll be able to:
- use getElementById and getElementsByTagName for simple extractions,
- use the DOMXPath class to make XPath queries on your document,
- work with DOMElements, and methods such as getAttribute/setAttribute.
Really: take the time to learn DOM: it's a great investment!
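As a rough illustration of the DOM approach, here is a minimal sketch applied to a shortened version of the sample from the question. The `$highlight` closure is a hypothetical stand-in for the asker's function, and the wrapper `<div id="root">` is an assumption made only so that the HTML fragment has a single root to serialize from.

```php
<?php
// Hypothetical stand-in for the asker's function (name and behavior assumed).
$highlight = function (string $lang, string $code): string {
    return '[' . $lang . '] ' . trim($code);
};

$html = <<<'HTML'
<p>Some codesnippet:</p>
<span lang="fsharp">// PE001
let p001 = [0..999]
|> List.sum
</span>
<p>Another code snippet:</p>
<span lang="C#">//C# testclass
class MyClass {
}
</span>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// The wrapper div is an assumption: it gives the fragment one root element.
$doc->loadHTML('<div id="root">' . $html . '</div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Copy the matches into an array so replacing nodes cannot affect iteration.
$spans = iterator_to_array($xpath->query('//span[@lang]'));
foreach ($spans as $span) {
    // Pass the lang attribute and the span's text content to the function.
    $text = $highlight($span->getAttribute('lang'), $span->textContent);
    // Swap the whole span (tag, attribute, content) for the returned string.
    $span->parentNode->replaceChild($doc->createTextNode($text), $span);
}

// Serialize only the children of the wrapper div.
$root = $xpath->query('//div[@id="root"]')->item(0);
$output = '';
foreach ($root->childNodes as $child) {
    $output .= $doc->saveHTML($child);
}
echo $output;
```

Because the result is inserted as a text node, characters such as `>` are escaped when the document is serialized. If the function returned markup rather than plain text, that markup would have to be parsed into nodes instead.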
You can make the quantifiers ungreedy by adding ? after them:
/(<span lang="(.*?)">(.*?)<\/span>)/is
or make all quantifiers ungreedy by default using the PCRE_UNGREEDY modifier (U):
/(<span lang="(.*)">(.*)<\/span>)/Uis
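To show how the ungreedy pattern could feed the replacement the question asks for, here is a minimal sketch using `preg_replace_callback`. The `$highlight` closure is a hypothetical placeholder for the asker's function, and the outer capture group from the pattern above is dropped since the whole match is not needed here.

```php
<?php
// Hypothetical stand-in for the asker's function (name and behavior assumed).
$highlight = function (string $lang, string $code): string {
    return '[' . $lang . '] ' . trim($code);
};

$subject = <<<'HTML'
<p>Some codesnippet:</p>
<span lang="fsharp">// PE001
let p001 = [0..999]
|> List.filter (fun n -> n % 3 = 0 || n % 5 = 0)
|> List.sum
</span>
<p>Another code snippet:</p>
<span lang="C#">//C# testclass
class MyClass {
}
</span>
HTML;

// (.*?) stops at the first possible match instead of the last one;
// the "s" modifier lets "." match newlines, "i" ignores case.
$pattern = '/<span lang="(.*?)">(.*?)<\/span>/is';

$result = preg_replace_callback(
    $pattern,
    function (array $m) use ($highlight): string {
        // $m[1] is the lang value, $m[2] the inner content of the span.
        return $highlight($m[1], $m[2]);
    },
    $subject
);

// Each span is now matched separately.
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
echo count($matches), "\n"; // prints 2
echo $result, "\n";
```

With the greedy version of the pattern, the same subject yields a single match running from the first opening span tag to the last closing one, which is the behavior described in the question.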
Just adding ? after each quantifier, I think.