Regex and PHP for extracting content between tags with several line breaks

Posted on 2024-09-04 15:40:51

How can I extract the content between tags with several line breaks?

I'm new to regex and would like to know how to handle an unknown number of line breaks so that my pattern still matches.

Task: Extract content between <div class="test"> and the first closing </div> tag.

Original source:

<div class="test">optional text<br/>
content<br/>
<br/>
content<br/>
...
content<br/><a href="/url/">Hyperlink</a></div></div></div>

I've worked out the regex below:

/<div class=\"test\">(.*?)<br\/>(.*?)<\/div>/

I just wonder how to match several line breaks using regex.

DOM is also available, but I am not familiar with it.

夏末染殇 2024-09-11 15:40:51

You should not parse (X)HTML with regular expressions. Use DOM.
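
For reference, the pattern in the question cannot span lines because `.` does not match a newline by default; the `s` modifier (PCRE_DOTALL) changes that. The following is only a minimal sketch on the sample markup from the question, with hypothetical variable names. It stops at the first `</div>`, which is what the question asks for, but it would cut the content short if the div contained nested divs, so it is not a substitute for a real parser.

```php
<?php
// Sample markup from the question (nowdoc, so nothing is interpolated).
$html = <<<'HTML'
<div class="test">optional text<br/>
content<br/>
<br/>
content<br/>
...
content<br/><a href="/url/">Hyperlink</a></div></div></div>
HTML;

// Without "s", "." stops at each newline, so no match is found here.
$withoutS = preg_match('/<div class="test">(.*?)<\/div>/', $html, $m1);

// With "s", "." also matches newlines; the lazy ".*?" ends at the first </div>.
$withS = preg_match('/<div class="test">(.*?)<\/div>/s', $html, $m2);

echo $withoutS, "\n"; // 0: the pattern could not cross the line breaks
echo $withS, "\n";    // 1: the pattern matched across lines
echo $m2[1], "\n";    // everything between the opening tag and the first </div>
```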

I'm a beginner with XPath, but an expression like this should work:

//div[@class='test']

This selects every div whose class attribute is exactly 'test'. You will need to load your HTML into a DOMDocument object, then create a DOMXPath object for that document and call its query() method to get the results. It returns a DOMNodeList object.
Final code looks something like this:

$domd = new DOMDocument();
$domd->loadHTML($your_html_code);
$domx = new DOMXPath($domd);
// query() returns a DOMNodeList of the matching nodes
$items = $domx->query("//div[@class='test']");

After this, your div is in $items->item(0).

This is untested code, but if I remember correctly, it should work.
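
The steps above can be put together into a small runnable sketch. The sample string is the markup from the question; the variable names and the use of libxml_use_internal_errors() to keep warnings about the stray closing tags quiet are my own assumptions, not part of the original snippet.

```php
<?php
// Sample markup from the question, including the extra closing </div> tags.
$html = '<div class="test">optional text<br/>' . "\n"
      . 'content<br/>' . "\n"
      . '<br/>' . "\n"
      . 'content<br/><a href="/url/">Hyperlink</a></div></div></div>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// query() returns a DOMNodeList, or false if the expression is malformed.
$items = $xpath->query("//div[@class='test']");

if ($items !== false && $items->length > 0) {
    $div = $items->item(0);
    // textContent is the text of the div with all tags stripped.
    echo $div->textContent, "\n";
}
```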

Update: I forgot that you need the content.

If you need the text content (no tags), you can simply read $items->item(0)->textContent. If you also need the tags, here's the equivalent of JavaScript's innerHTML for PHP DOM:

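// Returns the markup of all children of $node, like JavaScript's innerHTML.
function innerHTML($node) {
  $doc = new DOMDocument();
  // Deep-copy each child into a scratch document
  foreach ($node->childNodes as $child) {
    $doc->appendChild($doc->importNode($child, true));
  }

  // Serialize the scratch document, which now holds only the children
  return $doc->saveHTML();
}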

Call it with $items->item(0) as the parameter.
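
As an end-to-end illustration, here is a sketch of a variant of that idea. It is a different technique from the function above: instead of copying the children into a second document, it serializes each child with saveHTML($child) on the node's own document. The name innerHtmlOf() is hypothetical, and the sample markup is the one from the question.

```php
<?php
// Hypothetical helper: concatenates the serialized markup of each child node.
function innerHtmlOf(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$source = '<div class="test">optional text<br/>' . "\n"
        . 'content<br/><a href="/url/">Hyperlink</a></div>';

libxml_use_internal_errors(true); // keep parser warnings off the output
$doc = new DOMDocument();
$doc->loadHTML($source);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = $xpath->query("//div[@class='test']");

// Inner markup of the first matching div, tags included.
$inner = innerHtmlOf($items->item(0));
echo $inner, "\n";
```

Note that the serializer writes HTML rather than XHTML, so the exact form of some tags in the output may differ from the input.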
