PHP regex: find everything inside a div

Posted 2024-08-03 06:09:56

I'm trying to find everything inside a div using a regular expression. I'm aware that there is probably a smarter way to do this, but I've chosen regex.

Currently my pattern looks like this:

$gallery_pattern = '/<div class="gallery">([\s\S]*)<\/div>/';

It does the trick, somewhat.

The problem appears when I have two divs one after the other, like this:

<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>

I want to extract the text from both divs, but when testing I don't get the text of each div as a separate result. Instead I get:

"text to extract here </div>  
<div class="gallery">text to extract from here as well"

To sum up: the match skips the first closing </div> and continues on to the next one.
The text inside the div can contain <, / and line breaks.

Does anyone have a simple solution to this problem? I'm still a regex novice.

一影成城 2024-08-10 06:09:56

You shouldn't be using regex to parse HTML when there's a convenient DOM library:

$str = '
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
';

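$doc = new DOMDocument();
$doc->loadHTML($str);

// getElementsByTagName() returns every <div>, not only those with class "gallery".
$divs = $doc->getElementsByTagName('div');

// Iterating an empty DOMNodeList is a no-op, so no count() guard is needed.
foreach ($divs as $div) {
    // nodeValue is the text content of the element, with nested tags stripped.
    echo $div->nodeValue . '<br>';
}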
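
If only the divs with class "gallery" are wanted, and their content may itself contain markup, the same DOM approach can be narrowed with an XPath query. The following is a minimal sketch, not part of the original answer: the sample markup, the "other" div, and the variable names ($xpath, $contents) are assumptions made for illustration, and the XPath expression matches only elements whose class attribute is exactly "gallery".

```php
<?php
// Hypothetical sample input: two gallery divs and one unrelated div.
$str = '
<div class="gallery">text <b>to</b> extract here</div>
<div class="other">not wanted</div>
<div class="gallery">text to extract from here as well</div>
';

$doc = new DOMDocument();
// Collect libxml parser warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($str);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Matches divs whose class attribute is exactly "gallery"
// (a div with several classes would need a different expression).
$nodes = $xpath->query('//div[@class="gallery"]');

$contents = [];
foreach ($nodes as $div) {
    // Serialize each child node so that inner markup is kept,
    // unlike nodeValue, which returns text only.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $contents[] = $inner;
}

print_r($contents);
```

Because the parser tracks opening and closing tags itself, this keeps working when a gallery div contains other divs, which is where a regex starts to break down.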

徒留西风 2024-08-10 06:09:56

What about something like this:

$str = <<<HTML
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
HTML;

$matches = array();
preg_match_all('#<div[^>]*>(.*?)</div>#s', $str, $matches);

var_dump($matches[1]);

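Note the '?' after '.*' in the regex: it makes the quantifier non-greedy (lazy), so each match stops at the first closing </div> instead of the last one. The 's' modifier lets '.' match line breaks as well.

This gets you: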

array
  0 => string 'text to extract here' (length=20)
  1 => string 'text to extract from here as well' (length=33)

This should work fine as long as you don't have nested divs. If you do... well, are you really sure you want to use regular expressions to parse HTML, which is not that regular itself?
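
To make the difference concrete, here is a small sketch that applies the same lazy quantifier to the pattern from the question and then shows the nested-div limitation. The sample strings and the variable names ($greedy, $lazy, $nested, $cut) are made up for illustration.

```php
<?php
// Hypothetical input: the first div contains a tag and a line break.
$str = <<<HTML
<div class="gallery">first <b>item</b>
on two lines</div>
<div class="gallery">second item</div>
HTML;

// Greedy: [\s\S]* runs to the LAST </div>, so both divs end up in one match.
preg_match_all('/<div class="gallery">([\s\S]*)<\/div>/', $str, $greedy);

// Lazy: [\s\S]*? stops at the FIRST </div>, giving one match per div.
preg_match_all('/<div class="gallery">([\s\S]*?)<\/div>/', $str, $lazy);

// Limitation: with a nested div, the lazy match ends at the inner </div>,
// so the captured content is cut short.
$nested = '<div class="gallery">outer <div>inner</div> tail</div>';
preg_match_all('/<div class="gallery">([\s\S]*?)<\/div>/', $nested, $cut);

var_dump(count($greedy[1])); // number of greedy matches
var_dump($lazy[1]);          // captured contents, one per div
var_dump($cut[1]);           // truncated capture for the nested case
```

The only change from the question's pattern is the added '?'; the last call shows why the approach stops being reliable once gallery divs can contain other divs.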
