Limiting the number of results from preg_match_all in PHP

Posted on 2024-10-08 06:10:31

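Is there any way to limit the number of matches returned by preg_match_all?

For example, I only want to match the first 20 <p> tags on a web page, but there are 100 <p> tags.

Cheers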

℉服软 2024-10-15 06:10:31

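$matches = array();
preg_match_all($pattern, $subject, $matches);
// In the default PREG_PATTERN_ORDER, $matches[0] holds the full matches,
// so slice that element rather than $matches itself.
$twenty = array_slice($matches[0], 0, 20);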

居里长安 2024-10-15 06:10:31

Just match everything and slice the resulting array:

$allMatches = array();
// PREG_SET_ORDER yields one entry per match, so the array can be sliced by match.
$numMatches = preg_match_all($pattern, $subject, $allMatches, PREG_SET_ORDER);
$limit = 20;
$limitedResults = $allMatches;
if ($numMatches > $limit)
{
   $limitedResults = array_slice($allMatches, 0, $limit);
}

// Use $limitedResults here

没企图 2024-10-15 06:10:31

No, the computation of the preg_match_all result set cannot be limited. You can only limit the results afterwards with array_slice or array_splice (this would require PREG_SET_ORDER):

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
$firstMatches = array_slice($matches, 0, 20);
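
The PREG_SET_ORDER flag matters here because of how preg_match_all() lays out its result array. The following is a minimal sketch of the difference; the pattern and subject string are made-up sample data, not taken from the question.

```php
<?php
// Hypothetical sample data, for illustration only.
$subject = 'a1 b2 c3 d4';
$pattern = '/([a-z])([0-9])/';

// Default PREG_PATTERN_ORDER: $byGroup[0] holds every full match,
// $byGroup[1] every first group, $byGroup[2] every second group.
preg_match_all($pattern, $subject, $byGroup);
$slicedGroups = array_slice($byGroup, 0, 2);    // keeps 2 groups, not 2 matches
$firstTwoFull = array_slice($byGroup[0], 0, 2); // limits the full matches

// PREG_SET_ORDER: one entry per match, so slicing limits the matches.
preg_match_all($pattern, $subject, $bySet, PREG_SET_ORDER);
$firstTwoSets = array_slice($bySet, 0, 2);

var_dump($firstTwoFull, $firstTwoSets);
```

Without the flag, slicing the top-level array only trims the list of capture groups, and each remaining group still contains all four matches.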

But besides that, you shouldn't use regular expressions to parse HTML anyway. Although modern regular expression engines are no longer strictly regular and can process a non-regular language like HTML, doing so is too error-prone. It is better to use a proper HTML parser, such as one of PHP's DOM libraries. Then just use a counter to collect at most 20 matches:

$doc = new DOMDocument();
$doc->loadHTML($code);
$counter = 20;
$matches = array();
foreach ($doc->getElementsByTagName('p') as $elem) {
    if ($counter-- <= 0) {
        break;
    }
    $matches[] = $elem;
}
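
The counter loop above could be wrapped in a small reusable function. This is only a sketch: the helper name firstElementTexts() and the sample markup are assumptions for illustration, and suppressing libxml warnings is one possible way to cope with malformed real-world HTML.

```php
<?php
// Hypothetical helper: return the text of the first $limit <$tag> elements.
function firstElementTexts(string $html, string $tag, int $limit): array
{
    // Real-world HTML is often malformed; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName($tag) as $elem) {
        if (count($texts) >= $limit) {
            break; // stop as soon as enough elements were collected
        }
        $texts[] = trim($elem->textContent);
    }
    return $texts;
}

// Made-up sample markup.
$sample = '<html><body><p>one</p><p>two</p><p>three</p></body></html>';
$firstTwo = firstElementTexts($sample, 'p', 2);
var_dump($firstTwo);
```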

泪是无色的血 2024-10-15 06:10:31

You can use the T-Regx library:

pattern('<p>')->match($yourHtml)->only(20);

昇り龍 2024-10-15 06:10:31

To extend @Gumbo's great advice to use a DOM parser instead of regex, the following snippet uses an XPath query with a position() condition to limit the targeted tags.

Code: (demo targeting 4 of 5 p tags)

$html = <<<HTML
<div>
    <p class="classy">1
</p>
    <p>2</p>
    <p data-p="<p>notatag</p>">3</p>
    <span data-monkeywrench='<p'>z</span>
    <p
 data-p="<p>notatag</p>">4</p>
    <p>5</p>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
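// The parentheses make position() count across the whole document
// rather than among the p children of each parent element.
foreach ($xpath->query('(//p)[position() <= 4]') as $p) {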
    echo var_export($p->nodeValue, true) , "\n---\n";
}

Output:

'1
'
---
'2'
---
'3'
---
'4'
---
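
One subtlety of position() is worth a sketch: in a step such as //p[...], the position is counted among the p children of each parent, whereas (//p)[...] counts across the whole document. The markup and the helper name nodeTexts() below are assumptions made up for illustration.

```php
<?php
// Hypothetical markup: paragraphs spread over two parent elements.
$html = '<div><div><p>a</p><p>b</p><p>c</p></div>'
      . '<div><p>d</p><p>e</p></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Hypothetical helper: collect the text content of each node in a list.
function nodeTexts(DOMNodeList $nodes): array
{
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = $node->textContent;
    }
    return $texts;
}

// Counted per parent: the first two p elements inside each inner div.
$perParent = nodeTexts($xpath->query('//p[position() <= 2]'));

// Counted over the whole document: the first two p elements overall.
$overall = nodeTexts($xpath->query('(//p)[position() <= 2]'));

var_dump($perParent, $overall);
```

When all target tags share one parent the two forms select the same nodes, but on a real page the parenthesized form is the one that caps the total.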

塔塔猫 2024-10-15 06:10:31

A more memory-efficient way is to collect the matches by reference inside preg_replace_callback(), which accepts a limit on the number of matches it processes. The callback below runs for at most 20 matches.

<?php

$matches = [];

preg_replace_callback(
    '~<p(?:\s.*?)?>(?:.*?)</p>~s',
    function (array $match) use (&$matches) {
        $matches[] = $match[0];
        // Return the match unchanged; the resulting string is discarded anyway.
        return $match[0];
    },
    $html,
    20,
    $_
);

var_dump($matches);

梦里南柯 2024-10-15 06:10:31

You can either use preg_match_all() and discard the matches you're not interested in, or you can use a loop with preg_match(). The second option is better if you're concerned about the expense of scanning a large string.

This example stops after 2 matches, although the string contains 4:

<?php

$str = "ab1ab2ab3ab4c";

for ($offset = 0, $n = 0;
     $n < 2 && preg_match('/b([0-9])/', $str, $matches, PREG_OFFSET_CAPTURE, $offset);
     // Resume after the end of the current match, not one byte past its start.
     ++$n, $offset = $matches[0][1] + strlen($matches[0][0])) {
    var_dump($matches);
}

On reflection, a while loop would probably have been clearer than a for loop ;)
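
A while loop version of the same idea might look like the sketch below. The helper name firstMatches() is an assumption for illustration. It resumes after the end of each match and always advances by at least one byte, so a pattern that can match the empty string does not loop forever.

```php
<?php
// Hypothetical helper: collect at most $limit full matches using preg_match().
function firstMatches(string $pattern, string $subject, int $limit): array
{
    $found = [];
    $offset = 0;
    while (count($found) < $limit
        && $offset <= strlen($subject)
        && preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        // With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset].
        [$text, $start] = $m[0];
        $found[] = $text;
        // Offsets are absolute: continue right after this match,
        // stepping at least one byte when the match is empty.
        $offset = $start + max(strlen($text), 1);
    }
    return $found;
}

$firstTwo = firstMatches('/b[0-9]/', 'ab1ab2ab3ab4c', 2);
var_dump($firstTwo);
```

Scanning stops as soon as the limit is reached, so the rest of a large string is never examined.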

︶￣淡然 2024-10-15 06:10:31

I don't think so, but preg_match does have an offset parameter, and also a PREG_OFFSET_CAPTURE flag which, when combined, can be used to get the "next match".

This is mainly useful if you don't want to get all results and then array_slice() a portion off :o)

EDIT:
Ok, here's some code (not tested or used in any way):

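$offset = 0;
$matches = array();
for ($i = 0; $i < 20; $i++) {
    // preg_match() returns 1 on a match; the match data is written to $results.
    $found = preg_match('/<p(?:.*?)>/', $string, $results, PREG_OFFSET_CAPTURE, $offset);
    if ($found !== 1) {
        break;
    }
    $matches[] = $results[0][0];
    // Captured offsets are absolute, so resume right after the end of this match.
    $offset = $results[0][1] + strlen($results[0][0]);
}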
