Limiting the number of results with preg_match_all in PHP
Is there any way to limit the number of matches that will be returned using preg_match_all?
So for example, I want to match only the first 20 <p> tags on a web page, but there are 100 <p> tags.
Cheers
Just match all and slice the resulting array:
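A minimal sketch of the match-then-slice approach. The sample HTML and the pattern are made up for illustration, and PREG_SET_ORDER is an assumption chosen so that each match stays grouped with its captures when the array is sliced:

```php
<?php
// Hypothetical sample page with 100 <p> tags.
$html = '';
for ($i = 1; $i <= 100; $i++) {
    $html .= "<p>Paragraph $i</p>\n";
}

// PREG_SET_ORDER makes $matches a list of matches, each holding
// the full match at index 0 and the captured text at index 1.
preg_match_all('~<p>(.*?)</p>~s', $html, $matches, PREG_SET_ORDER);

// All 100 matches are computed first; only then are the first 20 kept.
$first20 = array_slice($matches, 0, 20);

echo count($first20), "\n";  // 20
echo $first20[19][1], "\n";  // Paragraph 20
```

Note that this still scans the whole string, so it only limits what you keep, not the work done.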
No, the computation of the preg_match_all result set cannot be limited. You can only limit the results afterwards with array_slice or array_splice (this would require PREG_SET_ORDER).
But besides that, you shouldn't use regular expressions to parse HTML anyway. Although modern regular expression engines are no longer regular and can process an irregular language like HTML, doing so is too error-prone. Better to use an appropriate HTML parser instead, like the one in PHP's DOM library. Then just use a counter to get at most 20 matches.
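A minimal sketch of the DOM-plus-counter idea, assuming the DOM extension is available. The sample HTML and the variable names are illustrative only:

```php
<?php
// Hypothetical sample page with 100 <p> tags.
$html = '<html><body>';
for ($i = 1; $i <= 100; $i++) {
    $html .= "<p>Paragraph $i</p>";
}
$html .= '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parse warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

$limit = 20;
$paragraphs = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    if (count($paragraphs) >= $limit) {
        break; // stop as soon as the limit is reached
    }
    $paragraphs[] = $p->textContent;
}

echo count($paragraphs), "\n"; // 20
```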
You can use the T-Regx library.
To extend on @Gumbo's great advice to use a DOM parser instead of regex, an XPath query with a position() condition can limit the targeted tags (for example, targeting 4 of 5 p tags).
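A minimal sketch of an XPath query with a position() condition, targeting 4 of 5 p tags. The sample markup is invented, and wrapping //p in parentheses is my own choice so that position() counts across the whole document rather than per parent element:

```php
<?php
// Hypothetical sample document with 5 <p> tags spread over two parents.
$html = '<div><p>one</p><p>two</p></div><div><p>three</p><p>four</p><p>five</p></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parse warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// (//p) collects every <p> in document order; the predicate then keeps the first 4.
// Without the parentheses, //p[position() <= 4] would count within each parent.
$texts = [];
foreach ($xpath->query('(//p)[position() <= 4]') as $p) {
    $texts[] = $p->textContent;
}

echo implode(', ', $texts), "\n"; // one, two, three, four
```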
This is the true answer; the most memory-efficient way.
Use reference assignment via preg_replace_callback() instead.
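A minimal sketch of that idea, assuming the matches are collected through a by-reference variable and the fourth argument of preg_replace_callback() (the replacement limit) caps how many times the callback runs. The sample string and names are illustrative:

```php
<?php
$html = '<p>one</p><p>two</p><p>three</p>'; // three possible matches
$limit = 2;
$found = [];

// The callback records each captured text in $found (bound by reference)
// and returns the match unchanged, so the subject itself is not altered.
preg_replace_callback(
    '~<p>(.*?)</p>~s',
    function (array $m) use (&$found) {
        $found[] = $m[1];
        return $m[0];
    },
    $html,
    $limit // maximum number of replacements, and therefore of callback calls
);

echo implode(', ', $found), "\n"; // one, two
```

The returned string is simply discarded here; only the side effect on $found matters.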
You can either use preg_match_all() and discard the matches you're not interested in, or you can use a loop with preg_match(). The second option would be better if you're concerned about the expense of scanning a large string.
For example, the loop can stop after 2 matches even when there are actually 3 in the entire string.
Really, a while loop would probably have been clearer than a for loop, on reflection ;)
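A minimal sketch of a preg_match() loop written as a while loop, limited to 2 matches when the string holds 3. The subject, the pattern, and the use of PREG_OFFSET_CAPTURE to find where to resume are assumptions for illustration:

```php
<?php
$subject = 'a1 b2 c3'; // three possible matches
$limit = 2;
$offset = 0;
$found = [];

// With PREG_OFFSET_CAPTURE, $m[0] is [matched text, byte offset in $subject].
while (count($found) < $limit
    && preg_match('/[a-z]\d/', $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
    $found[] = $m[0][0];
    // Resume scanning right after this match (this pattern never matches empty text).
    $offset = $m[0][1] + strlen($m[0][0]);
}

echo implode(', ', $found), "\n"; // a1, b2
```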
I don't think so, but preg_match does have an offset parameter, and also a PREG_OFFSET_CAPTURE flag which, when combined, can be used to get the "next match".
This is mainly useful if you don't want to get all results and then array_slice() a portion off :o)
EDIT:
OK, here's some code (not tested or used in any way):
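One way the offset plus PREG_OFFSET_CAPTURE combination could be wrapped up for reuse is a generator that yields matches lazily. This is only a sketch: limitedMatches() is a hypothetical helper name, and the guard against empty matches is my own addition:

```php
<?php
/**
 * Hypothetical helper: lazily yields at most $limit full matches of $pattern.
 */
function limitedMatches(string $pattern, string $subject, int $limit): Generator
{
    $offset = 0;
    $length = strlen($subject);
    for ($n = 0; $n < $limit && $offset <= $length; $n++) {
        if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
            return; // no further match, or a regex error
        }
        [$text, $position] = $m[0];
        yield $text;
        // Move past the match; advance at least one byte so an empty match cannot loop forever.
        $offset = $position + max(1, strlen($text));
    }
}

// Hypothetical page with 100 <p> tags; only the first 20 are ever matched.
$html = str_repeat('<p>x</p>', 100);
$first20 = iterator_to_array(limitedMatches('~<p>.*?</p>~s', $html, 20));

echo count($first20), "\n"; // 20
```

Because the generator stops calling preg_match() once the limit is hit, the rest of the string is never scanned.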