PHP preg_match_all() not capturing subgroups

Posted on 2024-10-02 05:19:32


I'm trying to parse a Twitter Atom feed in PHP but am running into a strange issue. I'm calling preg_match_all with this regex string:

"|<entry>.*<title>(.*)</title>.*<published>(.*)</published>.*</entry>|xsU"

It matches all the entries OK, but the captured subgroups title/published do not show up in the results (no arrays for the captured subgroups are created in the result object).

Now for the strange part. If I try to capture the last bit as well:

"|<entry>.*<title>(.*)</title>.*<published>(.*)</published>(.*)</entry>|xsU"

And now the capturing works. I get the title and the published date and the large chunk of final data that I don't want.

I tried adding the non-capturing marker "?:" to the last subgroup, but then capturing stopped working altogether again.

So how do I capture the data I want, without having to capture the large chunk of unwanted data at the end?


木落 2024-10-09 05:19:32

I recommend you use DOM (or SimpleXML) for parsing RSS/Atom feeds. You will get way better results than with regular expressions.

Here's an example (using SimpleXML):

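// Fetch the raw feed XML.
$rss_feed = file_get_contents('http://stackoverflow.com/feeds/question/4187945');
// Parse it into a SimpleXML tree.
$sxml = new SimpleXMLElement($rss_feed);

// Read the title of the first <entry>; the cast yields its text content.
$title = (string) $sxml->entry[0]->title;
echo $title;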
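
To tie this back to the question, here is a minimal sketch of collecting the title and published date of every entry, so that no trailing data has to be captured and discarded. The inline feed and the variable names `$atom` and `$entries` are made up for illustration so that the snippet runs without network access; a real Twitter feed may carry extra elements and namespaces that are not handled here.

```php
<?php
// Hypothetical sample Atom feed, inlined so the sketch is self-contained.
$atom = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry>
    <title>First tweet</title>
    <published>2010-11-15T10:00:00Z</published>
    <content>Unwanted trailing data</content>
  </entry>
  <entry>
    <title>Second tweet</title>
    <published>2010-11-15T11:30:00Z</published>
    <content>More unwanted data</content>
  </entry>
</feed>
XML;

$feed = new SimpleXMLElement($atom);

// Walk every <entry> and keep only the two fields the question asks for.
$entries = [];
foreach ($feed->entry as $entry) {
    $entries[] = [
        'title'     => (string) $entry->title,
        'published' => (string) $entry->published,
    ];
}

// Print one line per entry: published date, then title.
foreach ($entries as $e) {
    echo $e['published'], '  ', $e['title'], PHP_EOL;
}
```

Each element of `$entries` holds just the two strings, which is roughly what the two capture groups in the regex were meant to produce. For a live feed, the inline string could be replaced with the result of file_get_contents() as in the example above.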
