PHP preg_match_all() not capturing subgroups
I'm trying to parse a Twitter atom feed in PHP but am running into this strange issue. I'm calling preg_match_all
with this regexp string:
"|<entry>.*<title>(.*)</title>.*<published>(.*)</published>.*</entry>|xsU"
It matches all the entries OK, but the captured subgroups title/published do not show up in the results (no arrays for the captured subgroups are created in the result object).
Now for the strange part: if I capture the last bit as well:
"|<entry>.*<title>(.*)</title>.*<published>(.*)</published>(.*)</entry>|xsU"
And now the capturing works. I get the title and the published date and the large chunk of final data that I don't want.
I tried adding the non-capturing marker "?:" to the last subgroup, but then capturing stopped working altogether again.
So how do I capture the data I want, without having to capture the large chunk of unwanted data at the end?
Comments (1)
I recommend you use DOM (or SimpleXML) for parsing RSS/Atom feeds. You will get way better results than with regular expressions.
Here's an example (using SimpleXML):
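The following is a minimal sketch of that approach rather than a definitive implementation. The inline sample feed, its entry values, and the variable names ($xml, $feed, $entries) are assumptions made for illustration; a real Twitter Atom feed would be loaded from its URL or from a file instead.

```php
<?php
// Hypothetical sample feed standing in for the real Twitter Atom feed.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry>
    <title>First tweet</title>
    <published>2009-06-01T12:00:00+00:00</published>
    <content type="html">Lots of extra data we do not need</content>
  </entry>
  <entry>
    <title>Second tweet</title>
    <published>2009-06-02T08:30:00+00:00</published>
    <content type="html">More extra data</content>
  </entry>
</feed>
XML;

// Parse the XML string; simplexml_load_string() returns false on failure.
$feed = simplexml_load_string($xml);
if ($feed === false) {
    exit('Could not parse the feed');
}

// Collect only the two fields the question asks for.
$entries = [];
foreach ($feed->entry as $entry) {
    $entries[] = [
        'title'     => (string) $entry->title,
        'published' => (string) $entry->published,
    ];
}

// Print one line per entry: published date, then title.
foreach ($entries as $e) {
    echo $e['published'], ' ', $e['title'], PHP_EOL;
}
```

Because each entry is read as a structured element, the unwanted data after the published element is simply never touched, so no trailing capture group is needed.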