Regular expressions and newline characters
I'm trying to use a regular expression as below:
preg_match_all('|<table.*</table>|',$html,$matches, PREG_SET_ORDER);
But this is not working, and I think the problem is the newlines inside the string $html.
Could someone tell me a workaround?
EDIT: I've realized that it's not right to use regex to parse HTML. Thanks to those who told me. :)
Comments (6)
Did you try the multiline modifier m?
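For reference, here is a minimal PHP check of what the m (multiline) modifier changes. It lets ^ and $ match at the start and end of every line, but it does not let . match a newline. That is the job of the s modifier. The sample string is made up for illustration.

```php
<?php
// Hypothetical sample input: a table split across three lines.
$html = "<table>\n<tr><td>1</td></tr>\n</table>";

// /m only changes ^ and $: they now match at every line boundary,
// so each of the three lines is matched separately.
$anchors = preg_match_all('/^<.*>$/m', $html, $lines);

// /m does not let "." cross a newline, so the question's pattern still fails.
$withM = preg_match('|<table.*</table>|m', $html);

// /s (dot-all) is the modifier that lets "." match "\n".
$withS = preg_match('|<table.*</table>|s', $html);

echo $anchors, ' ', $withM, ' ', $withS, "\n"; // 3 0 1
```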
Use the s modifier (/s) so that '.' also matches newline characters, or match newline characters explicitly, usually with '[\n\r]'. I haven't read it myself yet, but see http://www.pcre.org/pcre.txt for more information on the PCRE library.
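Here is a minimal PHP sketch of both suggestions applied to the question's pattern. The sample input is made up. The explicit variant switches the delimiter to ~, because | is needed inside the pattern for alternation.

```php
<?php
// Hypothetical sample input: the table markup spans several lines.
$html = "<p>intro</p>\n<table>\n<tr><td>1</td></tr>\n</table>\n";

// Question's pattern: "." stops at "\n", so nothing matches.
$plain = preg_match_all('|<table.*</table>|', $html, $m1, PREG_SET_ORDER);

// s modifier: "." now matches newline characters too.
$dotAll = preg_match_all('|<table.*</table>|s', $html, $m2, PREG_SET_ORDER);

// Explicit alternative without /s: any character OR a newline character.
// "~" is the delimiter here because "|" is used for alternation.
$explicit = preg_match_all('~<table(?:.|[\n\r])*</table>~', $html, $m3, PREG_SET_ORDER);

echo $plain, ' ', $dotAll, ' ', $explicit, "\n"; // 0 1 1
echo $m2[0][0], "\n"; // the whole multi-line <table>...</table> block
```

The s modifier is usually the simpler choice. The explicit character class is shown only because the answer mentions it.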
Be careful how you form your pattern, though: long input strings containing newlines, combined with a misunderstood pattern, can cause unexplained script failures and connection resets.
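One concrete way a pattern gets "misunderstood" is shown below. This is an interpretation; the answer does not spell it out. With /s, a greedy .* runs from the first <table to the last </table> in the input. On large inputs, heavy backtracking can also hit pcre.backtrack_limit, and in that case preg_match_all() returns false instead of a count. Here is a short PHP sketch with a made-up two-table input:

```php
<?php
// Hypothetical input with two separate tables.
$html = "<table>\n<tr><td>a</td></tr>\n</table>\n<p>x</p>\n<table>\n<tr><td>b</td></tr>\n</table>";

// Greedy ".*": a single match spanning from the first <table to the last </table>.
$greedy = preg_match_all('|<table.*</table>|s', $html, $g);

// Lazy ".*?": stops at the nearest </table>, giving one match per table.
$lazy = preg_match_all('|<table.*?</table>|s', $html, $l);

// preg_match_all() returns false on an internal error (for example when
// pcre.backtrack_limit is exceeded); check for it instead of treating it as "no match".
foreach ([$greedy, $lazy] as $result) {
    if ($result === false) {
        echo 'PCRE error code: ', preg_last_error(), "\n";
    }
}

echo $greedy, ' ', $lazy, "\n"; // 1 2
```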
In your case, PCRE functions don't seem to be needed here, and could cause unexpected results anyway. If you're just looking to extract contents of a single table on a page, why not just do the most basic...
Better: you can read $html into a SimpleXML object and query it with SimpleXML's XPath support (IMHO powerful and much easier to use than the DOM extension). Like this:
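Here is a minimal sketch of that approach in PHP. It assumes $html may not be well-formed XML, so simplexml_load_string() alone could fail. Instead, DOMDocument::loadHTML() parses the markup leniently, simplexml_import_dom() converts the result to a SimpleXML object, and xpath() selects the elements. The sample markup is made up.

```php
<?php
// Hypothetical input: an HTML fragment, not a well-formed XML document.
$html = "<p>intro</p>\n<table>\n<tr><td>1</td><td>2</td></tr>\n</table>";

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

// Parse the HTML leniently with DOM, then hand the tree to SimpleXML.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);

// XPath: every <table> element, wherever it appears; newlines are irrelevant.
$tables = $xml->xpath('//table');

// XPath: the text of every cell inside those tables.
$cells = [];
foreach ($xml->xpath('//table//td') as $td) {
    $cells[] = (string) $td;
}

echo count($tables), ' table(s), cells: ', implode(', ', $cells), "\n";
```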
The dot does not match newlines unless the s pattern modifier is used.
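Here is a minimal PHP illustration of that rule using made-up strings. It also shows that the same option can be written inline as (?s) at the start of the pattern.

```php
<?php
// "." matches any character except a newline by default.
$default = preg_match('/a.b/', "a\nb");

// The s modifier after the closing delimiter makes "." match "\n" too.
$modifier = preg_match('/a.b/s', "a\nb");

// The same option set inline at the start of the pattern.
$inline = preg_match('/(?s)a.b/', "a\nb");

echo $default, ' ', $modifier, ' ', $inline, "\n"; // 0 1 1
```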
(Be aware that using regex to parse HTML ranks among the worst capital sins here on SO.)
Before making a decision on what to do next, I'd read this first: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
In general, it's not a good idea to parse HTML with regex.
I recommend using DOM.
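Here is a minimal sketch using PHP's built-in DOM extension. It assumes the goal from the question is to pull each <table> element out of $html. getElementsByTagName() finds the elements regardless of line breaks, and saveHTML($node) (PHP 5.3.6+) returns the markup of a single node. The sample input is made up.

```php
<?php
// Hypothetical input with a table spanning several lines.
$html = "<p>intro</p>\n<table>\n<tr><td>1</td></tr>\n</table>";

// Keep libxml warnings about imperfect HTML out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$found = [];
foreach ($doc->getElementsByTagName('table') as $table) {
    // Outer HTML of this <table>, including its inner newlines.
    $found[] = $doc->saveHTML($table);
}

echo count($found), "\n";
echo $found[0], "\n";
```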
You can check out the PHP Simple HTML DOM Parser as an alternative.
Main Features: