Regular expressions and newlines

Posted on 2024-09-02 21:40:01

I'm trying to use a regular expression as below:

preg_match_all('|<table.*</table>|',$html,$matches, PREG_SET_ORDER);

But this is not working, and I think the problem is the newlines inside the string $html.
Could someone tell me a workaround?

EDIT: I've realized that it's not right to use regex to parse HTML. Thanks to those who told me. :)

趁微风不噪 2024-09-09 21:40:02

preg_match_all('|<table.*?</table>|ms',$html,$matches, PREG_SET_ORDER);

动次打次papapa 2024-09-09 21:40:02

Did you try the multiline modifier m?

preg_match_all('|<table.*</table>|m',$html,$matches, PREG_SET_ORDER);

小瓶盖 2024-09-09 21:40:02

Use the /s flag to have the '.' also apply to new line characters, or just check for new line characters explicitly - usually '[\n\r]'. I haven't yet read it myself, but do check out more info on the PCRE library at http://www.pcre.org/pcre.txt

Careful how you form your pattern though - long input strings with newlines mixed with misunderstood patterns can cause unexplained script failures and connection resets.

In your case, PCRE functions don't seem to be needed here, and could cause unexpected results anyway. If you're just looking to extract contents of a single table on a page, why not just do the most basic...

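$start = stripos($input, "<table>");
$end = stripos($input, "</table>", $start);
// substr() takes a length, not an end offset, so subtract $start
// and add the length of the closing tag to include it in the result.
$my_table = substr($input, $start, $end - $start + strlen("</table>"));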
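The same string-search idea can be wrapped in a small function that also copes with a missing tag. This is only a sketch: `extract_first_table` and the sample `$input` are made-up names for illustration, the search string is `<table` without the closing `>` on the assumption that the tag may carry attributes, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the answer above): returns the first
// <table ...>...</table> block in $input, or null when either tag is missing.
function extract_first_table(string $input): ?string
{
    $open  = "<table";     // no ">" so that tags with attributes are found too
    $close = "</table>";

    $start = stripos($input, $open);
    if ($start === false) {
        return null;       // no opening tag at all
    }

    $end = stripos($input, $close, $start);
    if ($end === false) {
        return null;       // opening tag without a closing tag
    }

    // substr() expects a length, so turn the end offset into one
    // and include the closing tag itself.
    return substr($input, $start, $end - $start + strlen($close));
}

// Made-up sample input spanning several lines.
$input = "<p>x</p>\n<table id=\"t\">\n<tr><td>ABC</td></tr>\n</table>\n<p>y</p>";
echo extract_first_table($input), "\n";
```

Newlines are no obstacle here because nothing is being pattern-matched. Nested tables would still be cut at the first closing tag, so this only suits simple pages.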

静谧 2024-09-09 21:40:02

EDIT: I've realized that it's not right to use regex to parse HTML.

Better: You can read $html into a SimpleXML object and parse it with SimpleXML's Xpath. (Powerful and much easier to use than the DOM extension IMHO.)

Like this:

$html = "<html><body><table id=\"mytbl\"><tr><td>ABC</td></tr><tr><td>DEF</td></tr></table></body></html>";

$xml = simplexml_load_string($html);

if ($xml) {
    foreach ($xml->xpath("/html/body/*") as $item) {
        echo $item["id"] . "<br>"; // mytbl
        foreach ($item->tr as $tr) {
            echo $tr->td . "<br>"; // 1:ABC, 2:DEF
        }
    }
}

〗斷ホ乔殘χμё〖 2024-09-09 21:40:01

The dot does not match newlines unless the s pattern modifier is used.

preg_match_all('|<table.*?</table>|s',$html,$matches, PREG_SET_ORDER);

(Be aware that using regex to parse HTML ranks among the worst capital sins here in SO).
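To see what each modifier actually changes, here is a minimal sketch run against a made-up two-table `$html` string (the sample input and the variable names are assumptions, not taken from the question). It compares no modifier, `m`, `s` with a greedy `.*`, and `s` with a lazy `.*?`.

```php
<?php
// Hypothetical sample input: two tables, each spanning several lines.
$html = "<p>intro</p>\n<table id=\"a\">\n<tr><td>1</td></tr>\n</table>\n"
      . "<p>mid</p>\n<table id=\"b\">\n<tr><td>2</td></tr>\n</table>\n";

// No modifier: "." never matches "\n", so a multi-line table is not found.
$plain = preg_match_all('|<table.*</table>|', $html, $m1, PREG_SET_ORDER);

// "m" only changes how ^ and $ anchor, so it does not help here.
$multiline = preg_match_all('|<table.*</table>|m', $html, $m2, PREG_SET_ORDER);

// "s" lets "." match newlines, but greedy ".*" swallows both tables at once.
$greedy = preg_match_all('|<table.*</table>|s', $html, $m3, PREG_SET_ORDER);

// "s" plus lazy ".*?" stops at the first closing tag: one match per table.
$lazy = preg_match_all('|<table.*?</table>|s', $html, $m4, PREG_SET_ORDER);

printf("plain=%d multiline=%d greedy=%d lazy=%d\n", $plain, $multiline, $greedy, $lazy);
```

With this sample the script prints `plain=0 multiline=0 greedy=1 lazy=2`. The single greedy match runs from the first `<table` to the last `</table>`, which is why the lazy quantifier matters as much as the `s` modifier.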

一个人的旅程 2024-09-09 21:40:01

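Before deciding what to do next, I would first read this: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

In general, parsing HTML with regular expressions is not a good idea.

I would recommend using the DOM.

You could also look at PHP Simple HTML DOM Parser as an alternative.

Main features:

An HTML DOM parser written in PHP5+ that lets you manipulate HTML in a very easy way!
Requires PHP 5+.
Supports invalid HTML.
Find tags on an HTML page with selectors, just like jQuery.
Extract contents from HTML in a single line.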
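For the DOM route, a minimal sketch using PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) might look like the following. The sample `$html` and the variable names are assumptions for illustration, and the example uses the built-in extension, not PHP Simple HTML DOM Parser.

```php
<?php
// Hypothetical sample input; real pages are usually messier than this.
$html = "<html><body><p>intro</p>\n<table id=\"mytbl\">\n"
      . "<tr><td>ABC</td></tr>\n<tr><td>DEF</td></tr>\n</table></body></html>";

$doc = new DOMDocument();

// Collect parser warnings internally so sloppy markup does not spam the output.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

$tables = [];   // serialized HTML of each table
$cells  = [];   // text of every cell found

foreach ($xpath->query('//table') as $table) {
    echo "table id: ", $table->getAttribute('id'), "\n";

    // saveHTML($node) serializes just this table, newlines included.
    $tables[] = $doc->saveHTML($table);

    // ".//td" is evaluated relative to the current table node.
    foreach ($xpath->query('.//td', $table) as $td) {
        $cells[] = trim($td->textContent);
    }
}

echo implode(", ", $cells), "\n";
```

With this sample it prints `table id: mytbl` followed by `ABC, DEF`. Line breaks inside the table need no special handling, because the parser works on the document structure and not on lines of text.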
