Parsing an HTML element

Posted 2024-10-31 15:58:41

I've used DOM before to parse websites in PHP.

I know I should never try to parse HTML using regex.

But... (I don't want to start a shitstorm, just an answer :P )

If I want to parse just one HTML element, e.g.

<a href="http://example.com/something?id=1212132131133&filter=true" rel="blebeleble" target="_blank">

and find the content of the href attribute, can I use DOM to parse this string (and I probably should, if I can), or do I need a complete web page to be able to parse it with DOM?


一杯敬自由 2024-11-07 15:58:41

Yes, you can do this.

You have to:

pretend that the <a /> tag constitutes the whole document;
ensure that you close the tag;
ensure that the input string is valid XML (note that I've replaced your & with &amp;, the proper HTML entity).

Code:

<?php
$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true" rel="blebeleble" target="_blank" />';

$dom = new DOMDocument();
$dom->loadXML($str);
// The <a> element is the document's only child node; read its href attribute.
var_dump($dom->childNodes->item(0)->attributes->getNamedItem('href')->value);

// Output: string(57) "http://example.com/something?id=1212132131133&filter=true"
?>

PS, if you want to include the link text, that's OK too:

$str = '<a href="http://example.com/something?id=1212132131133&amp;filter=true" rel="blebeleble" target="_blank">Click here!</a>';
// .. code .. //

// Output: string(57) "http://example.com/something?id=1212132131133&filter=true"
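
If closing the tag and escaping the string by hand is inconvenient, a possible alternative is DOMDocument::loadHTML, which runs libxml's more lenient HTML parser and wraps a fragment in html/body elements on its own. The following is only a sketch under that assumption: extractHref is a hypothetical helper name, not part of the answer above, and how the parser treats a bare & may vary between libxml versions.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of the
// first <a> element found in an HTML fragment, or null if there is none.
function extractHref(string $fragment): ?string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them;
    // an unclosed <a> or a bare & would otherwise raise PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // loadHTML wraps the fragment in <html><body>, so search by tag name
    // rather than relying on the element's position in the tree.
    $anchor = $dom->getElementsByTagName('a')->item(0);
    if ($anchor === null || !$anchor->hasAttribute('href')) {
        return null;
    }

    return $anchor->getAttribute('href');
}

// Usage: the tag is left unclosed here, as in the question.
var_dump(extractHref('<a href="http://example.com/something?id=1212132131133&amp;filter=true" rel="blebeleble" target="_blank">'));
```

The trade-off is that loadXML rejects malformed input outright, while loadHTML tries to recover from it, so this variant suits scraped markup better than input that is supposed to be strict. Note that loadHTML does not accept an empty string.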
