Can't read the meta redirect URL with DOMDocument
I'm trying to read the meta redirect of a website. The data is in a curl request (I've built a stub to test with).
What isn't working is reading the URL. Can any PHP DOMDocument experts out there tell me why this code doesn't work? I'm trying to get the URL out of the meta refresh tag.
$r['body'] = '<HTML><HEAD><TITLE>Meta Refresh Example</TITLE>'.
'<meta http-equiv=refresh content="12; URL=meta2.htm">'.
'<link rel="stylesheet" href="../bwsrstyle.css" type="text/css">'.
'<LINK REL="SHORTCUT ICON" href="/myicon.ico">'.
'<meta http-equiv="Content-Type" content="text/html; charset=></HEAD>'.
'<BODY BGCOLOR="#FFFFFF" TEXT="#000000">foo</BODY></HTML>';
$dom = new DOMDocument();
@$dom->loadHTML($r['body']);
$xpath = new DOMXpath($dom);
$meta_redirect = $xpath->query("//meta[@http-equiv='refresh']");
foreach ($meta_redirect as $node) {
    echo "\nNODE: {$node->getAttribute('http-equiv')} ".
         "\nURL: {$node->getAttribute('url')}\n";
}
The 'refresh' value is read correctly, but the URL is not.
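To see why the URL comes back empty, you can dump every attribute that libxml actually parsed on the matched element. This is a small diagnostic sketch. The stub is trimmed to just the refresh tag from the question, and libxml_use_internal_errors() replaces the @ operator for silencing parser warnings. It uses only the standard DOM extension.

```php
<?php
// Diagnostic sketch: list every attribute parsed on each matched <meta> element.
// The HTML stub is trimmed to the refresh tag from the question.
$html = '<html><head><meta http-equiv=refresh content="12; URL=meta2.htm"></head><body>foo</body></html>';

$dom = new DOMDocument();
$prev = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($prev);

$xpath = new DOMXPath($dom);
foreach ($xpath->query("//meta[@http-equiv='refresh']") as $node) {
    // $node->attributes is a DOMNamedNodeMap of DOMAttr objects.
    foreach ($node->attributes as $attr) {
        echo "{$attr->name} = {$attr->value}\n";
    }
    // The element has no "url" attribute, so getAttribute('url') returns ''.
    var_dump($node->hasAttribute('url'));
}
```

The listing shows only http-equiv and content. "URL=meta2.htm" is part of the content value, not a separate attribute.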
Comments (2)
There is no url= attribute. You need to query for the content= attribute instead. You will also have to split the resulting string up manually: it still contains the 12; URL= prefix. This is not something the DOM functions normally deal with.
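A minimal sketch of the content-attribute approach, assuming PHP 7.1 or later. The helper names parseMetaRefresh() and findMetaRefresh() are hypothetical. The regex only handles the common "<delay>; url=<target>" shape and is not a full implementation of the HTML spec's refresh parsing. The case-insensitive match on http-equiv is an extra assumption, added because real pages also write "Refresh".

```php
<?php
// Sketch: read the redirect target from <meta http-equiv="refresh">.
// The target lives inside the content attribute ("12; URL=meta2.htm"),
// so we read that attribute and split the string ourselves.

// Hypothetical helper: parse "<delay>[; url=<target>]"; the "url=" key is case-insensitive.
function parseMetaRefresh(string $content): ?array
{
    if (!preg_match('/^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i', $content, $m)) {
        return null; // not a delay-prefixed refresh value
    }
    // Group 2 is absent when there is no target (e.g. content="5").
    $url = isset($m[2]) ? trim($m[2], " \t\n\r\0\x0B'\"") : '';
    return ['delay' => (int) $m[1], 'url' => $url];
}

// Hypothetical helper: return every parsed refresh tag in the document.
function findMetaRefresh(string $html): array
{
    $dom = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence warnings about sloppy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($dom);
    // Lower-case the attribute value so "Refresh" and "REFRESH" also match.
    $nodes = $xpath->query(
        "//meta[translate(@http-equiv, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
        . "'abcdefghijklmnopqrstuvwxyz') = 'refresh']"
    );

    $results = [];
    foreach ($nodes as $node) {
        $parsed = parseMetaRefresh($node->getAttribute('content'));
        if ($parsed !== null) {
            $results[] = $parsed;
        }
    }
    return $results;
}

// Usage with a stub trimmed to the refresh tag from the question.
$html = '<html><head><title>Meta Refresh Example</title>'
      . '<meta http-equiv=refresh content="12; URL=meta2.htm">'
      . '</head><body>foo</body></html>';
foreach (findMetaRefresh($html) as $r) {
    echo "DELAY: {$r['delay']}\nURL: {$r['url']}\n";
}
```

The returned URL can be relative (meta2.htm here), so you would still need to resolve it against the page's own URL before following it.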
You do not have a well-formed XML document at all, but supposing it were well-formed, then use: