Why doesn't this XPath query work on the DOM of a Facebook app page?

Posted on 2024-10-19 23:56:49

I don't understand why my XPath query returns the correct href for the second URL but not for the first. The HTML of both pages looks the same and has the same kind of structure, yet for the first URL no href is returned. (I comment out one $url at a time to test each.)

$url = "http://apps.facebook.com/TexasHoldEmPoker/"; // this one does not work
//$url = "http://nu.nl"; // this one works

$response = wp_remote_get($url);
$data = $response['body'];
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->strictErrorChecking = false;
$href='';
if (!$dom->loadHTML($data))
{
    foreach (libxml_get_errors() as $error)
    {
    }
    libxml_clear_errors();
}
else
{
    $xpath = new DOMXPath($dom);
    $elements = $xpath->query("/html/head/link[@rel='shortcut icon']");

    if (!is_null($elements))
    {
        foreach ($elements as $element)
        {
            if ($element->getAttribute('href'))
            {
                $href = $element->getAttribute('href');
            }
        }
    }
}
echo $href;

So I know the code works correctly for "nu.nl", but somehow it does not work for Facebook app pages. I can't figure out why, since the structure is the same.

P.S. The full code is here: http://plugins.svn.wordpress.org/wp-favicons/trunk/plugins/sources/page.php

难得心□动 2024-10-26 23:56:50

Take a look at $dom->saveXML() .

You'll see that the <link> element is a child of <body>, not of <head> as expected.

So the xpath should be:

/html/body/link[@rel='shortcut icon']

or

//link[@rel='shortcut icon']

I suspect the different markup comes from the parser repairing the illegal <noscript> inside the <head>: everything in the head from that <noscript> onward, including the <noscript> itself, has been moved into the <body>.
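
A minimal sketch of the diagnosis above, assuming a made-up page (the markup in $html is hypothetical, not the real Facebook source) with a <noscript> in the <head> ahead of the favicon link. It dumps the repaired tree with saveXML() and compares the strict path from the question with the location-independent one. Where the <link> ends up can depend on the libxml2 version, so the script prints the parent element rather than assuming it.

```php
<?php
// Hypothetical markup: a <noscript> inside <head>, followed by the favicon link.
$html = <<<HTML
<html>
<head>
<title>Demo</title>
<noscript><meta http-equiv="refresh" content="0; url=/nojs" /></noscript>
<link rel="shortcut icon" href="/favicon.ico" />
</head>
<body><p>Hello</p></body>
</html>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Inspect the tree as the parser actually built it.
echo $dom->saveXML(), "\n";

$xpath = new DOMXPath($dom);

// The strict path from the question versus a query that ignores the parent.
$strict = $xpath->query("/html/head/link[@rel='shortcut icon']");
$loose  = $xpath->query("//link[@rel='shortcut icon']");

echo "strict matches: ", $strict->length, "\n";
echo "loose matches: ", $loose->length, "\n";

$href = '';
foreach ($loose as $link) {
    $href = $link->getAttribute('href');
    // Report which element the parser placed the <link> under.
    echo "parent: ", $link->parentNode->nodeName, ", href: ", $href, "\n";
}
```

Because //link[@rel='shortcut icon'] matches the element wherever the parser placed it, it is the safer choice when the input HTML is not under your control.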
