Malformed HTML and XPath queries

Posted on 2024-12-02 13:56:52

I have malformed HTML that I can't change. Running an XPath query against it returns no nodes at all:

$el = $xpath->query("//a[@class='product']/table"); // "//a[@class='product']" on its own does match the a element
print_r($el->length); // 0

Malformed HTML:

<a class="product" href="#">
    <table width="385" cellspacing="0" cellpadding="5" style="border:1px; border-bottom-color:#E2E2E2; border-bottom-style:solid;">
        <tr>
            <td width="55">
                <img src="http://foobar.com:8080/img/1212.jpg" height="50" width="50">
            </td>
        <td width="195">Cod.27731<br>Product Name</td>
            <td width="60" align="center"><a href="?pageContent=items&price=fab&prodcod=27731">Details</a></td>
            <td width="80" nowrap>
                <div style="color:#FF0000;"><strong>$ 35.23</strong></div>
        </td>
        </tr>
    </table>
</a>

I can get the a element but I can't get its child (the table)...


Comments (1)

风筝在阴天搁浅。 2024-12-09 13:56:52

Because libxml repairs the HTML while parsing and closes the a element before the table, the table ends up as a sibling of the a element rather than its child. You have to query for the following-sibling table instead, e.g.

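$dom = new DOMDocument;
$dom->loadHTML($html); // libxml repairs the markup while parsing
$xpath = new DOMXPath($dom);
// the table is now a sibling of the a element, not its child
$el = $xpath->query("//a[@class='product']/following-sibling::table");
echo $dom->saveHTML($el->item(0));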

or traverse from the a element:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// step from the a element to the node that follows it
$table = $xpath->query("//a[@class='product']")->item(0)->nextSibling;
echo $dom->saveHTML($table);

Note that passing a node argument to saveHTML() requires PHP 5.3.6 or later.
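
How the parser repairs the markup may depend on the libxml version in use, so a query that tolerates both layouts can be safer than hard-coding one. The following is only a sketch under stated assumptions: findProductTables() is a hypothetical helper name, the sample markup is a cut-down version of the HTML from the question, and the type declarations assume PHP 7 or later. It uses an XPath union so the table is found whether it stayed a child of the a element or was moved to become its following sibling.

```php
<?php
// Hypothetical sample input: a shortened version of the markup in the question.
$html = <<<'HTML'
<html><body>
<a class="product" href="#">
    <table width="385">
        <tr>
            <td width="195">Cod.27731<br>Product Name</td>
            <td width="60"><a href="?pageContent=items">Details</a></td>
        </tr>
    </table>
</a>
</body></html>
HTML;

/**
 * Hypothetical helper: returns the serialized product tables found in $html.
 * It matches a table that is either a child of a.product or the first
 * table sibling that follows it.
 */
function findProductTables(string $html): array
{
    // Collect parser warnings about the malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Union of both possible layouts after the parser's repair step.
    $nodes = $xpath->query(
        "//a[@class='product']/table"
        . " | //a[@class='product']/following-sibling::table[1]"
    );

    $tables = [];
    foreach ($nodes as $node) {
        $tables[] = $dom->saveHTML($node); // node argument needs PHP 5.3.6+
    }
    return $tables;
}

$tables = findProductTables($html);
foreach ($tables as $table) {
    echo $table, "\n";
}
```

When a query unexpectedly returns nothing, printing the whole parsed document with echo $dom->saveHTML(); is a quick way to see what structure the parser actually built.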
