DOMNodeList, XPath and PHP

Posted on 2024-12-19 19:36:15

I am parsing an HTML page with DOM and XPath in PHP.

I have to fetch a nested <table>...</table> from the HTML.

I have defined a query using FirePath in the browser, which points to

html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table

When I run the code, the fetched DOMNodeList has length 0. My objective is to output the queried <table> as a string. This is an HTML scraping script in PHP.

Below is the function. How can I extract the required <table>?

$pageUrl = "http://www.boc.cn/sourcedb/whpj/enindex.html";

getExchangeRateTable($pageUrl);


function getExchangeRateTable($url){
    $htmlTable = "";
    $xPathTable = null;
    $xPathQuery1 = "html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table";

    if(strlen($url)==0){die('Argument exception: method call [getExchangeRateTable] expects a string of URL!');}

    // initialize objects
    $page = tidyit($url);
    $dom = new DOMDocument();
    $dom->loadHTML($page);
    $xpath = new DOMXPath($dom);

    // $elements is returned as a DOMNodeList
    $elements = $xpath->query($xPathQuery1);

    // print_r($elements);
    foreach($elements as $e){
        $e->firstChild->nodeValue;  
    }

}

Comments (3)

诺曦 2024-12-26 19:36:15

Have you tried it like this?

// $tes is assumed to hold the HTML source as a string
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading to have any effect
$dom->loadHTML($tes);
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName("tr");
print_r($rows);

花想c 2024-12-26 19:36:15

Remove the tbody elements from your XPath query. In most cases they are inserted by your browser, as is the case with the page you are trying to scrape.

/html/body/table[2]/tr/td[2]/table[2]/tr/td/table

This will most likely work.

However, it is probably safer to use a different XPath. The following XPath selects the first th based on its text content, then selects the tr's parent, which is either a tbody or a table:

//th[contains(text(),'Currency Name')]/parent::tr/parent::*
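
The effect of the extra tbody steps can be reproduced on a small inline document. The sketch below is only an illustration: the HTML string and the variable names are made up for the example and are not taken from the page being scraped. It assumes, as suggested above, that the parsed source contains no tbody elements, and it uses DOMDocument::saveHTML() with a node argument to get the matched table as a string.

```php
<?php
// Hypothetical sample markup: note there is no <tbody> in the source.
$html = '<html><body><table>'
      . '<tr><th>Currency Name</th><th>Rate</th></tr>'
      . '<tr><td>USD</td><td>1.00</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// A path copied from the browser, including the tbody step.
$withTbody = $xpath->query('/html/body/table/tbody/tr/td');

// The same path with the tbody step removed.
$withoutTbody = $xpath->query('/html/body/table/tr/td');

// Content-based lookup: find the header cell, then climb to the row's parent.
$tables = $xpath->query("//th[contains(text(),'Currency Name')]/parent::tr/parent::*");
$table = $tables->item(0);

// saveHTML() with a node argument serializes only that node.
$tableHtml = $dom->saveHTML($table);

echo $withTbody->length, "\n";
echo $withoutTbody->length, "\n";
echo $tableHtml, "\n";
```

Comparing the two length values shows whether the tbody step is what prevents a match. The content-based query does not depend on the table's position in the page, so it should survive layout changes better than the positional path.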

剪不断理还乱 2024-12-26 19:36:15

The XPath query should start with a leading /, like this:

/html/...
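
To illustrate the difference between absolute and relative paths, the sketch below runs an absolute query anchored at the document root and a relative query with an explicit context node passed as the second argument of DOMXPath::query(). The sample markup and variable names are assumptions made for the example.

```php
<?php
// Hypothetical sample markup for illustration only.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>first</p><p>second</p></body></html>');
$xpath = new DOMXPath($dom);

// Absolute path: the leading / anchors the query at the document root.
$absolute = $xpath->query('/html/body/p');

// Relative path: evaluated against the context node given as the second argument.
$body = $dom->getElementsByTagName('body')->item(0);
$relative = $xpath->query('p', $body);

echo $absolute->length, "\n";
echo $relative->length, "\n";
```

Writing the absolute form avoids depending on which node a relative path is evaluated from when no context node is supplied.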