DOMNodeList, XPath and PHP
I am parsing an HTML page with DOM and XPath in PHP.
I need to fetch a nested <table>...</table> from the HTML.
Using FirePath in the browser, I defined a query that points to it:
html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table
When I run the code, the DOMNodeList it returns has length 0. My goal is to output the matched <table> as a string. This is an HTML scraping script in PHP.
Below is the function. How can I extract the required <table>?
$pageUrl = "http://www.boc.cn/sourcedb/whpj/enindex.html";
getExchangeRateTable($pageUrl);

function getExchangeRateTable($url){
    $htmlTable = "";
    $xPathTable = null;
    $xPathQuery1 = "html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table";
    if(strlen($url)==0){die('Argument exception: method call [getExchangeRateTable] expects a string of URL!');}
    // initialize objects
    $page = tidyit($url); // tidyit() is not shown in the question
    $dom = new DOMDocument();
    $dom->loadHTML($page);
    $xpath = new DOMXPath($dom);
    // $elements is appearing as DOMNodeList
    $elements = $xpath->query($xPathQuery1);
    // print_r($elements);
    foreach($elements as $e){
        $e->firstChild->nodeValue; // note: this value is neither stored nor printed
    }
}
Comments (3)
Have you tried it like this?
Remove the tbody steps from your XPath query: in most cases they are inserted by your browser, as is the case with the page you are trying to scrape.
This will most likely work.
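A minimal sketch of the tbody point, using a small made-up document rather than the real page. The HTML parser behind DOMDocument::loadHTML() (libxml2) keeps table markup as written and does not add <tbody> the way a browser does, so a path copied from FirePath that contains tbody matches nothing. The sketch then prints the matched element as a string with DOMDocument::saveHTML($node). The query starts with / so that it is evaluated from the document node rather than from the root element.

```php
<?php
// Made-up markup for illustration: two tables in <body>, the second one
// containing a nested table, all written without <tbody>.
$html = '<html><body>'
      . '<table><tr><td>outer</td></tr></table>'
      . '<table><tr><td><table><tr><td>1.23</td></tr></table></td></tr></table>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Path as a browser would report it, including the <tbody> it inserted.
$withTbody = $xpath->query('/html/body/table[2]/tbody/tr/td/table');

// Same path with the tbody steps removed.
$withoutTbody = $xpath->query('/html/body/table[2]/tr/td/table');

echo $withTbody->length, PHP_EOL;    // 0
echo $withoutTbody->length, PHP_EOL; // 1

// Serialize the matched <table> element itself (markup included) to a string.
$tableHtml = $dom->saveHTML($withoutTbody->item(0));
echo $tableHtml, PHP_EOL;
```

Passing the node to saveHTML() returns that element's markup only, which is closer to "the table as a string" than reading firstChild->nodeValue, which yields only text.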
However, it is probably safer to use a different XPath. The following XPath selects the first th based on its text content, then selects the tr's parent (a tbody or a table):
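A sketch of the same technique (find a header cell by its text, then walk up to the enclosing table) on a small synthetic document. The markup and the header label "Currency Name" are hypothetical placeholders, not taken from this answer or from the real page; substitute a header text that actually appears in the target table.

```php
<?php
// Synthetic markup: a navigation table followed by the data table we want.
$html = '<html><body>'
      . '<table><tr><td>navigation</td></tr></table>'
      . '<table><tr><th>Currency Name</th><th>Rate</th></tr>'
      . '<tr><td>AAA</td><td>1.00</td></tr></table>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// 1. Take the first <th> whose whitespace-normalized text equals the label.
// 2. Step up to its <tr>, then to that row's parent element, which is a
//    <tbody> if the source has one and the <table> itself otherwise.
$query = '(//th[normalize-space(.) = "Currency Name"])[1]/parent::tr/parent::*';
$container = $xpath->query($query)->item(0);

// If the row's parent turned out to be a <tbody>, go up one more level.
$table = ($container !== null && $container->nodeName === 'tbody')
    ? $container->parentNode
    : $container;

if ($table !== null) {
    echo $table->nodeName, PHP_EOL;       // table
    echo $dom->saveHTML($table), PHP_EOL; // the whole matched table as HTML
} else {
    echo 'not found', PHP_EOL;
}
```

Anchoring on header text rather than on positions like table[2]/td[2] keeps the query working if the page layout around the table changes.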
The XPath query should start with a leading "/", like:
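A minimal sketch of why the leading slash matters. According to the PHP manual, when no context node is passed, DOMXPath::query() evaluates queries relative to the root element (<html>), not the document node. A path beginning with "html/" therefore looks for an <html> child of <html> and finds nothing. The markup below is made up for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td>x</td></tr></table></body></html>');
$xpath = new DOMXPath($dom);

$relative = $xpath->query('html/body/table');  // relative to <html>: no match
$absolute = $xpath->query('/html/body/table'); // from the document node: match
$fromRoot = $xpath->query('body/table');       // relative to <html>: match

echo $relative->length, PHP_EOL; // 0
echo $absolute->length, PHP_EOL; // 1
echo $fromRoot->length, PHP_EOL; // 1
```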