SimpleXML->xpath problem

Posted 2024-11-05 14:22:35

I am trying to access each table row of:

http://www.alliedelec.com/search/searchresults.aspx?N=0&Ntt=PIC16F648&Ntk=Primary&i=0&sw=n

with SimpleXML->xpath. I have identified the xpath of the table to be:

'//*[@id="tblParts"]'

Now I take my cURL string $string and do the following:

$tidy->parseString($string);
$output = (string) $tidy;
$xml = new SimpleXMLElement($output);
$result = $xml->xpath('//*[@id="tblParts"]');
while(list( , $node) = each($result)) 
{
echo 'NODE:' . $node . "\n";
}

What I get back are errors such as these, by the hundreds:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 60: parser error : Opening and ending tag mismatch: meta line 22 and head in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: </head> in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: ^ in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 108: parser error : Opening and ending tag mismatch: img line 106 and td in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

As well as this at the end:

Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php:119 Stack trace: #0 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(119): SimpleXMLElement->__construct('<!DOCTYPE html ...') #1 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(95): get_Alliedelectronics->extractData('<!DOCTYPE html ...') #2 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(138): get_Alliedelectronics->query('PIC16F648') #3 {main} thrown in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php on line 119

Comments (2)

梨涡 2024-11-12 14:22:35

It looks like the HTML of the page you're fetching and trying to parse isn't well-formed (tag mismatches, etc.).

You can try to fix the errors using simplexml_import_dom, as I explain in this SO post.
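The linked post is not reproduced here, so the following is only a minimal sketch of that idea: let DOMDocument's forgiving HTML parser build the tree, then hand it to SimpleXML with simplexml_import_dom. The sample markup in $string is hypothetical and merely imitates the unclosed meta and img tags from the warnings; libxml_use_internal_errors(true) is an addition of mine to keep the parser warnings off the output.

```php
<?php
// Hypothetical stand-in for the fetched page; the unclosed <meta> and <img>
// tags mimic the mismatches reported in the warnings above.
$string = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>t</title></head>'
        . '<body><table id="tblParts"><tr><td><img src="a.png">PART-1</td></tr></table>'
        . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($string);            // HTML parser, tolerant of unclosed tags
libxml_clear_errors();

$xml = simplexml_import_dom($doc);  // SimpleXMLElement built from the repaired tree
$result = $xml->xpath('//*[@id="tblParts"]');
foreach ($result as $node) {
    // asXML() serializes the whole element; casting $node to string would
    // only give the text sitting directly inside <table>.
    echo 'NODE: ' . $node->asXML() . "\n";
}
```

With this route the rest of the original code can keep using SimpleXMLElement::xpath; only the parsing step changes.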

酒儿 2024-11-12 14:22:35

I'd suggest not using SimpleXML (@Nev Stokes and @Nicholas Wilson are right: this is HTML, not XML, and you have no guarantee that it will validate as XML) and using something like DOM instead (see http://www.php.net/manual/en/book.dom.php). You can do something like:

$doc = new DOMDocument();
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$entries = $xpath->query('//*[@id="tblParts"]');
foreach ($entries as $entry) {
  // do something
}

See if that helps.
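Since the question asks for each table row, here is one way the "do something" part might look. It is a sketch under assumptions: the markup in $string is made up for illustration (the real page's columns are not shown in this thread), and the //tr and ./td paths assume ordinary tr/td rows inside the table.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$string = '<html><body><table id="tblParts">'
        . '<tr><td>PART-1</td><td>Vendor A</td></tr>'
        . '<tr><td>PART-2</td><td>Vendor B</td></tr>'
        . '</table></body></html>';

libxml_use_internal_errors(true);   // keep warnings about sloppy HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = [];
// '//tr' below the table matches rows whether or not a <tbody> is present.
foreach ($xpath->query('//*[@id="tblParts"]//tr') as $tr) {
    $cells = [];
    // The second argument makes the query relative to the current row.
    foreach ($xpath->query('./td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Each entry of $rows is then a plain array of cell texts, which is usually easier to work with than the raw DOM nodes.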
