PHP - SimpleXML parsing error
SEE EDITS AT BOTTOM TO SHOW MORE ACCURATE ERROR OUTPUT
I'm parsing somewhat large (~15MB) XML files with PHP for the first time using SimpleXML. The files are flight search results, so they have long attributes (links back to Kayak). Example:
"/book/flightcode=1238917408.NxJI6G.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&sid=26-Vu01v7ilzhSAjPVLZ3Ul"
SimpleXML throws this error when parsing:
"Entity: line 10: parser error : EntityRef: expecting ';' in" and then;
"38917408.NxJI6G.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&sid in"
and then;
"simplexml_load_string() [function.simplexml-load-string]: ^ in,"
and so forth for each line where there are these urls.
I found a mention of SimpleXML not liking long attributes on php.net with no solution. I would rather just use and learn SimpleXML for now and work past this error if there is a non-janky, somewhat easy workaround.
Does anyone have a solution? Thanks in advance!
I tried pasting in the first 13 lines of the XML, but the post only shows the text content without the XML markup. I can do that if it will help. I'm not sure whether using another parser/extension would reduce functionality or ease of use, but please feel free to suggest one if there's no workaround (I'm thinking DOM or XMLReader, perhaps).
EDITS BELOW TO INCLUDE LESS ADULTERATED ERROR OUTPUT:
http://dl.dropbox.com/u/10206237/stack_overflow_xml.xml
ERROR 1:
simplexml_load_string() [<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: Entity: line 10: parser error : EntityRef: expecting ';' in
ERROR 2: (I think the XML is fine because it works with a Python script using DOM; I'm translating it to PHP because I don't know Python. I didn't know that the output in the browser would be different. Thanks for being patient.)
<a href='function.simplexml-load-string'>function.simplexml-load-string</a>]: 38917408.Pt8rW8.0.F.ORBITZAIR,ORBITZAIR.0.f36f1ea92513977249aa695112410052&_sid_ in
ERROR 3:
function.simplexml-load-string</a>]: ^ in
(all of those spaces are in there)
Comments (4)
As mentioned in other answers and comments, your source XML is broken, and XML parsers are supposed to reject invalid input. libxml has a "recover" mode which would let you load this broken XML, but you would lose the "&sid" part, so it wouldn't help.
If you're lucky and you like taking chances, you can try to make it work by kind-of-fixing the input. You can use some string replacement to escape the ampersands that look like they're in the query part of a URL.
This is, of course, nothing but a hack and the only good way to fix your situation is to ask your XML provider to fix their generator. Because if it generates broken XML, who knows what other errors slip by unnoticed?
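The string replacement described above could look roughly like the sketch below. The helper name `escapeQueryAmpersands`, the pattern, and the sample `<flight>` element are all assumptions made for illustration, not the answerer's code; the pattern only escapes an `&` that is directly followed by something shaped like `name=`.

```php
<?php
// Hypothetical helper (not from the original answer): escape "&" only when
// it is followed by what looks like a URL query parameter, e.g. "&sid=".
function escapeQueryAmpersands(string $xml): string
{
    // The lookahead leaves the parameter name itself untouched.
    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace('/&(?=[A-Za-z_][A-Za-z0-9_]*=)/', '&amp;', $xml) ?? $xml;
}

// Made-up sample with one broken attribute and one correctly escaped one.
$broken = '<flight url="/book/flightcode=123.F.ORBITZAIR&sid=26-Vu01" note="a &amp; b"/>';
$fixed  = escapeQueryAmpersands($broken);

// Collect parser errors instead of having them printed as warnings.
libxml_use_internal_errors(true);
$flight = simplexml_load_string($fixed);

if ($flight === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
} else {
    echo $flight['url'], "\n";
}
```

An ampersand followed by a name that is not a query parameter (for example `AT&T`) is left alone by this pattern and would still break the parser, which is one reason this stays a hack.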
Darryl has the right answer as to why this is happening in his comment above. One way of fixing this would be to use str_replace() to replace all '&' ampersands with '&amp;' in the XML. According to the PHP manual you could also use this regular expression to replace ampersands with their entities:
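The regular expression this answer refers to did not survive in this copy, so the pattern below is an assumption rather than the one from the PHP manual. It is a minimal sketch of the general idea: escape every `&` that does not already start an entity or character reference. The helper name `escapeBareAmpersands` is hypothetical. The plain str_replace() call is included for comparison, because it also re-escapes entities that were already correct.

```php
<?php
// Hypothetical helper: escape "&" unless it already begins a named entity
// (&amp;), a decimal reference (&#38;) or a hex reference (&#x26;).
function escapeBareAmpersands(string $xml): string
{
    $pattern = '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';
    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace($pattern, '&amp;', $xml) ?? $xml;
}

$input = 'a=1&sid=2 &amp; &#38; &lt;';

// Naive approach: every ampersand is replaced, so "&amp;" gets double-escaped.
$naive = str_replace('&', '&amp;', $input);

// Lookahead approach: only the bare ampersand before "sid" is touched.
$careful = escapeBareAmpersands($input);

echo $naive, "\n";
echo $careful, "\n";
```

Neither approach knows about CDATA sections or comments, where a literal `&` is legal, so both can alter content that was fine to begin with.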
The XML file may be too big for the parser. You can try passing LIBXML_PARSEHUGE as an option, which helped in my case.
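Passing the option might look like the sketch below. The tiny inline document is made up so the example runs on its own; with a real file the same third argument can be given to simplexml_load_file(). Note that LIBXML_PARSEHUGE only relaxes the parser's built-in size limits. It does not make libxml accept an unescaped ampersand.

```php
<?php
// Made-up sample document standing in for a large search-results file.
$xml = '<results><flight id="1"/><flight id="2"/></results>';

// Collect parser errors instead of having them printed as warnings.
libxml_use_internal_errors(true);

// The third argument takes libxml option constants; LIBXML_PARSEHUGE lifts
// the parser's hard-coded limits on document size and depth.
$results = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_PARSEHUGE);

if ($results === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
} else {
    echo count($results->flight), " flights\n";
}
```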
I had this problem with 13MB files and solved it by passing the LIBXML_PARSEHUGE parameter. Note: raising the limit to 1GB with ini_set didn't solve my problem, because the parsed contents occupied more than that. A more radical approach is to use a library that streams the file rather than loading it whole (SAX parser versus DOM parser), such as XML Streamer.
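As an illustration of streaming versus loading the whole file, here is a minimal sketch that uses PHP's built-in XMLReader instead of the XML Streamer library mentioned above, whose API is not shown here. The temporary file, the `flight` element name, and the `id` attribute are assumptions for the example.

```php
<?php
// Write a made-up sample file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $path,
    '<results><flight id="1" price="100"/><flight id="2" price="250"/></results>'
);

$reader = new XMLReader();
if (!$reader->open($path)) {
    throw new RuntimeException("Could not open $path");
}

// read() advances one node at a time, so only the current node is held in
// memory rather than a tree of the whole document.
$ids = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'flight') {
        $ids[] = $reader->getAttribute('id');
    }
}

$reader->close();
unlink($path);

print_r($ids);
```

A streaming reader still requires well-formed input, so the unescaped ampersands from the question would have to be fixed before this helps.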