Ignore text between XML tags
""" test.xml
<xyz>
<pqr>
<abc><a href="data:text/html;charset=utf-8,base64,JTNjc2NyaXB0JTNlYWxlcnQoIlhTUyIpO2hpc3RvcnkuYmFjaygpOyUzYy9zY3JpcHQlM2UiPjwvYT4=</abc>
</pqr>
<pqr>
<abc><iframe src="data:text/html;charset=utf-8,base64,JTNjc2NyaXB0JTNlYWxlcnQoIlhTUyIpO2hpc3RvcnkuYmFjaygpOyUzYy9zY3JpcHQlM2UiPjwv</abc>
</pqr>
<xyz>
""""
When I parse this XML file in Python, it raises an error (not well-formed). How can I parse this XML file, or is there any other method to get the data out of it?
You can change the XML first and use CDATA to enclose the markup that is not well formed.
See: http://www.w3schools.com/xml/xml_cdata.asp
After this you can just use a Python XML parser.
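The CDATA approach above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the payloads are shortened placeholders rather than the full strings from the question, the helper name `repair` is made up for illustration, and it assumes the broken markup only ever sits between <abc> and </abc> and never contains `]]>`.

```python
import re
import xml.etree.ElementTree as ET

# Fragment modelled on the question, without the stray `""" test.xml` line.
# The base64 payloads are shortened placeholders (assumption of this sketch).
raw = """<xyz>
<pqr>
<abc><a href="data:text/html;charset=utf-8,base64,JTNj</abc>
</pqr>
<pqr>
<abc><iframe src="data:text/html;charset=utf-8,base64,JTNj</abc>
</pqr>
<xyz>
"""


def repair(text):
    """Hypothetical helper: make the fragment well formed before parsing."""
    # Wrap whatever sits between <abc> and </abc> in a CDATA section so the
    # parser treats it as character data instead of markup.
    # Caveat: this breaks if the wrapped text itself contains "]]>".
    text = re.sub(
        r"<abc>(.*?)</abc>",
        lambda m: "<abc><![CDATA[" + m.group(1) + "]]></abc>",
        text,
        flags=re.DOTALL,
    )
    # If <xyz> occurs twice, the second one was meant to be the closing tag.
    if text.count("<xyz>") == 2:
        head, _, tail = text.rpartition("<xyz>")
        text = head + "</xyz>" + tail
    return text


root = ET.fromstring(repair(raw))
# Each <abc> element now carries the original text untouched.
payloads = [abc.text for abc in root.iter("abc")]
for payload in payloads:
    print(payload)
```

Using a function as the replacement in `re.sub` avoids any backslash handling in the wrapped text. If the real file has other elements with broken content, the pattern would need to be widened accordingly.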
The <xyz> tag is not balanced (there are two opening tags) and the test.xml line is spurious. Your <a> tag is not closed and its href attribute is not properly quoted (the closing quote is missing). The same goes for your <iframe> tag. The parser you're using should tell you where it encountered the error. Fix it and then you'll be good to go.
If you want to parse XML, you must first ensure that it is well-formed XML. Often it's possible to do a little massaging to turn an otherwise unparseable snippet into something well formed, so that you can use a standard parser.
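To see where the parser gives up, the exception raised by the standard library can be inspected. The sketch below assumes `xml.etree.ElementTree` is the parser in use (the question does not say which one it is); the function name `first_error` and the shortened sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the fragment in the question (placeholder payload).
broken = """<xyz>
<pqr>
<abc><a href="data:text/html,AAAA</abc>
</pqr>
<xyz>
"""


def first_error(text):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # ParseError.position is a (line, column) tuple:
        # the line is 1-based, the column is 0-based.
        line, column = exc.position
        return line, column, str(exc)
    return None


print(first_error(broken))
```

Only the first problem is reported, so after each fix the check has to be run again until it returns None.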
After a quick visual inspection of your XML fragment, two things jumped out at me regarding your XML not being well formed:
The final <xyz> element is missing its slash: it should be </xyz>.
The <a> and <iframe> elements are also not closed.