lxml - parsing XML that has no line breaks
I am using lxml's iterparse in Python to loop through the elements of my XML files. It works fine for most of them but fails for some. One of the failing files has no line breaks at all. The error and a sample of such a file are below. Any clues?
Thanks!!
<root><person><name>"xyz"</name><age>"10"</age></person><person><name>"abc"</name><age>"20"</age></person></root>
Error:
XMLSyntaxError: Document is empty, line 1, column 1
Code:

from lxml import etree

def parseXml(context, elemList):
    for event, element in context:
        if element.tag in elemList:
            # read text and attributes, if any
            element.clear()

def main(fullFilePath):
    elemList = ['name', 'age', 'id']
    context = etree.iterparse(fullFilePath, events=("start", "end"))
    parseXml(context, elemList)
Answer:
etree.iterparse expects a buffer for the source argument. The name of the variable you are passing, "fullFilePath", tells me that it's not a file object (so the parser is trying to parse the file path instead of the file content).
Try passing an opened file instead.
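A minimal sketch of passing an opened file, assuming the one-line XML from the question is saved to disk. The temporary file `people.xml` and the helper `parse_file` are hypothetical, added only to make the example self-contained. The file is opened in binary mode so lxml can handle the encoding itself, and only "end" events are used, because an element's text is guaranteed to be parsed by the time its end event fires:

```python
import os
import tempfile
from lxml import etree

# Hypothetical sample: the one-line XML from the question, with no line breaks
SAMPLE = (b'<root><person><name>"xyz"</name><age>"10"</age></person>'
          b'<person><name>"abc"</name><age>"20"</age></person></root>')

def parse_file(path, elemList):
    """Collect (tag, text) pairs for tags in elemList, reading from an open file."""
    found = []
    with open(path, "rb") as xml_file:
        # Pass the opened file object to iterparse
        for event, element in etree.iterparse(xml_file, events=("end",)):
            if element.tag in elemList:
                found.append((element.tag, element.text))
                element.clear()  # free memory once the element has been read
    return found

with tempfile.TemporaryDirectory() as tmp:
    fullFilePath = os.path.join(tmp, "people.xml")  # hypothetical path
    with open(fullFilePath, "wb") as f:
        f.write(SAMPLE)
    found = parse_file(fullFilePath, ["name", "age", "id"])

print(found)
```

The missing line breaks make no difference here: iterparse reads a stream of bytes, not lines.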
Or a string:
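lxml's iterparse treats a plain `str` argument as a filename rather than as XML content, so XML held in a string needs to be wrapped in a file-like object first. A sketch using `io.BytesIO`; the variable name `xml_string` is hypothetical:

```python
import io
from lxml import etree

# Hypothetical in-memory copy of the XML from the question
xml_string = ('<root><person><name>"xyz"</name><age>"10"</age></person>'
              '<person><name>"abc"</name><age>"20"</age></person></root>')

elemList = ["name", "age", "id"]
found = []

# Encode to bytes and wrap in BytesIO so iterparse reads it like an open file
context = etree.iterparse(io.BytesIO(xml_string.encode("utf-8")), events=("end",))
for event, element in context:
    if element.tag in elemList:
        found.append((element.tag, element.text))
        element.clear()

print(found)
```

Wrapping the encoded bytes (rather than using `io.StringIO`) lets lxml decode the input the same way it would decode a file on disk.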
PS: And what do you mean by this?