lxml XMLSyntaxError: Namespace default prefix was not found
I am using lxml to read my XML file, with code along the lines of what is shown below. It works fine with lxml 2.3 beta1, but with lxml 2.3 it gives me an XML syntax error, shown below. I went through the release notes for both versions but could not figure out what caused this error or how to fix it. Please help if you have come across this or have any clues.
Thanks!!
Code:
from lxml import etree

def parseXml(context, attribList, elemList):
    for event, element in context:
        if element.tag in elemList:
            pass  # read element attributes
        element.clear()

def main(object):
    ns = '{NS}'
    attribList = ['name', 'age', 'id']
    elemList = [ns+'Employee', ns+'Experience', ns+'Employment', ns+'Project', ns+'Award']
    context = etree.iterparse(fullFilePath, events=("start", "end"))
    parseXml(context, attribList, elemList)
Error:
File "iterparse.pxi", line 478, in lxml.etree.iterparse.next (src/lxml/lxml.etree.c:95348)
File "iterparse.pxi", line 530, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:95886)
File "parser.pxi", line 585, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:71955)
XMLSyntaxError: Namespace default prefix was not found, line 545, column 73
XML sample:
<root xmlns='NS'>
  <Employee Name="Mr.ZZ" Age="30">
    <Experience TotalYears="10" StartDate="2000-01-01" EndDate="2010-12-12">
      <Employment id="1" EndTime="ABC" StartDate="2000-01-01" EndDate="2002-12-12">
        <Project Name="ABC_1" Team="4">
        </Project>
      </Employment>
      <Employment id="2" EndTime="XYZ" StartDate="2003-01-01" EndDate="2010-12-12">
        <PromotionStatus>Manager</PromotionStatus>
        <Project Name="XYZ_1" Team="7">
          <Award>Star Team Member</Award>
        </Project>
      </Employment>
    </Experience>
  </Employee>
</root>
The 'Employee' elements are repeated within the root, and the error happens after the parser has gone through many of the employees correctly.
Edit 1:
On capturing the exception, I catch the following:
WARNING:NAMESPACE:NS_ERR_UNDEFINED_NAMESPACE: Namespace default prefix was not found
Comments (2)
OK, so I finally figured out what was going on. Following good advice to clean up used elements, I was clearing all the elements, including the root node. The root node is the one that declares the default namespace, which applies to all nodes within that root. Since I cleared my root node, the default namespace was no longer part of the nsmap of its subelements. The previous versions seem to be forgiving of this, but the latest version is stricter in this sense.
Not clearing the root element until I was done reading the XML did the trick for me.
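For anyone hitting the same thing, here is a minimal sketch of that approach: remember the root from the first "start" event and clear every finished element except it. The namespace urn:example:ns, the SAMPLE document, and the helper name iter_wanted are made up for illustration, and the fallback to the standard library is only there so the snippet runs where lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, whose iterparse() and
    # Element.clear() are compatible for the calls used in this sketch.
    import xml.etree.ElementTree as etree

NS = "{urn:example:ns}"  # hypothetical namespace standing in for the real one

SAMPLE = b"""<root xmlns="urn:example:ns">
  <Employee Name="Mr.ZZ" Age="30">
    <Project Name="ABC_1" Team="4"/>
  </Employee>
  <Employee Name="Ms.YY" Age="41"/>
</root>"""


def iter_wanted(source, wanted_tags):
    """Yield (tag, attributes) for wanted elements, clearing as we go."""
    root = None
    for event, element in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = element  # the first "start" event belongs to the root
            continue
        if element.tag in wanted_tags:
            # Copy the attributes before the element gets cleared.
            yield element.tag, dict(element.attrib)
        if element is not root:
            element.clear()  # free finished elements, but never the root


found = list(iter_wanted(io.BytesIO(SAMPLE), {NS + "Employee", NS + "Project"}))
```

Elements are only cleared on their "end" event, so their attributes and children are complete when they are read. The root is left alone until parsing has finished, which keeps its namespace declaration in place for everything nested inside it.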
Default namespace problems most often arise when you are attempting an XPath expression. For just parsing the stream as in your sample, 2.3.0 should work fine with an unnamed default namespace.
Perhaps you should post the smallest possible XML file that gives this error (line 545 is pretty deep into the file to be hitting this error).