ExpatError: junk after document element
I really don't know what the problem is. I get the following error:
  File "C:\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
ExpatError: junk after document element: line 5, column 0
I don't see any junk! Any help? I'm going crazy...
text = """<questionaire>
<question>
<questiontext>Question1</questiontext>
<answer>Your Answer: 99</answer>
</question>
<question>
<questiontext>Question2</questiontext>
<answer>Your Answer: 64</answer>
</question>
<question>
<questiontext>Question3</questiontext>
<answer>Your Answer: 46</answer>
</question>
<question>
<questiontext>Bitte geben</questiontext>
<answer>Your Answer: 544</answer>
<answer>Your Answer: 943</answer>
</question>
</questionaire>"""
cleandata = text.split('<questionaire>')
cleandatastring = "".join(cleandata)
stripped = cleandatastring.strip()
planhtml = stripped.split('</questionaire>')[0]
clean = planhtml.strip()
from xml.dom import minidom
doc = minidom.parseString(clean)
for question in doc.getElementsByTagName('question'):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)
print doc.toxml()
Thanks!
Comments (2)
Your original text string is well-formed XML. Then you do a bunch of things to it that break it. Parse your original text and you will be fine.
XML is required to have exactly one top-level element. By the time you parse it, the string has several top-level <question> tags. The XML parser parses the first one as the root element and is then surprised to find another top-level element. That is why the error points at line 5, column 0: it is where the second <question> starts once the <questionaire> wrapper has been stripped.
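A minimal sketch of the advice above, assuming the goal is still to drop every question whose answer is 'Your Answer: 99'. The helper name remove_questions_with_answer and the shortened sample document are only for illustration, and print is written with parentheses so the snippet also runs on Python 3. It first reproduces the error from the stripped fragment, then parses the original string instead.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Shortened version of the document from the question (illustration only).
text = """<questionaire>
<question>
<questiontext>Question1</questiontext>
<answer>Your Answer: 99</answer>
</question>
<question>
<questiontext>Question2</questiontext>
<answer>Your Answer: 64</answer>
</question>
</questionaire>"""

# Stripping the wrapper leaves two top-level <question> elements,
# which is what makes the parser report "junk after document element".
fragment = text.replace('<questionaire>', '').replace('</questionaire>', '').strip()
try:
    minidom.parseString(fragment)
    error_message = None
except ExpatError as exc:
    error_message = str(exc)
print(error_message)


def remove_questions_with_answer(xml_text, unwanted):
    """Parse the complete document and drop questions that contain the given answer text."""
    doc = minidom.parseString(xml_text)
    # Take a snapshot of the matches before removing anything.
    for question in list(doc.getElementsByTagName('question')):
        for answer in question.getElementsByTagName('answer'):
            node = answer.firstChild
            if node is not None and node.nodeValue.strip() == unwanted:
                question.parentNode.removeChild(question)
                break  # this question is gone, no need to check its other answers
    return doc


# Parsing the original, single-rooted string works as expected.
doc = remove_questions_with_answer(text, 'Your Answer: 99')
result = doc.toxml()
print(result)
```

The break matters when a question has several matching answers: without it, the second match would try to remove a question that has already been detached from its parent.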
In my case it was caused by a change made in libxml2-2.9.11 that made tostring() (lxml) return more content (what follows the element) than it should. E.g.
Expected output:
Actual output:
If you pass the result to xml.dom.minidom.parseString(), it will complain. More on it here.
To avoid this you need either libxml2 <= 2.9.10 or Alpine Linux >= 3.14.
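If you want to check which libxml2 your lxml is actually running against, something like the following might help. It is only a sketch: the two helper names are made up for this example, and the check flags nothing beyond the single release named above (2.9.11). etree.LIBXML_VERSION is the version tuple that lxml exposes.

```python
def libxml2_version():
    """Return the libxml2 version tuple lxml runs against, or None if lxml is not installed."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return tuple(etree.LIBXML_VERSION)


def is_version_named_in_answer(version):
    """True only for libxml2 2.9.11, the release this answer blames (hypothetical helper)."""
    return version is not None and tuple(version[:3]) == (2, 9, 11)


version = libxml2_version()
if version is None:
    print("lxml is not installed")
elif is_version_named_in_answer(version):
    print("libxml2 %s: tostring() output may need checking" % ".".join(map(str, version)))
else:
    print("libxml2 %s" % ".".join(map(str, version)))
```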