ExpatError: junk after document element

Posted 2024-12-08 21:16:37 · 1465 words · 4 views · 0 comments

I really don't know what the problem is. I get the following error:

File "C:\Python27\lib\xml\dom\expatbuilder.py", line 223, in parseString
parser.Parse(string, True)
ExpatError: junk after document element: line 5, column 0

I don't see any junk! Any help? I'm going crazy...

text = """<questionaire>
<question>
    <questiontext>Question1</questiontext>
    <answer>Your Answer: 99</answer>
</question>
<question>
    <questiontext>Question2</questiontext>
    <answer>Your Answer: 64</answer>
</question>
<question>
    <questiontext>Question3</questiontext>
    <answer>Your Answer: 46</answer>
</question>
<question>
    <questiontext>Bitte geben</questiontext>
    <answer>Your Answer: 544</answer>
    <answer>Your Answer: 943</answer>
</question>
</questionaire>"""

cleandata = text.split('<questionaire>')
cleandatastring= "".join(cleandata)
stripped = cleandatastring.strip()
planhtml = stripped.split('</questionaire>')[0]
clean= planhtml.strip()


from xml.dom import minidom

doc = minidom.parseString(clean)
for question in doc.getElementsByTagName('question'):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)

print doc.toxml() 

Thanks!

Comments (2)

陌路黄昏 2024-12-15 21:16:37

Your original text string is well-formed XML. The split/join/strip steps then remove the <questionaire> root tags, which breaks it. Parse your original text instead, and you will be fine.

XML is required to have exactly one top-level element. By the time you parse it, it has a number of top-level <question> tags. The XML parser is parsing the first one as a root element, and then is surprised to find another top-level element.
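To make the point concrete, here is a minimal sketch (written for Python 3, whereas the question uses Python 2; the sample document is shortened to two questions, and the names inner, fragment_error, and remaining are just illustrative). It first reproduces the failure on the stripped string, then parses the original text and removes the matching question:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Shortened version of the document from the question.
text = """<questionaire>
<question>
    <questiontext>Question1</questiontext>
    <answer>Your Answer: 99</answer>
</question>
<question>
    <questiontext>Question2</questiontext>
    <answer>Your Answer: 64</answer>
</question>
</questionaire>"""

# Roughly what the split/join/strip steps in the question produce:
# the root tags are gone, leaving several top-level <question> elements.
inner = text.replace('<questionaire>', '').replace('</questionaire>', '').strip()

try:
    minidom.parseString(inner)
    fragment_error = None
except ExpatError as exc:
    fragment_error = str(exc)
print(fragment_error)

# Parsing the untouched text works, because it has a single root element.
doc = minidom.parseString(text)
for question in list(doc.getElementsByTagName('question')):
    for answer in question.getElementsByTagName('answer'):
        if answer.childNodes[0].nodeValue.strip() == 'Your Answer: 99':
            question.parentNode.removeChild(question)
            break  # the question is already removed; stop checking its answers

remaining = [
    q.getElementsByTagName('questiontext')[0].firstChild.nodeValue
    for q in doc.getElementsByTagName('question')
]
print(remaining)
print(doc.toxml())
```

The break is an addition of mine: without it, a question with two matching answers would be removed twice, and the second removeChild call would fail.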

甜`诱少女 2024-12-15 21:16:37

In my case it was caused by a change in libxml2 2.9.11 that made lxml's tostring() return more content than it should: the output included what follows the element. For example:

from lxml import etree

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<a>
  <b>
  </b>
</a>
'''
t = etree.fromstring(xml.encode()).getroottree()
print(etree.tostring(
  t.xpath('/a/b')[0],
  encoding=t.docinfo.encoding,
).decode())

Expected output:

<b>
  </b>

Actual output:

<b>
  </b>
</a>

Should you pass the result to xml.dom.minidom.parseString(), it will complain.
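A small stdlib-only sketch of that last step, assuming the two strings shown above as "Expected output" and "Actual output" (parses_cleanly is a hypothetical helper name, not part of any library). It only checks whether minidom accepts each string and does not depend on the exact error message:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_cleanly(xml_text):
    """Return True if minidom can parse xml_text, False on an ExpatError."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# The serialization one would expect for the <b> element alone.
expected_ok = parses_cleanly("<b>\n  </b>\n")

# The serialization with the stray closing tag of the parent appended.
actual_ok = parses_cleanly("<b>\n  </b>\n</a>\n")

print(expected_ok, actual_ok)
```

A check like this can be run on serialized fragments before handing them to other code, to detect whether an installation is affected.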

More on it here.

To avoid this, you need either libxml2 <= 2.9.10 or Alpine Linux >= 3.14.
