Editing an XML file without ignoring whitespace in attribute values

Posted 2025-01-18 09:31:22


I want to update one XML file with values from another XML file. That works fine after parsing, but I have a problem with certain attribute values: after the XML file is parsed, the whitespace inside them is not preserved. For example, if

value='something

something'

it changes to value='something something', and my file can't be like that.

Here is a picture showing what my concern is:

[picture]

I want to keep these values spanning more than one line. As I understand it, parsing an XML file destroys the structure of the original file, but is there any simple way to fix my program so that it doesn't ignore the whitespace?

Here is my code:

import xml.etree.ElementTree as ET

Mainfile = 'Mainfile_1.xml'
tree = ET.parse(Mainfile)
root = tree.getroot()
fixfile = 'fixfile_1.xml'
tree2 = ET.parse(fixfile)
root2 = tree2.getroot()
for objects in root.iter('object'):
    objid = objects.attrib.get('id')
    # Iterate over the children directly; Element.getchildren() was removed in Python 3.9.
    for attributes in objects:
        name = attributes.attrib.get('name')
        value = attributes.attrib.get('value')
        if value == 'FAIL':
            for objects2 in root2.iter('object'):
                objid2 = objects2.attrib.get('id')
                for attributes2 in objects2:
                    name2 = attributes2.attrib.get('name')
                    value2 = attributes2.attrib.get('value')
                    if objid2 == objid:
                        if name == name2:
                            attributes.set('value', value2)

tree.write('Mainfile_1updated.xml',xml_declaration=True, encoding='UTF-8')

Here is the main XML file:

<?xml version='1.0' encoding='UTF-8'?>
<Module bs='Mainfile_1'>
<object name='namex' number='1' id='1000'>
    <item name='item0' value='100'/>
    <item name='item00' value='100'/>
</object>
<object name='namey' number='2' id='1001'>
    <item name='item1' value='100'/>
    <item name='item00' value='100'/>
</object>
<object name='name1' number='3' id='1234'>
    <item name='item1' value='FAIL'/>
    <item name='item2' value='233
    
    233'/>
    <item name='item3' value='233'/>
    <item name='item4' value='FAIL'/>
</object>
<object name='name2' number='4' id='1238'>
    <item name='item8' value='FAIL'/>
    <item name='item9' value='233'/>
</object>
<object name='name32' number='5' id='2345'>
    <item name='item1' value='111'/>
    <item name='item2' value='FAIL'/>
</object>
<object name='name4' number='6' id='2347'>
    <item name='item1' value='FAIL'/>
    <item name='item2' value='FAIL'/>
    <item name='item3' value='233'/>
    <item name='item4' value='FAIL'/>
</object>
</Module>

And here is the fix file:

<?xml version='1.0' encoding='UTF-8'?>
<Module bs='Mainfile_1'>
<object id='1234'>
    <item name='item1' value='something
something111'/>
    <item name='item4' value='something
1something'/>
</object>
<object id='1238'>
    <item name='item8' value='something12
1something'/>
</object>
<object id='2345'>
    <item name='item2' value='something
12something'/>
</object>
<object id='2347'>
    <item name='item1' value='something14
13of something'/>
    <item name='item2' value='something
11something'/>
    <item name='item4' value='something14
something14
something12
13something'/>
</object>
</Module>



嗫嚅 2025-01-25 09:31:22


"it will change to value='something something' and my file can't be like that."

Then you must stop using attributes like that. Line break characters inside attribute values are normalized into spaces when the XML file is parsed. You can open a text editor and produce XML like this:

<element value="something
something" />

but upon parsing, this will turn into the equivalent of

<element value="something something" />

That's just how it works.
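A quick way to see this rule with the standard library's `xml.etree.ElementTree` (the module the question uses): each literal tab or newline inside an attribute value comes back as a single space. This is a minimal illustration, not the asker's data.

```python
import xml.etree.ElementTree as ET

# One attribute containing a literal tab, a literal newline, and a run of two spaces.
raw = '<element value="a\tb\nc  d"/>'
value = ET.fromstring(raw).get("value")

# Each literal tab/newline is replaced by exactly one space. The run of two
# spaces is left alone: without a DTD every attribute is treated as CDATA,
# so no further collapsing happens.
print(repr(value))  # 'a b c  d'
```

By the time your code calls `.get("value")`, the line break is already gone, so nothing in the update logic can bring it back.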

If you want to store things like tabs or newlines in attribute values, you must explicitly escape them. Then they will be retained when the document is parsed:

<element value="something&#10;&#32;something" />

<element value="something&#xA; something" />

Both of these will give an attribute value of "something\n something" in the resulting DOM.
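The standard library's parser (expat, via `xml.etree.ElementTree`) follows the same rule, which may be handy if lxml isn't installed. A small sketch using two escaped spellings of "newline followed by a space":

```python
import xml.etree.ElementTree as ET

# Decimal character references, and a hex reference followed by a literal space.
decimal_form = '<element value="something&#10;&#32;something" />'
hex_form = '<element value="something&#xA; something" />'

values = [ET.fromstring(raw).get("value") for raw in (decimal_form, hex_form)]

# Only literal whitespace is normalized; a character reference inserts its
# character unchanged, so the newline survives parsing.
print(values)  # ['something\n something', 'something\n something']
```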


That being said, ElementTree's implementation (at least in the Python version this was written against) is broken; there is literally nothing you can do about it.
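Whether this still applies to the standard library seems to depend on the Python version; as far as I can tell, recent CPython releases escape newlines in attribute values as `&#10;` when serializing. A round-trip check you can run on your own interpreter (the element and value here are just illustrations):

```python
import xml.etree.ElementTree as ET

elem = ET.Element("element")
elem.set("value", "something\n something")

# Serialize, then parse the result again: if the serializer wrote the newline
# out literally, the re-parsed value would contain a space instead.
serialized = ET.tostring(elem, encoding="unicode")
reparsed = ET.fromstring(serialized).get("value")

print(serialized)
print(repr(reparsed))
print("newline preserved:", reparsed == "something\n something")
```

If the last line prints `False` on your interpreter, switching to lxml as suggested here is the simplest way out.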

Use lxml; its implementation is correct.

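from lxml import etree as ET

# An escaped newline (&#xA;) survives parsing.
value = ET.fromstring('<element value="something&#xA; something" />').attrib['value']
print(repr(value))
# => 'something\n something'

# The escaped newline survives; the literal newline (\n in the Python string) becomes a space.
value = ET.fromstring('<element value="something&#xA;\nsomething" />').attrib['value']
print(repr(value))
# => 'something\n something'

# On output, lxml writes the newline back as a character reference.
elem = ET.fromstring('<element />')
elem.attrib['value'] = 'something\n something'
xml = ET.tostring(elem)
print(xml)
# => b'<element value="something&#10; something"/>'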
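Applied to the question's task, one workaround in the spirit of "escape them explicitly" is to escape the raw line breaks in the files' text *before* parsing, so the parser never gets a chance to normalize them. This is a minimal, standard-library-only sketch under stated assumptions: `escape_attr_newlines` and `merge_fail_values` are hypothetical names, the regex assumes attribute values never contain their own quote character and that no element text looks like `name='...'`, and the output step relies on ElementTree writing newlines as `&#10;` (which recent CPython versions appear to do).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper regex: matches name='value' and name="value".
# ASSUMPTION: attribute values never contain their own quote character,
# and no element text contains something that looks like an attribute.
_ATTR_VALUE = re.compile(r"""(=\s*)(['"])(.*?)\2""", re.DOTALL)


def escape_attr_newlines(xml_text: str) -> str:
    """Hypothetical helper: turn literal line breaks and tabs inside
    attribute values into character references before parsing, so the
    parser keeps them instead of normalizing them to spaces."""

    def repl(match):
        value = match.group(3).replace("\r\n", "\n")
        value = value.replace("\n", "&#10;").replace("\t", "&#9;")
        return match.group(1) + match.group(2) + value + match.group(2)

    return _ATTR_VALUE.sub(repl, xml_text)


def merge_fail_values(main_xml: str, fix_xml: str) -> bytes:
    """Hypothetical helper: replace value='FAIL' items in main_xml with the
    value of the item with the same object id and item name in fix_xml."""
    main_root = ET.fromstring(escape_attr_newlines(main_xml))
    fix_root = ET.fromstring(escape_attr_newlines(fix_xml))

    # Index the fixes once by (object id, item name) instead of rescanning
    # the whole fix file for every FAIL item; later duplicates win, as in
    # the question's nested loops.
    fixes = {
        (obj.get("id"), item.get("name")): item.get("value")
        for obj in fix_root.iter("object")
        for item in obj
    }

    for obj in main_root.iter("object"):
        for item in obj:  # iterate children directly (no getchildren())
            key = (obj.get("id"), item.get("name"))
            if item.get("value") == "FAIL" and key in fixes:
                item.set("value", fixes[key])

    # Recent CPython versions write "\n" in attribute values as &#10;,
    # so the line breaks survive the next parse.
    return ET.tostring(main_root, encoding="UTF-8", xml_declaration=True)
```

To run it on the question's files, read `Mainfile_1.xml` and `fixfile_1.xml` with `open(path, encoding='utf-8')`, pass the two strings in, and write the returned bytes to `Mainfile_1updated.xml` opened in `'wb'` mode. The multi-line values come out as `&#10;` references rather than literal line breaks; that is the only spelling that survives the next parse. The multi-line values already present in the main file (such as `item2` in object `1234`) are preserved the same way.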