Editing an XML file without ignoring whitespace in attribute values
I want to update one XML file with values from another XML file. The update itself works, but I have a problem with certain attribute values: after the XML file is parsed, the line breaks inside them are lost. For example, if
value='something
something'
it will change to value='something something'
and my file can't be like that.
Here is a picture that roughly shows my concern:
I want to keep these values on more than one line. As I understand it, parsing an XML file destroys the structure of the original file, but is there any simple way to fix my program so that it does not ignore the whitespace?
Here is my code:
import xml.etree.ElementTree as ET

# File to be updated
Mainfile = 'Mainfile_1.xml'
tree = ET.parse(Mainfile)
root = tree.getroot()

# File that holds the replacement values
fixfile = 'fixfile_1.xml'
tree2 = ET.parse(fixfile)
root2 = tree2.getroot()

for objects in root.iter('object'):
    objid = objects.attrib.get('id')
    # Iterate over the element directly; getchildren() was removed in Python 3.9
    for attributes in objects:
        name = attributes.attrib.get('name')
        value = attributes.attrib.get('value')
        if value == 'FAIL':
            # Look for the item with the same object id and item name in the fix file
            for objects2 in root2.iter('object'):
                objid2 = objects2.attrib.get('id')
                for attributes2 in objects2:
                    name2 = attributes2.attrib.get('name')
                    value2 = attributes2.attrib.get('value')
                    if objid2 == objid:
                        if name == name2:
                            attributes.set('value', value2)

tree.write('Mainfile_1updated.xml', xml_declaration=True, encoding='UTF-8')
Here is MainXML:
<?xml version='1.0' encoding='UTF-8'?>
<Module bs='Mainfile_1'>
<object name='namex' number='1' id='1000'>
<item name='item0' value='100'/>
<item name='item00' value='100'/>
</object>
<object name='namey' number='2' id='1001'>
<item name='item1' value='100'/>
<item name='item00' value='100'/>
</object>
<object name='name1' number='3' id='1234'>
<item name='item1' value='FAIL'/>
<item name='item2' value='233
233'/>
<item name='item3' value='233'/>
<item name='item4' value='FAIL'/>
</object>
<object name='name2' number='4' id='1238'>
<item name='item8' value='FAIL'/>
<item name='item9' value='233'/>
</object>
<object name='name32' number='5' id='2345'>
<item name='item1' value='111'/>
<item name='item2' value='FAIL'/>
</object>
<object name='name4' number='6' id='2347'>
<item name='item1' value='FAIL'/>
<item name='item2' value='FAIL'/>
<item name='item3' value='233'/>
<item name='item4' value='FAIL'/>
</object>
</Module>
And here is the fix file:
<?xml version='1.0' encoding='UTF-8'?>
<Module bs='Mainfile_1'>
<object id='1234'>
<item name='item1' value='something
something111'/>
<item name='item4' value='something
1something'/>
</object>
<object id='1238'>
<item name='item8' value='something12
1something'/>
</object>
<object id='2345'>
<item name='item2' value='something
12something'/>
</object>
<object id='2347'>
<item name='item1' value='something14
13of something'/>
<item name='item2' value='something
11something'/>
<item name='item4' value='something14
something14
something12
13something'/>
</object>
</Module>
Comments (1)
Then you must stop using attributes like that. Line break characters inside attribute values are normalized into spaces when the XML file is parsed. You can open a text editor and produce XML like this:
<item name='item1' value='something
something111'/>
but upon parsing, this will turn into the equivalent of
<item name='item1' value='something something111'/>
That's just how it works.
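The normalization is easy to observe from Python. This is a minimal sketch using the standard-library parser and a made-up one-element document; the `item` element and its value are only illustrative.

```python
import xml.etree.ElementTree as ET

# A literal line break inside the attribute value, as typed in a text editor.
raw = "<item name='item1' value='something\nsomething111'/>"

item = ET.fromstring(raw)
value = item.get("value")

# By the time the value reaches the program, the line break is a single space.
print(repr(value))
```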
If you want to store things like tabs or newlines in attribute values, you must explicitly escape them. Then they will be retained when the document is parsed:
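For instance, a line feed can be written as the character reference `&#10;` (decimal) or `&#xA;` (hexadecimal). The following small sketch parses both spellings with the standard library; the element and its value are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same line feed, written once as a decimal and once as a hexadecimal
# character reference. Neither string contains a literal line break.
decimal_form = "<item name='item1' value='something&#10; something'/>"
hex_form = "<item name='item1' value='something&#xA; something'/>"

value_decimal = ET.fromstring(decimal_form).get("value")
value_hex = ET.fromstring(hex_form).get("value")

print(repr(value_decimal))
print(repr(value_hex))
```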
Both of these will give an attribute value of
"something\n something"
in the resulting DOM.
That being said, ElementTree's implementation is broken; there is literally nothing you can do about it.
Use lxml instead; its implementation is correct.
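If the existing files already contain literal line breaks and cannot be edited by hand, one possible workaround is to apply the escaping to the raw text before it reaches a parser. The sketch below is an illustration under assumptions, not part of the answer above. `escape_attr_newlines` is a hypothetical helper, and its regular expression treats every `='...'` or `="..."` run as an attribute value, so it would also touch similar-looking text in element content, comments or CDATA. Parsing is shown with the standard library only to keep the example self-contained.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: an equals sign followed by a quoted run of text.
# DOTALL lets the quoted run span several lines.
ATTR_VALUE = re.compile(r"""=\s*(["'])(.*?)\1""", re.DOTALL)


def escape_attr_newlines(xml_text):
    """Replace line breaks inside quoted attribute values with &#10;."""
    def repl(match):
        quote, value = match.group(1), match.group(2)
        # Treat CRLF and lone CR as a single line feed before escaping.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        return "=" + quote + value.replace("\n", "&#10;") + quote
    return ATTR_VALUE.sub(repl, xml_text)


# Shortened sample in the shape of the fix file from the question.
sample = (
    "<object id='1234'>\n"
    "<item name='item1' value='something\nsomething111'/>\n"
    "</object>"
)

escaped = escape_attr_newlines(sample)
root = ET.fromstring(escaped)
print(repr(root.find("item").get("value")))
```

Line breaks between elements are left alone because they do not sit inside a quoted run. Whether the line break survives when the tree is written back out depends on the serializer, which this sketch does not cover.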