Missing elements in XML with Python
The system uses the DOM parser in Python 2.7.2. The goal is to extract a .db file and use it on SQL Server. I currently have no problem with the sqlite3 library. I have read similar questions and answers about how to handle a missing element while parsing XML files, but I still couldn't figure out a solution. The XML has 15,000+ elements. Here is the basic structure of the XML:
<topo>
<vlancard>
<id>4545</id>
<nodeValue>21</nodeValue>
<vlanName>voice</vlanName>
</vlancard>
<vlancard>
<id>1234</id>
<nodeValue>42</nodeValue>
<vlanName>camera</vlanName>
</vlancard>
<vlancard>
<id>9876</id>
<nodeValue>84</nodeValue>
</vlancard>
</topo>
Like the third vlancard, several elements have no vlanName node. As a result, the per-tag node lists end up with different lengths, i.e.
from xml.dom import minidom
xmldoc = minidom.parse(r'c:\vlan.xml')  # raw string, otherwise '\v' is read as a vertical tab
vlId = xmldoc.getElementsByTagName('id')
vlValue = xmldoc.getElementsByTagName('nodeValue')
vlName = xmldoc.getElementsByTagName('vlanName')
After running the module:
IndexError: list index out of range
>>> len(vlId)
16163
>>> len(vlName)
16155
Because of this, the elements get out of order. The parser skips the missing elements, so when the table is printed the names no longer line up with their ids. I use a simple while loop to insert the values into the table:
x = 0
while x < len(vlId):
    # raises IndexError once x reaches len(vlName), because vlName is the shorter list
    c.execute('''insert into vlan ('id','nodeValue','vlanName') values ('%s','%s','%s') ''' % (vlId[x].firstChild.nodeValue, vlValue[x].firstChild.nodeValue, vlName[x].firstChild.nodeValue))
    x = x + 1
How else can I do this? Any help will be appreciated.
Yusuf
Comments (1)
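Instead of collecting every id, nodeValue and vlanName across the whole XML and then inserting, go through each vlancard, retrieve its own id/value/name, and insert those into the DB. That way a vlancard with no vlanName only affects its own row.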
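A minimal sketch of that per-vlancard approach, still using minidom. The helper names child_text and load_vlans, the inline sample XML, the in-memory database and the all-text table schema are assumptions made for illustration, not part of the original post. A missing child is stored as NULL, and the insert uses ? placeholders instead of % string formatting so that sqlite3 handles the quoting.

```python
import sqlite3
from xml.dom import minidom

# Sample data taken from the question; the third vlancard has no vlanName.
SAMPLE_XML = """<topo>
<vlancard><id>4545</id><nodeValue>21</nodeValue><vlanName>voice</vlanName></vlancard>
<vlancard><id>1234</id><nodeValue>42</nodeValue><vlanName>camera</vlanName></vlancard>
<vlancard><id>9876</id><nodeValue>84</nodeValue></vlancard>
</topo>"""


def child_text(parent, tag):
    """Hypothetical helper: text of the first <tag> under parent, or None if absent or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.nodeValue


def load_vlans(xmldoc, conn):
    """Hypothetical helper: build one row per vlancard and insert them all."""
    rows = []
    for card in xmldoc.getElementsByTagName('vlancard'):
        # Look up each field inside this card only, so rows cannot drift apart.
        rows.append((child_text(card, 'id'),
                     child_text(card, 'nodeValue'),
                     child_text(card, 'vlanName')))
    # Parameterized insert; None becomes NULL in the table.
    conn.executemany(
        'insert into vlan (id, nodeValue, vlanName) values (?, ?, ?)', rows)
    conn.commit()
    return rows


# Assumed schema: three text columns in an in-memory database.
conn = sqlite3.connect(':memory:')
conn.execute('create table vlan (id text, nodeValue text, vlanName text)')
rows = load_vlans(minidom.parseString(SAMPLE_XML), conn)
```

For the real file, minidom.parse(r'c:\vlan.xml') would take the place of minidom.parseString(SAMPLE_XML), and the connection would point at the actual .db file.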