Missing elements in XML in Python

Posted on 2024-11-24 18:32:18

I am using the DOM parser (xml.dom.minidom) in Python 2.7.2. The goal is to extract the .db file and use it on SQL Server. I currently have no problems with the sqlite3 library. I have read similar questions and answers about handling a missing element while parsing XML files, but I still can't figure out a solution. The XML has 15,000+ elements. Here is a basic sample of the XML:

<topo>
   <vlancard>
      <id>4545</id>
      <nodeValue>21</nodeValue>
      <vlanName>voice</vlanName>
   </vlancard>
   <vlancard>
      <id>1234</id>
      <nodeValue>42</nodeValue>
      <vlanName>camera</vlanName>
   </vlancard>
   <vlancard>
      <id>9876</id>
      <nodeValue>84</nodeValue>
   </vlancard>
</topo>

Like the third <vlancard>, several elements do not have the <vlanName> node. That makes the element counts inconsistent. For example:

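from xml.dom import minidom
# Raw string: in a normal string literal '\v' is a vertical-tab escape
xmldoc = minidom.parse(r'c:\vlan.xml')
# Each call returns one flat list covering the whole document
vlId = xmldoc.getElementsByTagName('id')
vlValue = xmldoc.getElementsByTagName('nodeValue')
vlName = xmldoc.getElementsByTagName('vlanName')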

After running the module:

IndexError: list index out of range
>>> len(vlId)
16163
>>> len(vlName)
16155
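
The same mismatch can be reproduced on the three-card sample above. This is only a minimal sketch: it parses the sample from a string instead of the real file, and the names SAMPLE_XML and count_tags are made up for the illustration.

```python
from xml.dom import minidom

# The three-card sample from the question; the last card has no <vlanName>.
SAMPLE_XML = """<topo>
   <vlancard><id>4545</id><nodeValue>21</nodeValue><vlanName>voice</vlanName></vlancard>
   <vlancard><id>1234</id><nodeValue>42</nodeValue><vlanName>camera</vlanName></vlancard>
   <vlancard><id>9876</id><nodeValue>84</nodeValue></vlancard>
</topo>"""

def count_tags(xml_text, tag_names):
    """Return {tag name: number of elements with that name in the whole document}."""
    doc = minidom.parseString(xml_text)
    return dict((name, len(doc.getElementsByTagName(name))) for name in tag_names)

counts = count_tags(SAMPLE_XML, ["id", "nodeValue", "vlanName"])
print(counts)
```

Because each list is flat and document-wide, index x in one list stops referring to the same card as index x in another list as soon as one card lacks a tag.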

Because of this mismatch, the elements get out of order. When the table is filled, the parser skips the missing elements, so the values end up in the wrong rows. I use a simple while loop to insert the values into the table:

x = 0
while x < len(vlId):
    # Raises IndexError once x reaches len(vlName): some <vlancard> elements have no <vlanName>
    c.execute("insert into vlan (id, nodeValue, vlanName) values (?, ?, ?)",
              (vlId[x].firstChild.nodeValue, vlValue[x].firstChild.nodeValue, vlName[x].firstChild.nodeValue))
    x = x + 1

How else can I do this? Any help would be appreciated.

Yusuf
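
One possible way around this, offered as a sketch under assumptions rather than a definitive fix, is to loop over each <vlancard> and look up its children inside that card. A missing <vlanName> then becomes None (stored as SQL NULL) instead of shifting the later rows. The helper names child_text and load_vlans, the in-memory database, and the table layout are illustrative assumptions, not part of the original code.

```python
import sqlite3
from xml.dom import minidom

def child_text(parent, tag):
    """Text of the first <tag> inside parent, or None if it is missing or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.nodeValue

def load_vlans(xml_text, conn):
    """Insert one row per <vlancard>; absent fields are stored as NULL."""
    doc = minidom.parseString(xml_text)
    rows = [(child_text(card, "id"),
             child_text(card, "nodeValue"),
             child_text(card, "vlanName"))
            for card in doc.getElementsByTagName("vlancard")]
    # Assumed table layout; adjust the column types to the real schema.
    conn.execute("create table if not exists vlan (id text, nodeValue text, vlanName text)")
    conn.executemany("insert into vlan (id, nodeValue, vlanName) values (?, ?, ?)", rows)
    conn.commit()
    return rows

# Hypothetical sample data mirroring the question's XML.
xml_text = ("<topo>"
            "<vlancard><id>4545</id><nodeValue>21</nodeValue><vlanName>voice</vlanName></vlancard>"
            "<vlancard><id>9876</id><nodeValue>84</nodeValue></vlancard>"
            "</topo>")
conn = sqlite3.connect(":memory:")
rows = load_vlans(xml_text, conn)
print(rows)
```

For the real file, minidom.parse(path) can take the place of minidom.parseString. Passing the values as query parameters also avoids building the SQL statement with string formatting.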
