How to convert a Python list comprehension to XML

Posted on 2024-10-17 17:19:14

I need help finding a tutorial or sample that shows how to take a list comprehension, merge it with data read from a CSV file, and turn the result into an XML file. From reading various Python books and PDFs (ditp, IYOCGwP, learnpythonthehardway, the lxml tutorial, Think Python) and searching online, I think I am most of the way there; I just need a push to tie everything together. I am basically taking an Excel spreadsheet and exporting it as a CSV file. The CSV contains rows of records that I need to map into an XML file. I am new to Python and thought I would use this little project to learn the language. The code below is not pretty, but it works: I can read in the CSV file and dump it into a list, I can combine three lists and output the resulting list, and I can get my program to spit out a skeleton XML that is almost laid out in the format I need. Below the code I list my actual output for a small sample and the XML I am trying to produce. Sorry if this is too lengthy; this is my first post.

import csv, datetime, os, sys
from lxml import etree

f = os.path.getsize("SO.csv")
fh = "SO.csv"
rh = open(fh, newline="")

rows = 0
try:
    rlist = csv.reader(rh)
    reports = []
    for row in rlist:
        # strip surrounding spaces from every field in the row
        rowStripped = [x.strip(' ') for x in row]
        reports.append(rowStripped)
        rows += 1
except csv.Error as e:
    sys.exit('file %s, line %d: %s' % (fh, rlist.line_num, e))
finally:
    rh.close()

# build the empty xml skeleton
root = etree.Element("co_ehs")
object = etree.SubElement(root, "object")
event = etree.SubElement(object, "event")
facets = etree.SubElement(event, "facets")
categories = etree.SubElement(facets, "categories")
instance = etree.SubElement(categories, "instance")
property = etree.SubElement(instance, "property")

# the names facets, categories and property are rebound to the mapping lists here
facets = ['header','header','header','header','informational','header','informational']

categories = ['processing','processing','processing','processing','short_title','file_num','short_narrative']

property = ['REPORT ID','NEXT REPORT ID','initial-event-date','number','title','summary-docket-num','description-story']

print('----------Printing Reports from CSV Data----------')
print(reports)
print('---------END OF CSV DATA-------------')
print()
mappings = list(zip(facets, categories, property))
print('----------Printing Mappings from the zip of facets, categories, property ----------')
print(mappings)
print('---------END OF List Comprehension-------------')
print()
print('----------Printing the xml skeleton that will contain the mappings and the csv data ----------')
print(etree.tostring(root, xml_declaration=True, encoding='UTF-8', pretty_print=True).decode('UTF-8'))
print('---------END OF XML Skeleton-------------')

----My OUTPUT---  
----------Printing Reports from CSV Data----------  
[['1', '12-Dec-04', 'Vehicle Collision', '786689', 'No fault collision due to ice', '-1', '545671'], ['3', '15-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '4', '588456'], ['4', '17-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '-1', '58871'], ['1000', '12-Nov-05', 'Back Injury', '9854231', 'Lifting without a support device', '-1', '545671'], ['55555', '12-Jan-06', 'Foot Injury', '7936547', 'Office injury - heavy item dropped on foot', '-1', '545671']]  
---------END OF CSV DATA-------------  
----------Printing Mappings from the zip of facets, categories, property ----------  
[('header', 'processing', 'REPORT ID'), ('header', 'processing', 'NEXT REPORT ID'), ('header', 'processing', 'initial-event-date'), ('header', 'processing', 'number'), ('informational', 'short_title', 'title'), ('header', 'file_num', 'summary-docket-num'), ('informational', 'short_narrative', 'description-story')]  
---------END OF List Comprehension-------------  
----------Printing the xml skeleton that will contain the mappings and the csv data ----------  

    <?xml version='1.0' encoding='UTF-8'?>
    <co_ehs>
      <object>
        <event>
          <facets>
            <categories>
              <instance>
                <property/>
              </instance>
            </categories>
          </facets>
        </event>
      </object>
    </co_ehs>

---------END OF XML Skeleton-------------  
----------CSV DATA------------------  
C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION  
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"  
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"  
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"  
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device"  
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot"  

-----------What I want the xml output to look like----------------------  
    <?xml version="1.0" encoding="UTF-8"?>
    <co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="co_ehs.xsd">  
      <object id="3" object-type="ehs_report">
        <event event-tag="0">
          <facets name="header">
            <categories name="processing">
              <instance instance-tag="0">
                <property name="REPORT ID" value="1"/>
                <property name="NEXT REPORT ID" value="-1"/>
                <property name="initial-event-date" value="12-Dec-04"/>
                <property name="number" value="545671"/>
              </instance>
            </categories>
          </facets>
          <facets name="informational">
            <categories name="short_title">
              <instance instance-tag="0">
                <property name="title" value="Vehicle Collision"/>
              </instance>
            </categories>
          </facets>
          <facets name="header">
            <categories name="file_num">
              <instance instance-tag="0">
                <property name="summary-docket-num" value="786689"/>
              </instance>
            </categories>
          </facets>
          <facets name="informational">
            <categories name="short_narrative">
              <instance instance-tag="0">
                <property name="description-story" value="No fault collision due to ice"/>
              </instance>
            </categories>
          </facets>
        </event>
      </object>
    </co_ehs>
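
One way the three mapping lists and a CSV row could be tied together is sketched below. This is only an illustration under some assumptions: it uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages, `build_event` is a hypothetical helper name, and the `id="3"` and `instance-tag="0"` values are simply copied from the desired output above. Consecutive columns that share the same (facet, category) pair are grouped with `itertools.groupby`, so each group becomes one `<facets>` block.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# The three mapping lists from the question, one entry per CSV column.
FACETS = ['header', 'header', 'header', 'header',
          'informational', 'header', 'informational']
CATEGORIES = ['processing', 'processing', 'processing', 'processing',
              'short_title', 'file_num', 'short_narrative']
PROPERTIES = ['REPORT ID', 'NEXT REPORT ID', 'initial-event-date', 'number',
              'title', 'summary-docket-num', 'description-story']


def build_event(parent, row, event_tag):
    """Append one <event> for a CSV row (hypothetical helper).

    Consecutive columns with the same (facet, category) pair end up
    inside the same <facets>/<categories>/<instance> block.
    """
    event = ET.SubElement(parent, "event", {"event-tag": str(event_tag)})
    columns = zip(FACETS, CATEGORIES, PROPERTIES, row)
    for (facet, category), group in groupby(columns, key=lambda c: (c[0], c[1])):
        facets_el = ET.SubElement(event, "facets", name=facet)
        categories_el = ET.SubElement(facets_el, "categories", name=category)
        # instance-tag "0" is an assumption taken from the desired output
        instance_el = ET.SubElement(categories_el, "instance", {"instance-tag": "0"})
        for _, _, prop, value in group:
            ET.SubElement(instance_el, "property", name=prop, value=value.strip())
    return event


root = ET.Element("co_ehs")
obj = ET.SubElement(root, "object", {"id": "3", "object-type": "ehs_report"})

# First data row of SO.csv, in the column order of the CSV header.
sample_row = ['1', '-1', '12-Dec-04', '545671', 'Vehicle Collision',
              '786689', 'No fault collision due to ice']
build_event(obj, sample_row, 0)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

In a full script, `build_event` would be called once per row returned by `csv.reader`, skipping the header row.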

Comments (2)

最终幸福 2024-10-24 17:19:14

Here is my solution. I use lxml, because it's normally better to generate XML with a framework than with strings or a template file.

The attributes of co_ehs are missing, but this can easily be fixed with a few set() calls. I leave that up to you.

BTW: You can accept the best answer by clicking the check mark on the left side of the answer.

import csv, sys
from lxml import etree

def makeFacet(event, newheaders, ev, facetname, catname, count, nhposstart, nhposend):
    # one <facets>/<categories>/<instance> block holding columns nhposstart..nhposend-1
    facets = etree.SubElement(event, "facets", name=facetname)
    categories = etree.SubElement(facets, "categories", name=catname)
    instance = etree.SubElement(categories, "instance")
    instance.set("instance-tag", count)

    for i in range(nhposstart, nhposend):
        property = etree.SubElement(instance, "property")
        property.set("name", newheaders[i])
        property.set("value", ev[i].strip())


# read the csv
fh = "SO.csv"
rh = open(fh, newline="")

try:
    reader = csv.reader(rh)
    rlist = list(reader)
except csv.Error as e:
    sys.exit("file %s, line %d: %s" % (fh, reader.line_num, e))
finally:
    rh.close()

# generate the xml

# newheaders maps the csv columns to property names, because the csv column names don't correspond to the XML
newheaders = ["REPORT ID","NEXT REPORT ID","initial-event-date","number","title","summary-docket-num", "description-story"]

root = etree.Element("co_ehs")

object = etree.SubElement(root, "object")

object.set("id", "3") # Not sure about this one
object.set("object-type", "ehs_report")

# rlist[0] is the csv header row, so start at the first data row
for c, ev in enumerate(rlist[1:]):
    event  = etree.SubElement(object, "event")
    event.set("event-tag", "%s"%c)
    makeFacet(event, newheaders, ev, "header", "processing", "%s"%c, 0, 4)
    makeFacet(event, newheaders, ev, "informational", "short_title", "%s"%c, 4, 5)
    makeFacet(event, newheaders, ev, "header", "file_num", "%s"%c, 5, 6)
    makeFacet(event, newheaders, ev, "informational", "short_narrative", "%s"%c, 6, 7)

print(etree.tostring(root, xml_declaration=True, encoding="UTF-8", pretty_print=True).decode("UTF-8"))
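
The missing co_ehs attributes mentioned above could be added roughly as follows. This is a minimal sketch, not part of the original answer, and it assumes the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra packages. The attribute is set with the namespace URI in curly braces, and the serializer writes the `xmlns:xsi` declaration itself. With lxml, the prefix is normally declared by passing `nsmap={"xsi": XSI}` to `etree.Element` instead of calling `register_namespace`.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Make sure the serializer uses the "xsi" prefix for this namespace.
ET.register_namespace("xsi", XSI)

root = ET.Element("co_ehs")
# Namespaced attributes are addressed as "{namespace-uri}local-name".
root.set("{%s}noNamespaceSchemaLocation" % XSI, "co_ehs.xsd")

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```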

罪歌 2024-10-24 17:19:14

I created a file named 'pattern.txt' with the following content (keep this indentation).

Note the eight %s placeholders, one for each value that gets filled in.

        <event event-tag="%s">
          <facets name="header">
            <categories name="processing">
              <instance instance-tag="0">
                <property name="REPORT ID" value="%s"/>
                <property name="NEXT REPORT ID" value="%s"/>
                <property name="initial-event-date" value="%s"/>
                <property name="number" value="%s"/>
              </instance>
            </categories>
          </facets>
          <facets name="informational">
            <categories name="short_title">
              <instance instance-tag="0">
                <property name="title" value="%s"/>
              </instance>
            </categories>
          </facets>
          <facets name="header">
            <categories name="file_num">
              <instance instance-tag="0">
                <property name="summary-docket-num" value="%s"/>
              </instance>
            </categories>
          </facets>
          <facets name="informational">
            <categories name="short_narrative">
              <instance instance-tag="0">
                <property name="description-story" value="%s"/>
              </instance>
            </categories>
          </facets>
        </event>

I created the file 'SO.csv' with the following content:

C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION  
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"  
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"  
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"  
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device"  
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot"

And I ran the following code:

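import csv

with open('pattern.txt') as f:
    pati = f.read().rstrip('\n')

# the XML declaration must start at the very beginning of the output
xmloutput = ['<?xml version="1.0" encoding="UTF-8"?>',
             '    <co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
             'xsi:noNamespaceSchemaLocation="co_ehs.xsd">',
             '      <object id="3" object-type="ehs_report">']

with open('SO.csv', newline='') as csvfile:
    rid = csv.reader(csvfile)
    next(rid)  # skip the header row
    for i, row in enumerate(rid):
        # prepend the event-tag as ONE item; row[0:0] = str(i) would insert one item per digit
        row.insert(0, str(i))
        xmloutput.append(pati % tuple(row))

# close the elements opened in the header lines
xmloutput.append('      </object>')
xmloutput.append('    </co_ehs>')

print('\n'.join(xmloutput))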

Does this help you?
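
One caveat with filling a %s template: the values are pasted in as-is, so a field containing &, < or a double quote would produce malformed XML. A minimal sketch of escaping a value before substitution is shown below. It uses `xml.sax.saxutils.escape` from the standard library; `attr_value` is a hypothetical helper name and the sample text is made up for illustration.

```python
from xml.sax.saxutils import escape


def attr_value(text):
    """Escape text for use inside a double-quoted XML attribute (hypothetical helper)."""
    # escape() handles &, < and >; the extra mapping also replaces double quotes.
    return escape(text, {'"': "&quot;"})


pattern = '<property name="title" value="%s"/>'
# Made-up sample value containing characters that are special in XML.
line = pattern % attr_value('Slip & fall on "wet" floor <lobby>')
print(line)
```

In the loop above, this would amount to applying `attr_value` to every item of `row` before the `pati % tuple(row)` substitution.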
