使用Python使用复杂/高级映射将Excel转换为XML

发布于 2025-01-27 10:53:11 字数 8027 浏览 1 评论 0原文

我是编码世界的新手，现在已经存在了相当长的一段时间。

我想将Excel文件转换为XML以进行某些分析。但是，XML文件具有复杂的架构，我无法复制示例XML。

我尝试使用Python和其中一些库，但我无法这样做。

这是 xlsx 文件中的数据：

XML输出如下：

<REP019-GIValueMeasuresReport
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="urn:fsa-gov-uk:MER:REP019:1"
     xsi:schemaLocation="urn:XXXXXXX:REP019:1 
                         http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"
                
            currency = "GBP"
            units=     "single">
                
                
       <Declaration>
                  <Return>no</Return>
       </Declaration>
       <ReturnDetails>
                    <Consent>yes</Consent>
                    <Informed>yes</Informed>
       </ReturnDetails>
       <ProductReport>
                  <Product>After the event legal expenses (All)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Intermediary Y</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>
                   <FiveLargest>Brand B</FiveLargest>
                   <Sales>5000</Sales>
                   <CostToState>500000</CostToState>
                   <NumberofClaims>375</NumberofClaims>
                   <Average>6000.00</Average>
                   <Frequency>6.25</Frequency>
                   <NoofParts>320</NoofParts>
                   <Nulled>25</Nulled>
                   <Accepted>93.33</Accepted>
                   <Paid>32000</Paid>
                   <Avg_C>100</Avg_C>
                   <Highest>1200</Highest>
                   <MEH>25</MEH>
                   <Complaints>5</Complaints>
                   <Percentage>1.33</Percentage>
        </ProductReport>
        <ProductReport>
                  <Product>Alloy Wheel insurance (Add-on)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>              
                   <Sales>17000</Sales>
                   <CostToState>765000</CostToState>
                   <NumberofClaims>340</NumberofClaims>
                   <Average>16400.00</Average>
                   <Frequency>2.07</Frequency>
                   <NoofParts>280</NoofParts>
                   <Nulled>78</Nulled>
                   <Accepted>77.06</Accepted>
                   <Paid>39200</Paid>
                   <Avg_C>140</Avg_C>
                   <Highest>2000</Highest>
                   <MEH>50</MEH>
                   <Complaints>20</Complaints>
                   <Percentage>5.88</Percentage>
                    </ProductReport>
        <ProductReport>
                  <Product>Alloy Wheel insurance (Stand-alone)</Product>
                   <FiveLargest>Brand A</FiveLargest>             
                   <Sales>100000</Sales>
                   <CostToState>7000000</CostToState>
                   <NumberofClaims>3500</NumberofClaims>
                   <Average>87000.00</Average>
                   <Frequency>4.02</Frequency>
                   <NoofParts>3500</NoofParts>
                   <Nulled>100</Nulled>
                   <Accepted>97.14</Accepted>
                   <Paid>2275000</Paid>
                   <Avg_C>650</Avg_C>
                   <Highest>10500</Highest>
                   <MEH>325</MEH>
                   <Complaints>170</Complaints>
                   <Percentage>4.86</Percentage>
        </ProductReport>
        <ProductReport>
                   <Product>Before the event legal expenses - home (All)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Intermediary Y</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>
                   <FiveLargest>Brand B</FiveLargest>
                   <Sales>5000</Sales>
                   <CostToState>500000</CostToState>
                   <NumberofClaims>375</NumberofClaims>
                   <Average>6000.00</Average>
                   <Frequency>6.25</Frequency>
                   <NoofParts>320</NoofParts>
                   <Nulled>25</Nulled>
                   <Accepted>93.33</Accepted>
                   <Paid>32000</Paid>
                   <Avg_C>100</Avg_C>
                   <Highest>1200</Highest>
                   <MEH>25</MEH>
                   <Complaints>5</Complaints>
                   <Percentage>1.33</Percentage>
        </ProductReport>
        <ProductReport>
                   <Product>Before the event legal expenses - motor (All)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>              
                   <Sales>17000</Sales>
                   <CostToState>765000</CostToState>
                   <NumberofClaims>340</NumberofClaims>
                   <Average>16400.00</Average>
                   <Frequency>2.07</Frequency>
                   <NoofParts>280</NoofParts>
                   <Nulled>78</Nulled>
                   <Accepted>77.06</Accepted>
                   <Paid>39200</Paid>
                   <Avg_C>140</Avg_C>
                   <Highest>2000</Highest>
                   <MEH>50</MEH>
                   <Complaints>20</Complaints>
                   <Percentage>5.88</Percentage>
        </ProductReport>
        <ProductReport>
                   <Product>Before the event legal expenses - other (All)</Product>
                   <FiveLargest>Brand A</FiveLargest>             
                   <Sales>100000</Sales>
                   <CostToState>7000000</CostToState>
                   <NumberofClaims>3500</NumberofClaims>
                   <Average>87000.00</Average>
                   <Frequency>4.02</Frequency>
                   <NoofParts>3500</NoofParts>
                   <Nulled>100</Nulled>
                   <Accepted>97.14</Accepted>
                   <Paid>2275000</Paid>
                   <Avg_C>650</Avg_C>
                   <Highest>10500</Highest>
                   <MEH>325</MEH>
                   <Complaints>170</Complaints>
                   <Percentage>4.86</Percentage>
        </ProductReport>
    
</REP019-GIValueMeasuresReport>

原文

I am new to the coding world, and I am stuck on an issue for quite some time now.

I want to convert an excel file to an XML for certain analysis. The XML file however has a complex schema, and I am unable to replicate the Sample XML.

I have tried using Python with some of it's libraries, but I am unable to do so.

This is the data in XLSX file:

Excel Data

and the expected XML output is as follows:

<REP019-GIValueMeasuresReport
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="urn:fsa-gov-uk:MER:REP019:1"
     xsi:schemaLocation="urn:XXXXXXX:REP019:1 
                         http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"
                
            currency = "GBP"
            units=     "single">
                
                
       <Declaration>
                  <Return>no</Return>
       </Declaration>
       <ReturnDetails>
                    <Consent>yes</Consent>
                    <Informed>yes</Informed>
       </ReturnDetails>
       <ProductReport>
                  <Product>After the event legal expenses (All)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Intermediary Y</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>
                   <FiveLargest>Brand B</FiveLargest>
                   <Sales>5000</Sales>
                   <CostToState>500000</CostToState>
                   <NumberofClaims>375</NumberofClaims>
                   <Average>6000.00</Average>
                   <Frequency>6.25</Frequency>
                   <NoofParts>320</NoofParts>
                   <Nulled>25</Nulled>
                   <Accepted>93.33</Accepted>
                   <Paid>32000</Paid>
                   <Avg_C>100</Avg_C>
                   <Highest>1200</Highest>
                   <MEH>25</MEH>
                   <Complaints>5</Complaints>
                   <Percentage>1.33</Percentage>
        </ProductReport>
        <ProductReport>
                  <Product>Alloy Wheel insurance (Add-on)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>              
                   <Sales>17000</Sales>
                   <CostToState>765000</CostToState>
                   <NumberofClaims>340</NumberofClaims>
                   <Average>16400.00</Average>
                   <Frequency>2.07</Frequency>
                   <NoofParts>280</NoofParts>
                   <Nulled>78</Nulled>
                   <Accepted>77.06</Accepted>
                   <Paid>39200</Paid>
                   <Avg_C>140</Avg_C>
                   <Highest>2000</Highest>
                   <MEH>50</MEH>
                   <Complaints>20</Complaints>
                   <Percentage>5.88</Percentage>
                    </ProductReport>
        <ProductReport>
                  <Product>Alloy Wheel insurance (Stand-alone)</Product>
                   <FiveLargest>Brand A</FiveLargest>             
                   <Sales>100000</Sales>
                   <CostToState>7000000</CostToState>
                   <NumberofClaims>3500</NumberofClaims>
                   <Average>87000.00</Average>
                   <Frequency>4.02</Frequency>
                   <NoofParts>3500</NoofParts>
                   <Nulled>100</Nulled>
                   <Accepted>97.14</Accepted>
                   <Paid>2275000</Paid>
                   <Avg_C>650</Avg_C>
                   <Highest>10500</Highest>
                   <MEH>325</MEH>
                   <Complaints>170</Complaints>
                   <Percentage>4.86</Percentage>
        </ProductReport>
        <ProductReport>
                   <Product>Before the event legal expenses - home (All)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Intermediary Y</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>
                   <FiveLargest>Brand B</FiveLargest>
                   <Sales>5000</Sales>
                   <CostToState>500000</CostToState>
                   <NumberofClaims>375</NumberofClaims>
                   <Average>6000.00</Average>
                   <Frequency>6.25</Frequency>
                   <NoofParts>320</NoofParts>
                   <Nulled>25</Nulled>
                   <Accepted>93.33</Accepted>
                   <Paid>32000</Paid>
                   <Avg_C>100</Avg_C>
                   <Highest>1200</Highest>
                   <MEH>25</MEH>
                   <Complaints>5</Complaints>
                   <Percentage>1.33</Percentage>
        </ProductReport>
        <ProductReport>
                   <Product>Before the event legal expenses - motor (All)</Product>
                   <FiveLargest>Intermediary Z</FiveLargest>
                   <FiveLargest>Intermediary X</FiveLargest>
                   <FiveLargest>Brand A</FiveLargest>              
                   <Sales>17000</Sales>
                   <CostToState>765000</CostToState>
                   <NumberofClaims>340</NumberofClaims>
                   <Average>16400.00</Average>
                   <Frequency>2.07</Frequency>
                   <NoofParts>280</NoofParts>
                   <Nulled>78</Nulled>
                   <Accepted>77.06</Accepted>
                   <Paid>39200</Paid>
                   <Avg_C>140</Avg_C>
                   <Highest>2000</Highest>
                   <MEH>50</MEH>
                   <Complaints>20</Complaints>
                   <Percentage>5.88</Percentage>
        </ProductReport>
        <ProductReport>
                   <Product>Before the event legal expenses - other (All)</Product>
                   <FiveLargest>Brand A</FiveLargest>             
                   <Sales>100000</Sales>
                   <CostToState>7000000</CostToState>
                   <NumberofClaims>3500</NumberofClaims>
                   <Average>87000.00</Average>
                   <Frequency>4.02</Frequency>
                   <NoofParts>3500</NoofParts>
                   <Nulled>100</Nulled>
                   <Accepted>97.14</Accepted>
                   <Paid>2275000</Paid>
                   <Avg_C>650</Avg_C>
                   <Highest>10500</Highest>
                   <MEH>325</MEH>
                   <Complaints>170</Complaints>
                   <Percentage>4.86</Percentage>
        </ProductReport>
    
</REP019-GIValueMeasuresReport>

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

葬シ愛 2025-02-03 10:53:11

应该足够容易，请阅读像Geekforgeeks这样的教程，以进行这种转换，
https://www.geeksforgeeks.org/how-to-conter-convert-excel-excel-excel-to-xml-format-in-format-in-python/prbr >

自货币，单位和＆lt;声明＆gt;和＆lt; returnDetails＆gt;标签在XML文件中都是常见的，并且在电子表格中都是相同的，我认为可以从任何行中获取这些

currency = ws['A2'].value
units = ws['B2'].value

标签'xml_schema ='文本以及其余的模式文本，按照您的要求进行格式。＆lt;声明＆gt;和＆lt; returnDetails＆gt;可以使用没有迭代的“ with with tag”语法来实现，例如，lt; returndetails＆gt;。

with tag('ReturnDetails'):
        with tag("Consent"):
            text(ws['D2'].value)
        with tag("Informed"):
            text(ws['E2'].value)

对＆lt;声明＆gt;做同样的做法。指向正确的单元格值。
然后，对于每个行的其余的每个行都遵循教程中的示例进行迭代（从第6至21列），使用外部标签＆lt;和内部标签每个列标题。

with tag('ProductReport'):
    with tag("Product"):
        text(row[0])
    with tag("FiveLargest"):
        text(row[1])

我不确定的一部分是您在示例xml中显示的第一个＆lt; producttreport＆gt;具有5＆lt; fivelargest＆gt;标签，第二个具有3等。不知道如何确定这一点，因为每行似乎只有一个条目，对于此列，这些单元格中是否有多条线？隐藏在屏幕截图中的

           <FiveLargest>Intermediary Z</FiveLargest>
           <FiveLargest>Intermediary X</FiveLargest>
           <FiveLargest>Intermediary Y</FiveLargest>
           <FiveLargest>Brand A</FiveLargest>
           <FiveLargest>Brand B</FiveLargest>

情况下，如果是这种情况，则需要读取单元格内容，在新行上拆分并迭代相同标签名称的列表。

-
-----------------添加代码示例------------------
以下是如何编码此内容的示例。如果＆lt; fivelargest＆gt;多层次，此代码将对此进行管理。

from openpyxl import load_workbook
from yattag import Doc, indent


wb = load_workbook("afegI.xlsx")
ws = wb["afegI"]
currency = ws['A2'].value
units = ws['B2'].value

for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=6):
    print([cell.value for cell in row])

doc, tag, text = Doc().tagtext()

xml_schema = """
<REP019-GIValueMeasuresReport
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="urn:fsa-gov-uk:MER:REP019:1"
    xsi:schemaLocation="urn:XXXXXXX:REP019:1
                        http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"

             currency = "{0}"
             units="{1}">
"""
xml_schema_close = '</REP019-GIValueMeasuresReport>'
doc.asis(xml_schema)
with tag('Declaration'):
        with tag("Return"): text(ws['C2'].value)

with tag('ReturnDetails'):
        with tag("Consent"):
            text(ws['D2'].value)
        with tag("Informed"):
            text(ws['E2'].value)

for row in ws.iter_rows(min_row=2, max_row=10, min_col=6, max_col=21):
    row = [cell.value for cell in row]
    with tag('ProductReport'):
        with tag("Product"):
            text(row[0])
        multi_line = row[1].split('\n')
        for line in multi_line:
            with tag("FiveLargest"):
                text(line)
        with tag("Sales"):
            text(row[2])
        with tag("CostToState"):
            text(row[3])
        with tag("NumberofClaims"):
            text(row[4])
        with tag("Average"):
            text(row[5])
        with tag("Frequency"):
            text(row[6])
        with tag("NoofParts"):
            text(row[7])
        with tag("Nulled"):
            text(row[8])
        with tag("Accepted"):
            text(row[9])
        with tag("Paid"):
            text(row[10])
        with tag("Avg_C"):
            text(row[11])
        with tag("Highest"):
            text(row[12])
        with tag("MEH"):
            text(row[13])
        with tag("Complaints"):
            text(row[14])
        with tag("Percentage"):
            text(row[15])

doc.asis(xml_schema_close)
result = indent(
    doc.getvalue().format(currency, units),
    indentation='   ',
    indent_text=True
)

with open("output.xml", "w") as f:
    f.write(result)

Should be easy enough, read a tutorial like the GeekforGeeks on doing such a conversion,
https://www.geeksforgeeks.org/how-to-convert-excel-to-xml-format-in-python/

Since currency, units and the <Declaration> and <ReturnDetails> tags are common to all in the xml file and are all the same in the spreadsheet, I presume these can be taken from any row so no need to iterate for those just select a cell to use, like

currency = ws['A2'].value
units = ws['B2'].value

Currency and units can be added to the 'xml_schema =' text along with the rest of your schema text formatted as you like. <Declaration> and <ReturnDetails> can be achieved using the 'with tag' syntax without the iterations, e.g. this for <ReturnDetails>.

with tag('ReturnDetails'):
        with tag("Consent"):
            text(ws['D2'].value)
        with tag("Informed"):
            text(ws['E2'].value)

Do same for <Declaration> pointing to the correct cell value.
Then for each of the rest of the rows follow the example in the tutorial to iterate (from columns 6 to 21) using outer tag <ProductReport> and inner tags per the column headers.

with tag('ProductReport'):
    with tag("Product"):
        text(row[0])
    with tag("FiveLargest"):
        text(row[1])

The part I'm not sure about is you show in the example xml the first <ProductReport> with 5 <FiveLargest> tags, the second has 3 etc. Do not know how this is determined as each row appears to have only one entry for this column, are there are multiple lines in these cells? hidden in the screen shot

           <FiveLargest>Intermediary Z</FiveLargest>
           <FiveLargest>Intermediary X</FiveLargest>
           <FiveLargest>Intermediary Y</FiveLargest>
           <FiveLargest>Brand A</FiveLargest>
           <FiveLargest>Brand B</FiveLargest>

If that is the case you'll need read the cell contents, split on the new line and iterate the list for the same tag name.

-
-----------------Add Code example---------------
Below is an example of how to code this. If the <FiveLargest> are multilined this code will manage that.

from openpyxl import load_workbook
from yattag import Doc, indent


wb = load_workbook("afegI.xlsx")
ws = wb["afegI"]
currency = ws['A2'].value
units = ws['B2'].value

for row in ws.iter_rows(min_row=1, max_row=2, min_col=1, max_col=6):
    print([cell.value for cell in row])

doc, tag, text = Doc().tagtext()

xml_schema = """
<REP019-GIValueMeasuresReport
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="urn:fsa-gov-uk:MER:REP019:1"
    xsi:schemaLocation="urn:XXXXXXX:REP019:1
                        http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"

             currency = "{0}"
             units="{1}">
"""
xml_schema_close = '</REP019-GIValueMeasuresReport>'
doc.asis(xml_schema)
with tag('Declaration'):
        with tag("Return"): text(ws['C2'].value)

with tag('ReturnDetails'):
        with tag("Consent"):
            text(ws['D2'].value)
        with tag("Informed"):
            text(ws['E2'].value)

for row in ws.iter_rows(min_row=2, max_row=10, min_col=6, max_col=21):
    row = [cell.value for cell in row]
    with tag('ProductReport'):
        with tag("Product"):
            text(row[0])
        multi_line = row[1].split('\n')
        for line in multi_line:
            with tag("FiveLargest"):
                text(line)
        with tag("Sales"):
            text(row[2])
        with tag("CostToState"):
            text(row[3])
        with tag("NumberofClaims"):
            text(row[4])
        with tag("Average"):
            text(row[5])
        with tag("Frequency"):
            text(row[6])
        with tag("NoofParts"):
            text(row[7])
        with tag("Nulled"):
            text(row[8])
        with tag("Accepted"):
            text(row[9])
        with tag("Paid"):
            text(row[10])
        with tag("Avg_C"):
            text(row[11])
        with tag("Highest"):
            text(row[12])
        with tag("MEH"):
            text(row[13])
        with tag("Complaints"):
            text(row[14])
        with tag("Percentage"):
            text(row[15])

doc.asis(xml_schema_close)
result = indent(
    doc.getvalue().format(currency, units),
    indentation='   ',
    indent_text=True
)

with open("output.xml", "w") as f:
    f.write(result)

回复收藏 0 原文

~没有更多了~