使用Python使用复杂/高级映射将Excel转换为XML
我是编码世界的新手,现在已经存在了相当长的一段时间。
我想将Excel文件转换为XML以进行某些分析。但是,XML文件具有复杂的架构,我无法复制示例XML。
我尝试使用Python和其中一些库,但我无法这样做。
这是 xlsx 文件中的数据:
XML输出如下:
<REP019-GIValueMeasuresReport
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:fsa-gov-uk:MER:REP019:1"
xsi:schemaLocation="urn:XXXXXXX:REP019:1
http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"
currency = "GBP"
units= "single">
<Declaration>
<Return>no</Return>
</Declaration>
<ReturnDetails>
<Consent>yes</Consent>
<Informed>yes</Informed>
</ReturnDetails>
<ProductReport>
<Product>After the event legal expenses (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Intermediary Y</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<FiveLargest>Brand B</FiveLargest>
<Sales>5000</Sales>
<CostToState>500000</CostToState>
<NumberofClaims>375</NumberofClaims>
<Average>6000.00</Average>
<Frequency>6.25</Frequency>
<NoofParts>320</NoofParts>
<Nulled>25</Nulled>
<Accepted>93.33</Accepted>
<Paid>32000</Paid>
<Avg_C>100</Avg_C>
<Highest>1200</Highest>
<MEH>25</MEH>
<Complaints>5</Complaints>
<Percentage>1.33</Percentage>
</ProductReport>
<ProductReport>
<Product>Alloy Wheel insurance (Add-on)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<Sales>17000</Sales>
<CostToState>765000</CostToState>
<NumberofClaims>340</NumberofClaims>
<Average>16400.00</Average>
<Frequency>2.07</Frequency>
<NoofParts>280</NoofParts>
<Nulled>78</Nulled>
<Accepted>77.06</Accepted>
<Paid>39200</Paid>
<Avg_C>140</Avg_C>
<Highest>2000</Highest>
<MEH>50</MEH>
<Complaints>20</Complaints>
<Percentage>5.88</Percentage>
</ProductReport>
<ProductReport>
<Product>Alloy Wheel insurance (Stand-alone)</Product>
<FiveLargest>Brand A</FiveLargest>
<Sales>100000</Sales>
<CostToState>7000000</CostToState>
<NumberofClaims>3500</NumberofClaims>
<Average>87000.00</Average>
<Frequency>4.02</Frequency>
<NoofParts>3500</NoofParts>
<Nulled>100</Nulled>
<Accepted>97.14</Accepted>
<Paid>2275000</Paid>
<Avg_C>650</Avg_C>
<Highest>10500</Highest>
<MEH>325</MEH>
<Complaints>170</Complaints>
<Percentage>4.86</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - home (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Intermediary Y</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<FiveLargest>Brand B</FiveLargest>
<Sales>5000</Sales>
<CostToState>500000</CostToState>
<NumberofClaims>375</NumberofClaims>
<Average>6000.00</Average>
<Frequency>6.25</Frequency>
<NoofParts>320</NoofParts>
<Nulled>25</Nulled>
<Accepted>93.33</Accepted>
<Paid>32000</Paid>
<Avg_C>100</Avg_C>
<Highest>1200</Highest>
<MEH>25</MEH>
<Complaints>5</Complaints>
<Percentage>1.33</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - motor (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<Sales>17000</Sales>
<CostToState>765000</CostToState>
<NumberofClaims>340</NumberofClaims>
<Average>16400.00</Average>
<Frequency>2.07</Frequency>
<NoofParts>280</NoofParts>
<Nulled>78</Nulled>
<Accepted>77.06</Accepted>
<Paid>39200</Paid>
<Avg_C>140</Avg_C>
<Highest>2000</Highest>
<MEH>50</MEH>
<Complaints>20</Complaints>
<Percentage>5.88</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - other (All)</Product>
<FiveLargest>Brand A</FiveLargest>
<Sales>100000</Sales>
<CostToState>7000000</CostToState>
<NumberofClaims>3500</NumberofClaims>
<Average>87000.00</Average>
<Frequency>4.02</Frequency>
<NoofParts>3500</NoofParts>
<Nulled>100</Nulled>
<Accepted>97.14</Accepted>
<Paid>2275000</Paid>
<Avg_C>650</Avg_C>
<Highest>10500</Highest>
<MEH>325</MEH>
<Complaints>170</Complaints>
<Percentage>4.86</Percentage>
</ProductReport>
</REP019-GIValueMeasuresReport>
I am new to the coding world, and I am stuck on an issue for quite some time now.
I want to convert an excel file to an XML for certain analysis. The XML file however has a complex schema, and I am unable to replicate the Sample XML.
I have tried using Python with some of it's libraries, but I am unable to do so.
This is the data in XLSX file:
and the expected XML output is as follows:
<REP019-GIValueMeasuresReport
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:fsa-gov-uk:MER:REP019:1"
xsi:schemaLocation="urn:XXXXXXX:REP019:1
http://XXXXXXXXX/REP019/v1/REP019-Schema.xsd"
currency = "GBP"
units= "single">
<Declaration>
<Return>no</Return>
</Declaration>
<ReturnDetails>
<Consent>yes</Consent>
<Informed>yes</Informed>
</ReturnDetails>
<ProductReport>
<Product>After the event legal expenses (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Intermediary Y</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<FiveLargest>Brand B</FiveLargest>
<Sales>5000</Sales>
<CostToState>500000</CostToState>
<NumberofClaims>375</NumberofClaims>
<Average>6000.00</Average>
<Frequency>6.25</Frequency>
<NoofParts>320</NoofParts>
<Nulled>25</Nulled>
<Accepted>93.33</Accepted>
<Paid>32000</Paid>
<Avg_C>100</Avg_C>
<Highest>1200</Highest>
<MEH>25</MEH>
<Complaints>5</Complaints>
<Percentage>1.33</Percentage>
</ProductReport>
<ProductReport>
<Product>Alloy Wheel insurance (Add-on)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<Sales>17000</Sales>
<CostToState>765000</CostToState>
<NumberofClaims>340</NumberofClaims>
<Average>16400.00</Average>
<Frequency>2.07</Frequency>
<NoofParts>280</NoofParts>
<Nulled>78</Nulled>
<Accepted>77.06</Accepted>
<Paid>39200</Paid>
<Avg_C>140</Avg_C>
<Highest>2000</Highest>
<MEH>50</MEH>
<Complaints>20</Complaints>
<Percentage>5.88</Percentage>
</ProductReport>
<ProductReport>
<Product>Alloy Wheel insurance (Stand-alone)</Product>
<FiveLargest>Brand A</FiveLargest>
<Sales>100000</Sales>
<CostToState>7000000</CostToState>
<NumberofClaims>3500</NumberofClaims>
<Average>87000.00</Average>
<Frequency>4.02</Frequency>
<NoofParts>3500</NoofParts>
<Nulled>100</Nulled>
<Accepted>97.14</Accepted>
<Paid>2275000</Paid>
<Avg_C>650</Avg_C>
<Highest>10500</Highest>
<MEH>325</MEH>
<Complaints>170</Complaints>
<Percentage>4.86</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - home (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Intermediary Y</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<FiveLargest>Brand B</FiveLargest>
<Sales>5000</Sales>
<CostToState>500000</CostToState>
<NumberofClaims>375</NumberofClaims>
<Average>6000.00</Average>
<Frequency>6.25</Frequency>
<NoofParts>320</NoofParts>
<Nulled>25</Nulled>
<Accepted>93.33</Accepted>
<Paid>32000</Paid>
<Avg_C>100</Avg_C>
<Highest>1200</Highest>
<MEH>25</MEH>
<Complaints>5</Complaints>
<Percentage>1.33</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - motor (All)</Product>
<FiveLargest>Intermediary Z</FiveLargest>
<FiveLargest>Intermediary X</FiveLargest>
<FiveLargest>Brand A</FiveLargest>
<Sales>17000</Sales>
<CostToState>765000</CostToState>
<NumberofClaims>340</NumberofClaims>
<Average>16400.00</Average>
<Frequency>2.07</Frequency>
<NoofParts>280</NoofParts>
<Nulled>78</Nulled>
<Accepted>77.06</Accepted>
<Paid>39200</Paid>
<Avg_C>140</Avg_C>
<Highest>2000</Highest>
<MEH>50</MEH>
<Complaints>20</Complaints>
<Percentage>5.88</Percentage>
</ProductReport>
<ProductReport>
<Product>Before the event legal expenses - other (All)</Product>
<FiveLargest>Brand A</FiveLargest>
<Sales>100000</Sales>
<CostToState>7000000</CostToState>
<NumberofClaims>3500</NumberofClaims>
<Average>87000.00</Average>
<Frequency>4.02</Frequency>
<NoofParts>3500</NoofParts>
<Nulled>100</Nulled>
<Accepted>97.14</Accepted>
<Paid>2275000</Paid>
<Avg_C>650</Avg_C>
<Highest>10500</Highest>
<MEH>325</MEH>
<Complaints>170</Complaints>
<Percentage>4.86</Percentage>
</ProductReport>
</REP019-GIValueMeasuresReport>
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
应该足够容易,请阅读像Geekforgeeks这样的教程,以进行这种转换,
https://www.geeksforgeeks.org/how-to-conter-convert-excel-excel-excel-to-xml-format-in-format-in-python/prbr >
自货币,单位和&lt;声明&gt;和&lt; returnDetails&gt;标签在XML文件中都是常见的,并且在电子表格中都是相同的,我认为可以从任何行中获取这些
标签'xml_schema ='文本以及其余的模式文本,按照您的要求进行格式。 &lt;声明&gt;和&lt; returnDetails&gt;可以使用没有迭代的“ with with tag”语法来实现,例如,lt; returndetails&gt;。
对&lt;声明&gt;做同样的做法。指向正确的单元格值。
然后,对于每个行的其余的每个行都遵循教程中的示例进行迭代(从第6至21列),使用外部标签&lt;和内部标签每个列标题。
我不确定的一部分是您在示例xml中显示的第一个&lt; producttreport&gt;具有5&lt; fivelargest&gt;标签,第二个具有3等。不知道如何确定这一点,因为每行似乎只有一个条目,对于此列,这些单元格中是否有多条线?隐藏在屏幕截图中的
情况下,如果是这种情况,则需要读取单元格内容,在新行上拆分并迭代相同标签名称的列表。
-
-----------------添加代码示例------------------
以下是如何编码此内容的示例。如果&lt; fivelargest&gt;多层次,此代码将对此进行管理。
Should be easy enough, read a tutorial like the GeekforGeeks on doing such a conversion,
https://www.geeksforgeeks.org/how-to-convert-excel-to-xml-format-in-python/
Since currency, units and the <Declaration> and <ReturnDetails> tags are common to all in the xml file and are all the same in the spreadsheet, I presume these can be taken from any row so no need to iterate for those just select a cell to use, like
Currency and units can be added to the 'xml_schema =' text along with the rest of your schema text formatted as you like. <Declaration> and <ReturnDetails> can be achieved using the 'with tag' syntax without the iterations, e.g. this for <ReturnDetails>.
Do same for <Declaration> pointing to the correct cell value.
Then for each of the rest of the rows follow the example in the tutorial to iterate (from columns 6 to 21) using outer tag <ProductReport> and inner tags per the column headers.
The part I'm not sure about is you show in the example xml the first <ProductReport> with 5 <FiveLargest> tags, the second has 3 etc. Do not know how this is determined as each row appears to have only one entry for this column, are there are multiple lines in these cells? hidden in the screen shot
If that is the case you'll need read the cell contents, split on the new line and iterate the list for the same tag name.
-
-----------------Add Code example---------------
Below is an example of how to code this. If the <FiveLargest> are multilined this code will manage that.