Parse multiple (n) XML files with different schemas and return multiple (n) dataframes
I have three XML files that use the same tags but have different schemas, as shown below.
file:1
<LETADA-LOOK_TYP>
<COLUMNS> Long Short Value </COLUMNS>
<DATA> XC XC 10670003039 </DATA>
<DATA> GH GH 10450003040 </DATA>
<DATA> HJ HJ 10220002989 </DATA>
<DATA> FF FF 10990002988 </DATA>
<DATA> DD DD 10660003041 </DATA>
<DATA> FE FE 10660002991 </DATA>
<DATA> SS SS 10090003042 </DATA>
<DATA> LL LL 10100002990 </DATA>
</LETADA-LOOK_TYP>
file:2
<LETADA-LOOK_TYP>
<COLUMNS> Long Name Value </COLUMNS>
<DATA> LD ER 10670045039 </DATA>
<DATA> FR RT 10450065040 </DATA>
<DATA> YT VG 10220090989 </DATA>
<DATA> QW TY 10990023988 </DATA>
<DATA> WE ER 10660034041 </DATA>
<DATA> ER FG 10660045991 </DATA>
<DATA> ER ER 10090067042 </DATA>
<DATA> PO PO 10100044990 </DATA>
</LETADA-LOOK_TYP>
file:3
<LETADA-LOOK_TYP>
<COLUMNS> Punt GrubName Value </COLUMNS>
<DATA> GF ER 10689045039 </DATA>
<DATA> TY RT 10434065040 </DATA>
<DATA> JJ VG 10212090989 </DATA>
<DATA> QW TY 10989023988 </DATA>
<DATA> TY ER 10676034041 </DATA>
<DATA> II FG 10609045991 </DATA>
<DATA> OI ER 10023067042 </DATA>
<DATA> OW PO 10145044990 </DATA>
</LETADA-LOOK_TYP>
I have written a Python script to parse these files, but because the files have different schemas I had to manually write a separate script for each file, get a dataframe out of each one, and then concat all the resulting dataframes to get the desired output.
Parsing file 1:
import pandas as pd
import numpy as np
import xml.etree.ElementTree as et

data1 = []
tree = et.parse(file1)
root = tree.getroot()
for mt in root.findall("LETADA-LOOK_TYP"):
    column = mt.find('COLUMNS').text
    column1 = column.split('\t')
for ms in root.findall("LETADA-LOOK_TYP"):
    datatab = ms.findall("DATA")
    for dat in datatab:
        data = dat.text.split('\t')
        data1.append(data)
dataframe1 = pd.DataFrame(data1, columns=column1)
Parsing file 2 in a different cell:
data2 = []
tree = et.parse(file2)
root = tree.getroot()
for mt in root.findall("LETADA-LOOK_TYP"):
    column = mt.find('COLUMNS').text
    column2 = column.split('\t')
for ms in root.findall("LETADA-LOOK_TYP"):
    datatab = ms.findall("DATA")
    for dat in datatab:
        data = dat.text.split('\t')
        data2.append(data)
dataframe2 = pd.DataFrame(data2, columns=column2)
Parsing file 3 in a different cell:
data3 = []
tree = et.parse(file3)
root = tree.getroot()
for mt in root.findall("LETADA-LOOK_TYP"):
    column = mt.find('COLUMNS').text
    column3 = column.split('\t')
for ms in root.findall("LETADA-LOOK_TYP"):
    datatab = ms.findall("DATA")
    for dat in datatab:
        data = dat.text.split('\t')
        data3.append(data)
dataframe3 = pd.DataFrame(data3, columns=column3)

list_df = [dataframe1, dataframe2, dataframe3]
final_df = pd.concat(list_df).reset_index(drop=True)
Using the code above I can get the desired output, but is there a way to parse multiple files with different schemas with a single piece of code, return a dataframe for each file, and then concat them to get the final output?
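One possible direction is sketched below: since every file carries its own <COLUMNS> row, a single function can read the column names from the file itself instead of hard-coding them per file. This is only a sketch under assumptions. The function names `xml_to_dataframe` and `parse_many` are made up for illustration, the in-memory samples stand in for the real files, and fields are split on any whitespace (the original code splits on '\t'), which assumes no value contains a space.

```python
import io
import xml.etree.ElementTree as et

import pandas as pd


def xml_to_dataframe(source, table_tag="LETADA-LOOK_TYP"):
    """Parse one XML file (path or file object) into a DataFrame.

    Column names come from the file's own <COLUMNS> element, so no
    per-file schema has to be written by hand.
    """
    root = et.parse(source).getroot()
    frames = []
    # iter() also matches the root element itself, so this works whether
    # the table tag is the document root or nested under a wrapper.
    for table in root.iter(table_tag):
        # Assumption: fields are whitespace-delimited and contain no spaces.
        columns = table.findtext("COLUMNS", default="").split()
        rows = [(data.text or "").split() for data in table.findall("DATA")]
        frames.append(pd.DataFrame(rows, columns=columns))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


def parse_many(sources):
    """Return (list of per-file DataFrames, one concatenated DataFrame)."""
    frames = [xml_to_dataframe(src) for src in sources]
    return frames, pd.concat(frames, ignore_index=True)


# Two shortened in-memory samples in place of file paths.
sample_1 = """<LETADA-LOOK_TYP>
<COLUMNS>Long\tShort\tValue</COLUMNS>
<DATA>XC\tXC\t10670003039</DATA>
<DATA>GH\tGH\t10450003040</DATA>
</LETADA-LOOK_TYP>"""

sample_2 = """<LETADA-LOOK_TYP>
<COLUMNS>Long\tName\tValue</COLUMNS>
<DATA>LD\tER\t10670045039</DATA>
<DATA>FR\tRT\t10450065040</DATA>
</LETADA-LOOK_TYP>"""

frames, final_df = parse_many([io.StringIO(sample_1), io.StringIO(sample_2)])
print(final_df)
```

With real files, the call would presumably look like `parse_many([file1, file2, file3])`, or a list built with `glob.glob`. Columns that exist in only some files are filled with NaN in the concatenated frame, which matches what `pd.concat` does in the original code. All values stay strings; `pd.to_numeric` could be applied to the Value column afterwards if numbers are needed.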