I have three XML files that use the same tags but have different schemas (different column names), as shown below.
<COLUMNS> Long Short Value </COLUMNS>
<DATA> XC XC 10670003039 </DATA>
<DATA> GH GH 10450003040 </DATA>
<DATA> HJ HJ 10220002989 </DATA>
<DATA> FF FF 10990002988 </DATA>
<DATA> DD DD 10660003041 </DATA>
<DATA> FE FE 10660002991 </DATA>
<DATA> SS SS 10090003042 </DATA>
<DATA> LL LL 10100002990 </DATA>
<COLUMNS> Long Name Value </COLUMNS>
<DATA> LD ER 10670045039 </DATA>
<DATA> FR RT 10450065040 </DATA>
<DATA> YT VG 10220090989 </DATA>
<DATA> QW TY 10990023988 </DATA>
<DATA> WE ER 10660034041 </DATA>
<DATA> ER FG 10660045991 </DATA>
<DATA> ER ER 10090067042 </DATA>
<DATA> PO PO 10100044990 </DATA>
<COLUMNS> Punt GrubName Value </COLUMNS>
<DATA> GF ER 10689045039 </DATA>
<DATA> TY RT 10434065040 </DATA>
<DATA> JJ VG 10212090989 </DATA>
<DATA> QW TY 10989023988 </DATA>
<DATA> TY ER 10676034041 </DATA>
<DATA> II FG 10609045991 </DATA>
<DATA> OI ER 10023067042 </DATA>
<DATA> OW PO 10145044990 </DATA>
I have written a Python script to parse these files, but because the schemas differ I had to manually write a separate script for each file, build a dataframe from each one, and then concatenate the dataframes to get the desired output.
Parsing file 1:
import pandas as pd
import numpy as np
import xml.etree.ElementTree as et

data1 = []
tree = et.parse(file1)
root = tree.getroot()
# Read the column names from the COLUMNS element
for mt in root.findall("LETADA-LOOK_TYP"):
    column = mt.find('COLUMNS').text
    column1 = column.split('\t')
# Collect every DATA row as a list of fields
for ms in root.findall("LETADA-LOOK_TYP"):
    datatab = ms.findall("DATA")
    for dat in datatab:
        data = dat.text.split('\t')
        data1.append(data)
dataframe1 = pd.DataFrame(data1, columns = column1)
Parsing file 2 in a different cell:
data2 = []
tree = et.parse(file2)
root = tree.getroot()
# Read the column names from the COLUMNS element
for mt in root.findall("LETADA-LOOK_TYP"):
    column = mt.find('COLUMNS').text
    column2 = column.split('\t')
# Collect every DATA row as a list of fields
for ms in root.findall("LETADA-LOOK_TYP"):
    datatab = ms.findall("DATA")
    for dat in datatab:
        data = dat.text.split('\t')
        data2.append(data)
dataframe2 = pd.DataFrame(data2, columns = column2)
Parsing file 3 in a different cell:
data3 = []
tree = et.parse(file3)
root = tree.getroot()
# Read the column names from the COLUMNS element
for mt in root.findall("LETADA-LOOK_TYP"):
    column = mt.find('COLUMNS').text
    column3 = column.split('\t')
# Collect every DATA row as a list of fields
for ms in root.findall("LETADA-LOOK_TYP"):
    datatab = ms.findall("DATA")
    for dat in datatab:
        data = dat.text.split('\t')
        data3.append(data)
dataframe3 = pd.DataFrame(data3, columns = column3)

# Combine the three dataframes into one
list_df = [dataframe1, dataframe2, dataframe3]
final_df = pd.concat(list_df).reset_index(drop = True)
With the code above I get the desired output, but is there a way to parse multiple files with different schemas in a single routine, return one dataframe per file, and then concatenate them into the final output?
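One possible way to collapse the three cells into a single routine is sketched below. It is only an illustration under some assumptions: the function name `parse_lookup` and the in-memory sample documents are hypothetical, the `<root>` wrapper is invented because the real files' outer structure is not shown, and fields are split on any whitespace by default (pass `sep='\t'` to keep the tab splitting used above if values can contain spaces).

```python
import io
import xml.etree.ElementTree as et

import pandas as pd


def parse_lookup(source, tag="LETADA-LOOK_TYP", sep=None):
    """Build one DataFrame from the COLUMNS/DATA children of every `tag` element.

    `source` is a file path or file-like object. `sep=None` splits on any
    whitespace (an assumption); use sep='\t' for strictly tab-separated fields.
    """
    root = et.parse(source).getroot()
    frames = []
    for block in root.findall(tag):
        # Column names come from the file itself, so no per-file script is needed
        columns = block.find("COLUMNS").text.strip().split(sep)
        rows = [d.text.strip().split(sep) for d in block.findall("DATA")]
        frames.append(pd.DataFrame(rows, columns=columns))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hypothetical in-memory stand-ins for file1 and file2 (wrapper element assumed)
sample_a = io.StringIO(
    "<root><LETADA-LOOK_TYP>"
    "<COLUMNS> Long Short Value </COLUMNS>"
    "<DATA> XC XC 10670003039 </DATA>"
    "<DATA> GH GH 10450003040 </DATA>"
    "</LETADA-LOOK_TYP></root>"
)
sample_b = io.StringIO(
    "<root><LETADA-LOOK_TYP>"
    "<COLUMNS> Long Name Value </COLUMNS>"
    "<DATA> LD ER 10670045039 </DATA>"
    "</LETADA-LOOK_TYP></root>"
)

# One dataframe per file, then a single concat; columns missing in a file become NaN
frames = [parse_lookup(src) for src in (sample_a, sample_b)]
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

With real files, the list comprehension would iterate over the paths instead, for example `[parse_lookup(p) for p in (file1, file2, file3)]`. Because `pd.concat` aligns on column names, columns that exist in only one schema are kept and filled with NaN for the rows from the other files.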
