使用Python将CSV文件转换为单个JSON(w/ arrays)
我一直在尝试使用Python将带有相关密钥的3个CSV文件转换为单个JSON文件。
最初,我尝试使用SAS,但注意到一行中所有数据所需的Proc(我相信)。我无法重新创建一个包含多个客户或仓库的阵列,以进行一次销售。
我面临的挑战是第一CSV是一组独特的数据点,没有重复。第二个CSV链接回到第一个Via,这会创建重复的行,对于第3个CSV而言,这是相同的。
第三CSV是与第一个的一对一关系,第二个CSV与第一个之间的关系为0比1。
3个CSV文件的格式如下:
csv 1-每个唯一ID的单行
saleid | productname |
---|---|
1 | a |
2 | b |
csv2-可以使与CSV1
warehouseid | saleid | warehousename |
---|---|---|
1 | 1 | a |
2 | 2 | b |
1 | 3 | a |
csv3-可以使与CSV1
customerId | saleid | customername |
---|---|---|
1 | 1 | Albert |
2 | 2 | Bob |
3 | 1 | Cath |
的重复关系1 customerId saleid customername 1 cath json的预期格式做这样的事情。
{
"totalSales":2,
"Sales":[{
"saleId":1,
"productName":"A",
"warehouse":[{
"warehouseID":1,
"warehouseName":"A"
}],
"customer":[{
"customerID":1,
"customerName":"Albert"
},
"customerID":3,
"customerName":"Cath"
}],
"Sales":[{
"saleId":2,
"productName":"B",
"warehouse":[{
"warehouseID":2,
"warehouseName":"B"
}],
"customer":[{
"customerID":2,
"customerName":"Bob"
}]
}
到目前为止,我在Python中尝试的东西似乎与我在SAS中所取得的成果相似,因为我认为我缺少将仓库和客户信息作为数组捕获的步骤。
def multicsvtojson():
salesdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\sales.csv', names=("salesID","ProductName"))
warehousedf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\warehouse.csv', names=("warehouseID", "salesID", "warehouseName"))
customerdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\customers.csv', names=("customerID", "salesID", "customerName"))
finaldf = pd.merge(pd.merge(salesdf, warehousedf, on='salesID'), customerdf, on='salesID')
finaldf.to_json('finalResult.json', orient='records')
print(finaldf)
结果 [{“ salesid”:“ saleid”,“ productname”:“ productname”,“ warehouseid”:“ warehouseid”,“ warehousename”:“ warehousename”,“ customerId”:“ customerId”:“ customerId”,“ customerId”,“ customername”:“ customerName”:“ productName”} ,,,, {“ salesid”:“ 1”,“ productname”:“ a”,“ a”,“ warehouseid”:“ 1”,“ warehousename”:“ a”,“ customerid”:“ 1”,“ customername”:“ albert”}, {“ salesid”:“ 1”,“ productname”:“ a”,“ warehouseid”:“ 1”,“ warehousename”:“ a”,“ customerId”:“ 3”,“ customername”:“ cath”}, {“ salesid”:“ 2”,“ productName”:“ b”,“ warehouseid”:“ 2”,“ warehousename”:“ b”,“ customerId”:“ 2”,“ customername”:“ bob”}]
I have been trying to convert 3 csv files with related keys into a single JSON file using Python.
Originally, I had tried using SAS but noticed the proc required (I believe) all data to be available in a single row. I was unable to recreate an array containing multiple customers or warehouses against a single sale.
The challenge I am facing is the 1st csv is a unique set of data points there are no duplicates. The 2nd csv links back to the 1st via and this creates duplicate rows, this is the same for the 3rd csv.
The 3rd csv is a 1 to many relationship with the 1st and the 2nd csv has a 0 to 1 to many relationship with the 1st.
The format of the 3 csv files is as follows:
CSV 1 - single row for each unique ID
saleID | ProductName |
---|---|
1 | A |
2 | B |
CSV2 - can have duplicates 1 to many relationship with csv1
WarehouseID | saleID | WarehouseName |
---|---|---|
1 | 1 | A |
2 | 2 | B |
1 | 3 | A |
CSV3 - can have duplicates 1 to many relationship with csv1
customerID | saleID | CustomerName |
---|---|---|
1 | 1 | Albert |
2 | 2 | Bob |
3 | 1 | Cath |
The expected format of the JSON would be something like this.
{
"totalSales":2,
"Sales":[{
"saleId":1,
"productName":"A",
"warehouse":[{
"warehouseID":1,
"warehouseName":"A"
}],
"customer":[{
"customerID":1,
"customerName":"Albert"
},
"customerID":3,
"customerName":"Cath"
}],
"Sales":[{
"saleId":2,
"productName":"B",
"warehouse":[{
"warehouseID":2,
"warehouseName":"B"
}],
"customer":[{
"customerID":2,
"customerName":"Bob"
}]
}
What i've tried so far in python seems to have a similar result as what i achieved in SAS as i think im missing the step to capture the warehouse and customer information as an array.
def multicsvtojson():
salesdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\sales.csv', names=("salesID","ProductName"))
warehousedf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\warehouse.csv', names=("warehouseID", "salesID", "warehouseName"))
customerdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\customers.csv', names=("customerID", "salesID", "customerName"))
finaldf = pd.merge(pd.merge(salesdf, warehousedf, on='salesID'), customerdf, on='salesID')
finaldf.to_json('finalResult.json', orient='records')
print(finaldf)
results
[{"salesID":"saleID","ProductName":"productName","warehouseID":"warehouseID","warehouseName":"warehouseName","customerID":"customerID","customerName":"productName"},
{"salesID":"1","ProductName":"A","warehouseID":"1","warehouseName":"A","customerID":"1","customerName":"Albert"},
{"salesID":"1","ProductName":"A","warehouseID":"1","warehouseName":"A","customerID":"3","customerName":"Cath"},
{"salesID":"2","ProductName":"B","warehouseID":"2","warehouseName":"B","customerID":"2","customerName":"Bob"}]
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论