使用Python将CSV文件转换为单个JSON（w/ arrays）

发布于 2025-01-28 01:45:48 字数 2991 浏览 1 评论 0原文

我一直在尝试使用Python将带有相关密钥的3个CSV文件转换为单个JSON文件。

最初，我尝试使用SAS，但注意到一行中所有数据所需的Proc（我相信）。我无法重新创建一个包含多个客户或仓库的阵列，以进行一次销售。

我面临的挑战是第一CSV是一组独特的数据点，没有重复。第二个CSV链接回到第一个Via，这会创建重复的行，对于第3个CSV而言，这是相同的。
第三CSV是与第一个的一对一关系，第二个CSV与第一个之间的关系为0比1。

3个CSV文件的格式如下：

csv 1-每个唯一ID的单行

saleid	productname
1	a
2	b

csv2-可以使与CSV1

warehouseid	saleid	warehousename
1	1	a
2	2	b
1	3	a

csv3-可以使与CSV1

customerId	saleid	customername
1	1	Albert
2	2	Bob
3	1	Cath

的重复关系1 customerId saleid customername 1 cath json的预期格式做这样的事情。

{
"totalSales":2,
"Sales":[{
    "saleId":1,
    "productName":"A",
    "warehouse":[{
        "warehouseID":1,
        "warehouseName":"A"
    }],
    "customer":[{
        "customerID":1,
        "customerName":"Albert"
    },
    "customerID":3,
        "customerName":"Cath"
    }],
"Sales":[{
    "saleId":2,
    "productName":"B",
    "warehouse":[{
        "warehouseID":2,
        "warehouseName":"B"
    }],
    "customer":[{
        "customerID":2,
        "customerName":"Bob"
    }]
 }

到目前为止，我在Python中尝试的东西似乎与我在SAS中所取得的成果相似，因为我认为我缺少将仓库和客户信息作为数组捕获的步骤。

def multicsvtojson():
salesdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\sales.csv', names=("salesID","ProductName"))
warehousedf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\warehouse.csv', names=("warehouseID", "salesID", "warehouseName"))
customerdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\customers.csv', names=("customerID", "salesID", "customerName"))

finaldf = pd.merge(pd.merge(salesdf, warehousedf, on='salesID'), customerdf, on='salesID')

finaldf.to_json('finalResult.json', orient='records')

print(finaldf)

结果 [{“ salesid”：“ saleid”，“ productname”：“ productname”，“ warehouseid”：“ warehouseid”，“ warehousename”：“ warehousename”，“ customerId”：“ customerId”：“ customerId”，“ customerId”，“ customername”：“ customerName”：“ productName”} ，，，， {“ salesid”：“ 1”，“ productname”：“ a”，“ a”，“ warehouseid”：“ 1”，“ warehousename”：“ a”，“ customerid”：“ 1”，“ customername”：“ albert”}， {“ salesid”：“ 1”，“ productname”：“ a”，“ warehouseid”：“ 1”，“ warehousename”：“ a”，“ customerId”：“ 3”，“ customername”：“ cath”}， {“ salesid”：“ 2”，“ productName”：“ b”，“ warehouseid”：“ 2”，“ warehousename”：“ b”，“ customerId”：“ 2”，“ customername”：“ bob”}]

原文

I have been trying to convert 3 csv files with related keys into a single JSON file using Python.

Originally, I had tried using SAS but noticed the proc required (I believe) all data to be available in a single row. I was unable to recreate an array containing multiple customers or warehouses against a single sale.

The challenge I am facing is the 1st csv is a unique set of data points there are no duplicates. The 2nd csv links back to the 1st via and this creates duplicate rows, this is the same for the 3rd csv.
The 3rd csv is a 1 to many relationship with the 1st and the 2nd csv has a 0 to 1 to many relationship with the 1st.

The format of the 3 csv files is as follows:

CSV 1 - single row for each unique ID

saleID	ProductName
1	A
2	B

CSV2 - can have duplicates 1 to many relationship with csv1

WarehouseID	saleID	WarehouseName
1	1	A
2	2	B
1	3	A

CSV3 - can have duplicates 1 to many relationship with csv1

customerID	saleID	CustomerName
1	1	Albert
2	2	Bob
3	1	Cath

The expected format of the JSON would be something like this.

{
"totalSales":2,
"Sales":[{
    "saleId":1,
    "productName":"A",
    "warehouse":[{
        "warehouseID":1,
        "warehouseName":"A"
    }],
    "customer":[{
        "customerID":1,
        "customerName":"Albert"
    },
    "customerID":3,
        "customerName":"Cath"
    }],
"Sales":[{
    "saleId":2,
    "productName":"B",
    "warehouse":[{
        "warehouseID":2,
        "warehouseName":"B"
    }],
    "customer":[{
        "customerID":2,
        "customerName":"Bob"
    }]
 }

What i've tried so far in python seems to have a similar result as what i achieved in SAS as i think im missing the step to capture the warehouse and customer information as an array.

def multicsvtojson():
salesdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\sales.csv', names=("salesID","ProductName"))
warehousedf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\warehouse.csv', names=("warehouseID", "salesID", "warehouseName"))
customerdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\customers.csv', names=("customerID", "salesID", "customerName"))

finaldf = pd.merge(pd.merge(salesdf, warehousedf, on='salesID'), customerdf, on='salesID')

finaldf.to_json('finalResult.json', orient='records')

print(finaldf)

results
[{"salesID":"saleID","ProductName":"productName","warehouseID":"warehouseID","warehouseName":"warehouseName","customerID":"customerID","customerName":"productName"},
{"salesID":"1","ProductName":"A","warehouseID":"1","warehouseName":"A","customerID":"1","customerName":"Albert"},
{"salesID":"1","ProductName":"A","warehouseID":"1","warehouseName":"A","customerID":"3","customerName":"Cath"},
{"salesID":"2","ProductName":"B","warehouseID":"2","warehouseName":"B","customerID":"2","customerName":"Bob"}]

分享到QQ

分享到微博