使用Python将CSV文件转换为单个JSON(w/ arrays)

发布于 2025-01-28 01:45:48 字数 2991 浏览 1 评论 0原文

我一直在尝试使用Python将带有相关密钥的3个CSV文件转换为单个JSON文件。

最初,我尝试使用SAS,但注意到一行中所有数据所需的Proc(我相信)。我无法重新创建一个包含多个客户或仓库的阵列,以进行一次销售。

我面临的挑战是第一CSV是一组独特的数据点,没有重复。第二个CSV链接回到第一个Via,这会创建重复的行,对于第3个CSV而言,这是相同的。
第三CSV是与第一个的一对一关系,第二个CSV与第一个之间的关系为0比1。

3个CSV文件的格式如下:

csv 1-每个唯一ID的单行

saleidproductname
1a
2b

csv2-可以使与CSV1

warehouseidsaleidwarehousename
11a
22b
13a

csv3-可以使与CSV1

customerIdsaleidcustomername
11Albert
22Bob
31Cath

的重复关系1 customerId saleid customername 1 cath json的预期格式做这样的事情。

{
"totalSales":2,
"Sales":[{
    "saleId":1,
    "productName":"A",
    "warehouse":[{
        "warehouseID":1,
        "warehouseName":"A"
    }],
    "customer":[{
        "customerID":1,
        "customerName":"Albert"
    },
    "customerID":3,
        "customerName":"Cath"
    }],
"Sales":[{
    "saleId":2,
    "productName":"B",
    "warehouse":[{
        "warehouseID":2,
        "warehouseName":"B"
    }],
    "customer":[{
        "customerID":2,
        "customerName":"Bob"
    }]
 }

到目前为止,我在Python中尝试的东西似乎与我在SAS中所取得的成果相似,因为我认为我缺少将仓库和客户信息作为数组捕获的步骤。

def multicsvtojson():
salesdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\sales.csv', names=("salesID","ProductName"))
warehousedf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\warehouse.csv', names=("warehouseID", "salesID", "warehouseName"))
customerdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\customers.csv', names=("customerID", "salesID", "customerName"))

finaldf = pd.merge(pd.merge(salesdf, warehousedf, on='salesID'), customerdf, on='salesID')

finaldf.to_json('finalResult.json', orient='records')

print(finaldf)

结果 [{“ salesid”:“ saleid”,“ productname”:“ productname”,“ warehouseid”:“ warehouseid”,“ warehousename”:“ warehousename”,“ customerId”:“ customerId”:“ customerId”,“ customerId”,“ customername”:“ customerName”:“ productName”} ,,,, {“ salesid”:“ 1”,“ productname”:“ a”,“ a”,“ warehouseid”:“ 1”,“ warehousename”:“ a”,“ customerid”:“ 1”,“ customername”:“ albert”}, {“ salesid”:“ 1”,“ productname”:“ a”,“ warehouseid”:“ 1”,“ warehousename”:“ a”,“ customerId”:“ 3”,“ customername”:“ cath”}, {“ salesid”:“ 2”,“ productName”:“ b”,“ warehouseid”:“ 2”,“ warehousename”:“ b”,“ customerId”:“ 2”,“ customername”:“ bob”}]

I have been trying to convert 3 csv files with related keys into a single JSON file using Python.

Originally, I had tried using SAS but noticed the proc required (I believe) all data to be available in a single row. I was unable to recreate an array containing multiple customers or warehouses against a single sale.

The challenge I am facing is the 1st csv is a unique set of data points there are no duplicates. The 2nd csv links back to the 1st via and this creates duplicate rows, this is the same for the 3rd csv.
The 3rd csv is a 1 to many relationship with the 1st and the 2nd csv has a 0 to 1 to many relationship with the 1st.

The format of the 3 csv files is as follows:

CSV 1 - single row for each unique ID

saleIDProductName
1A
2B

CSV2 - can have duplicates 1 to many relationship with csv1

WarehouseIDsaleIDWarehouseName
11A
22B
13A

CSV3 - can have duplicates 1 to many relationship with csv1

customerIDsaleIDCustomerName
11Albert
22Bob
31Cath

The expected format of the JSON would be something like this.

{
"totalSales":2,
"Sales":[{
    "saleId":1,
    "productName":"A",
    "warehouse":[{
        "warehouseID":1,
        "warehouseName":"A"
    }],
    "customer":[{
        "customerID":1,
        "customerName":"Albert"
    },
    "customerID":3,
        "customerName":"Cath"
    }],
"Sales":[{
    "saleId":2,
    "productName":"B",
    "warehouse":[{
        "warehouseID":2,
        "warehouseName":"B"
    }],
    "customer":[{
        "customerID":2,
        "customerName":"Bob"
    }]
 }

What i've tried so far in python seems to have a similar result as what i achieved in SAS as i think im missing the step to capture the warehouse and customer information as an array.

def multicsvtojson():
salesdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\sales.csv', names=("salesID","ProductName"))
warehousedf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\warehouse.csv', names=("warehouseID", "salesID", "warehouseName"))
customerdf = pandas.read_csv('C:\\Python\\multiCSVtoJSON\\customers.csv', names=("customerID", "salesID", "customerName"))

finaldf = pd.merge(pd.merge(salesdf, warehousedf, on='salesID'), customerdf, on='salesID')

finaldf.to_json('finalResult.json', orient='records')

print(finaldf)

results
[{"salesID":"saleID","ProductName":"productName","warehouseID":"warehouseID","warehouseName":"warehouseName","customerID":"customerID","customerName":"productName"},
{"salesID":"1","ProductName":"A","warehouseID":"1","warehouseName":"A","customerID":"1","customerName":"Albert"},
{"salesID":"1","ProductName":"A","warehouseID":"1","warehouseName":"A","customerID":"3","customerName":"Cath"},
{"salesID":"2","ProductName":"B","warehouseID":"2","warehouseName":"B","customerID":"2","customerName":"Bob"}]

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文