Extracting JSON data inside an array column in a Pandas DataFrame

Posted on 2025-01-11 16:24:38

I am trying to extract a JSON array column in a DataFrame in Python using the pandas library. My data looks like this:

>df

id      partnerid   payments

5263    org1244     [{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]
5264    org1245     [{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]
5265    org1246     [{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]

I want to extract the JSON data inside each list and add its fields as columns in the same DataFrame, with one row per list element, like this:

>mod_df

id      partnerid   sNo amount

5263    org1244     1   1000
5263    org1244     2   500
5264    org1245     1   2000
5264    org1245     2   600
5265    org1246     1   3000
5265    org1246     2   700

I have tried this approach:

import pandas as pd
import json as j

df = pd.read_parquet('sample.parquet')

js_loads = df['payments'].apply(j.loads)
js_list = list(js_loads)
j_data = j.dumps(js_list)
df = df.join(pd.read_json(j_data))
df = df.drop(columns=['payments'] , axis=1)

But this only works if each cell in the column holds a single JSON object, not a list of JSON objects.
Can someone explain how I can achieve my desired output?


Comments (1)

谁与争疯 2025-01-18 16:24:38

Convert each string to a list with ast.literal_eval, then use explode() to turn each list element into its own row; the values in the other columns are repeated for every new row.

Then use .apply(pd.Series) to expand each dict into a row of columns, one column per key.

Finally, use pd.concat() to join those columns to the remaining columns of the original DataFrame.


Example:

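import ast
import pandas as pd

# sample data: each 'payments' cell is a string holding a JSON array
d = {'col1': [0, 1, 2], 'payments': ['[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]', '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]', '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]']}
df = pd.DataFrame(data=d, index=[0, 1, 2])

# parse each string into a Python list of dicts
df['payments'] = df['payments'].apply(ast.literal_eval)
# one row per list element; the other columns are repeated
df = df.explode('payments')
# expand each dict into columns and attach them to the remaining columns
out = pd.concat([df.drop(['payments'], axis=1), df['payments'].apply(pd.Series)], axis=1).reset_index(drop=True)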

Output:

   col1  sNo amount
0     0    1   1000
1     0    2    500
2     1    1   2000
3     1    2    600
4     2    1   3000
5     2    2    700
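
For reference, here is a sketch of the same idea applied to the column names from the question (id, partnerid, payments). It is a variation, not the code from the answer above: it assumes the payments cells are valid JSON strings, so it parses them with json.loads rather than ast.literal_eval (which would reject JSON literals such as true or null), and it uses pd.json_normalize in place of .apply(pd.Series). The sample frame is built by hand because the parquet file from the question is not available, and the final integer conversion is an optional extra step.

```python
import json

import pandas as pd

# Hand-built sample mirroring the question; `payments` holds JSON text,
# one JSON array per row (assumption: every cell is valid, non-empty JSON).
df = pd.DataFrame(
    {
        "id": [5263, 5264, 5265],
        "partnerid": ["org1244", "org1245", "org1246"],
        "payments": [
            '[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]',
            '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]',
            '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]',
        ],
    }
)

# Parse each JSON string into a list of dicts, then emit one row per dict.
exploded = (
    df.assign(payments=df["payments"].apply(json.loads))
    .explode("payments")
    .reset_index(drop=True)
)

# Expand the dicts into columns. json_normalize returns a fresh 0..n-1 index,
# which lines up with the reset index of `exploded`.
details = pd.json_normalize(exploded["payments"].tolist())
mod_df = pd.concat([exploded.drop(columns="payments"), details], axis=1)

# `amount` is a string in the JSON; convert it if numeric values are needed.
mod_df["amount"] = mod_df["amount"].astype(int)

print(mod_df)
```

Resetting the index before the concat matters here: explode() repeats the original index labels, so concatenating on that duplicated index would not pair rows one to one. Rows whose payments list is empty would need separate handling, since explode() turns them into missing values.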