Extracting JSON data inside an array column in a pandas DataFrame
I am trying to extract a JSON array column from a DataFrame in Python using the pandas library. My data looks like this:
>df
id partnerid payments
5263 org1244 [{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]
5264 org1245 [{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]
5265 org1246 [{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]
I want to extract the JSON objects inside each list and add their fields as columns in the same DataFrame, with one row per object, like this:
>mod_df
id partnerid sNo amount
5263 org1244 1 1000
5263 org1244 2 500
5264 org1245 1 2000
5264 org1245 2 600
5265 org1246 1 3000
5265 org1246 2 700
I have tried this approach:
import pandas as pd
import json as j
df = pd.read_parquet('sample.parquet')
js_loads = df['payments'].apply(j.loads)
js_list = list(js_loads)
j_data = j.dumps(js_list)
df = df.join(pd.read_json(j_data))
df = df.drop(columns=['payments'] , axis=1)
But this only works when each cell holds a single JSON object, not a list of JSON objects.
Can someone explain how I can achieve my desired output?
Comments (1)
Convert it to a list with ast.literal_eval and use explode() to transform each element into a row while replicating the other columns. Then use .apply(pd.Series) to convert each dict-like element into a Series. Finally, concatenate the result to the original DataFrame using pd.concat().
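The steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's original example: it assumes the payments column holds strings (as the question's use of json.loads suggests), and it builds a small inline DataFrame in place of sample.parquet. The names exploded and fields are made up for the sketch.

```python
import ast

import pandas as pd

# Assumed sample data mirroring the question; each payments cell is a string.
df = pd.DataFrame({
    "id": [5263, 5264, 5265],
    "partnerid": ["org1244", "org1245", "org1246"],
    "payments": [
        '[{"sNo": 1, "amount":"1000"}, {"sNo": 2, "amount":"500"}]',
        '[{"sNo": 1, "amount":"2000"}, {"sNo": 2, "amount":"600"}]',
        '[{"sNo": 1, "amount":"3000"}, {"sNo": 2, "amount":"700"}]',
    ],
})

# 1. Parse each string into a Python list of dicts.
df["payments"] = df["payments"].apply(ast.literal_eval)

# 2. Turn every list element into its own row; id and partnerid are replicated.
#    Resetting the index keeps the rows aligned for the concat in step 4.
exploded = df.explode("payments").reset_index(drop=True)

# 3. Expand each dict into separate columns (sNo, amount).
fields = exploded["payments"].apply(pd.Series)

# 4. Attach the new columns and drop the original payments column.
mod_df = pd.concat([exploded.drop(columns="payments"), fields], axis=1)

print(mod_df)
```

Note that amount stays a string here because it is quoted in the source data; pd.to_numeric(mod_df["amount"]) would convert it if numbers are needed. If the real payloads contain JSON-only literals such as true, false, or null, ast.literal_eval will reject them, and json.loads is the safer parser for step 1.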