How to flatten a dataframe in Python where one column contains a JSON object?
I have a dataframe in which one of the columns is a JSON object, as shown below:
customer_id | date | json_object
--------------------------------------------------------------------------
A101 | 2022-06-21 | {'name':['james'],'age':[55], 'hobby':['pubg']}
A102 | 2022-06-22 | {'name':['tarzan'],'status':[]}
The content of the JSON object is not uniform. In the above example, the JSON object in the first row has 'hobby', which is not present in the JSON object of the second row. Similarly, in the second row, the attribute 'status' is empty, i.e. [].
Question: How can I flatten this dataframe in Python to create a new dataframe where each row corresponds to only one attribute of the JSON object, as shown below?
customer_id | date | attribute
---------------------------------------------
A101 | 2022-06-21 | 'name': 'james'
A101 | 2022-06-21 | 'age': 55
A101 | 2022-06-21 | 'hobby': 'pubg'
A102 | 2022-06-22 | 'name': 'tarzan'
A102 | 2022-06-22 | 'status':
Comments (3)
Assuming each value of json_object is a dict, you could also use the following approach:

EDIT: Since you changed your data frame, my code must be adjusted as follows:

If empty lists are included, simply add an if-else condition inside the lambda function. Note that I have also renamed the columns in the next code extract.
I hope I understood you correctly:
After this, df will be:
Then:
This prints:
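The code and output for this answer were also lost, so its exact steps are unknown. A hypothetical two-step version that follows the same outline (reshape first, then format and print) could look like this. The helper format_pair and the column name attribute are not from the original, and each dict is assumed to be non-empty.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame(
    {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [
            {"name": ["james"], "age": [55], "hobby": ["pubg"]},
            {"name": ["tarzan"], "status": []},
        ],
    }
)

# Step 1: turn each dict into a list of (key, values) pairs, then give each
# pair its own row. explode() flattens only one level, so the tuples survive.
df["json_object"] = df["json_object"].apply(lambda d: list(d.items()))
df = df.explode("json_object").reset_index(drop=True)


def format_pair(pair):
    """Format a (key, values) pair; an empty list keeps the key only."""
    key, values = pair
    return f"{key!r}: {values[0]!r}" if values else f"{key!r}:"


# Step 2: replace the pairs with formatted attribute strings and print.
df["attribute"] = df["json_object"].map(format_pair)
df = df.drop(columns="json_object")
print(df.to_string(index=False))
```

Splitting the work into two steps keeps the intermediate frame of (key, values) pairs available for inspection before the strings are built.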
This is not the data structure you asked for, and it depends on your subsequent steps, but it is generally a good idea to put one value in each cell and not to mix different kinds of data in a single column. So, with this data and df, you could also split the JSON object into separate columns, which leaves the data (almost) ready for further processing.
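The answer's own code is not preserved here. One possible way to get one column per JSON key, sketched under the assumption that every value is a list holding at most one element, is shown below. The helper unwrap and the variable names are illustrative, not taken from the answer.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame(
    {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [
            {"name": ["james"], "age": [55], "hobby": ["pubg"]},
            {"name": ["tarzan"], "status": []},
        ],
    }
)


def unwrap(value):
    """Return the first list element; empty lists and missing keys become None."""
    return value[0] if isinstance(value, list) and value else None


# One column per JSON key; keys absent from a row are filled with NaN.
expanded = pd.DataFrame(df["json_object"].tolist(), index=df.index)
# Strip the list wrapper from every cell.
expanded = expanded.apply(lambda col: col.map(unwrap))

wide = df.drop(columns="json_object").join(expanded)
print(wide)
```

This keeps one row per customer, with name, age, hobby and status as separate columns and missing or empty attributes left null.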