如何在python中弄平数据框，其中一列包含一个JSON对象？

发布于 2025-02-09 12:08:21 字数 793 浏览 1 评论 0原文

我有一个数据框，其中列是一个JSON对象之一，如

customer_id |    date    |             json_object
--------------------------------------------------------------------------
A101        | 2022-06-21 | {'name':['james'],'age':[55], 'hobby':['pubg']}
A102        | 2022-06-22 | {'name':['tarzan'],'status':[]}

Jason对象的内容不统一所示。在上面的示例中，第一行中的JSON对象为“爱好”，该对象在第二行的JSON对象中不存在。在第二行中的相似性，属性状态为空IE []

问题：如何在Python中弄平此数据框以创建一个新的数据框，其中每一行仅与一个JSON对应，如下所示

customer_id |    date    | attribute
---------------------------------------------
A101        | 2022-06-21 | 'name': 'james'
A101        | 2022-06-21 | 'age': 55
A101        | 2022-06-21 | 'hobby': 'pubg'
A102        | 2022-06-22 | 'name': 'tarzan'
A102        | 2022-06-22 | 'status':

原文

I have a dataframe in which is one of the column is a json object as shown below

customer_id |    date    |             json_object
--------------------------------------------------------------------------
A101        | 2022-06-21 | {'name':['james'],'age':[55], 'hobby':['pubg']}
A102        | 2022-06-22 | {'name':['tarzan'],'status':[]}

The content of the jason object is not uniform. In the above example, the json object in the first row as 'hobby' which is not present in the json object of the second row. Similary in the 2nd row, the attribute status is empty i.e. []

Question: How can I flatten this dataframe in Python to create a new dataframe where each row corresponds to one json object only as shown below

customer_id |    date    | attribute
---------------------------------------------
A101        | 2022-06-21 | 'name': 'james'
A101        | 2022-06-21 | 'age': 55
A101        | 2022-06-21 | 'hobby': 'pubg'
A102        | 2022-06-22 | 'name': 'tarzan'
A102        | 2022-06-22 | 'status':

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

箜明 2025-02-16 12:08:21

假设json_object的每个值是dict，您也可以使用以下方法：

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': 'james','age':55, 'hobby':'pubg'}, {'name': 'tarzan','status':'single'}]
    }
)
df["json_object"] = df["json_object"].map(lambda x: [[i, x[i]] for i in x])
df = df.explode(column="json_object")
df.json_object = df.json_object.str[0].astype(str) + ": " + df.json_object.str[1].astype(str) 
df

------------------------------------------
    customer_id  date        json_object
0   A101         2022-06-21  name: james
0   A101         2022-06-21  age: 55
0   A101         2022-06-21  hobby: pubg
1   A102         2022-06-22  name: tarzan
1   A102         2022-06-22  status: single
------------------------------------------

编辑，

因为您将数据框架更改为

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': ['james'],'age':[55], 'hobby':['pubg']}, {'name': ['tarzan'],'status':['single']}]
    }
)

我的代码，必须按以下方式调整：

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': ['james'],'age':[55], 'hobby':['pubg']}, {'name': ['tarzan'],'status':['single']}]
    }
)
df["json_object"] = df["json_object"].map(lambda x: [[i, x[i][0]] for i in x])
df = df.explode(column="json_object")
df.json_object = df.json_object.str[0].astype(str) + ": " + df.json_object.str[1].astype(str) 
df

如果空列表随附，然后只需在lambda函数中添加if-else条件即可。请注意，我还将下一个代码提取中的列重命名。

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': ['james'],'age':[55], 'hobby':['pubg']}, {'name': ['tarzan'],'status':[]}]
    }
)
df["json_object"] = df["json_object"].map(lambda x: [[i, x[i][0]] if x[i] else [i, ""] for i in x])
df = df.rename(columns={"json_object": "attribute"}).explode(column="attribute")
df.attribute = df.attribute.str[0].astype(str) + ": " + df.attribute.str[1].astype(str)

Assuming each value of json_object is a dict, you could also use the following approach:

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': 'james','age':55, 'hobby':'pubg'}, {'name': 'tarzan','status':'single'}]
    }
)
df["json_object"] = df["json_object"].map(lambda x: [[i, x[i]] for i in x])
df = df.explode(column="json_object")
df.json_object = df.json_object.str[0].astype(str) + ": " + df.json_object.str[1].astype(str) 
df

------------------------------------------
    customer_id  date        json_object
0   A101         2022-06-21  name: james
0   A101         2022-06-21  age: 55
0   A101         2022-06-21  hobby: pubg
1   A102         2022-06-22  name: tarzan
1   A102         2022-06-22  status: single
------------------------------------------

EDIT

Since you changed your data frame to

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': ['james'],'age':[55], 'hobby':['pubg']}, {'name': ['tarzan'],'status':['single']}]
    }
)

my code must be adjusted as follows:

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': ['james'],'age':[55], 'hobby':['pubg']}, {'name': ['tarzan'],'status':['single']}]
    }
)
df["json_object"] = df["json_object"].map(lambda x: [[i, x[i][0]] for i in x])
df = df.explode(column="json_object")
df.json_object = df.json_object.str[0].astype(str) + ": " + df.json_object.str[1].astype(str) 
df

If empty lists are included then simply add an if-else condition inside the lambda function. Note, I have also renamed the columns in the next code extraction.

df = pd.DataFrame(
    data = {
        "customer_id": ["A101", "A102"],
        "date": ["2022-06-21", "2022-06-22"],
        "json_object": [{'name': ['james'],'age':[55], 'hobby':['pubg']}, {'name': ['tarzan'],'status':[]}]
    }
)
df["json_object"] = df["json_object"].map(lambda x: [[i, x[i][0]] if x[i] else [i, ""] for i in x])
df = df.rename(columns={"json_object": "attribute"}).explode(column="attribute")
df.attribute = df.attribute.str[0].astype(str) + ": " + df.attribute.str[1].astype(str)

回复收藏 0 原文

思念绕指尖 2025-02-16 12:08:21

我希望我对你了解：

from ast import literal_eval

df["json_object"] = df["json_object"].apply(lambda x: literal_eval(x).items())
df = df.explode("json_object")

之后，df将是：

  customer_id        date       json_object
0        A101  2022-06-21     (name, james)
0        A101  2022-06-21         (age, 55)
0        A101  2022-06-21     (hobby, pubg)
1        A102  2022-06-22    (name, tarzan)
1        A102  2022-06-22  (status, single)

然后：

df["attribute"] = df["json_object"].apply(lambda x: "{}: {}".format(*x))
df = df.drop(columns="json_object")

print(df)

打印：

  customer_id        date       attribute
0        A101  2022-06-21     name: james
0        A101  2022-06-21         age: 55
0        A101  2022-06-21     hobby: pubg
1        A102  2022-06-22    name: tarzan
1        A102  2022-06-22  status: single

I hope I understood you right:

from ast import literal_eval

df["json_object"] = df["json_object"].apply(lambda x: literal_eval(x).items())
df = df.explode("json_object")

After this the df will be:

  customer_id        date       json_object
0        A101  2022-06-21     (name, james)
0        A101  2022-06-21         (age, 55)
0        A101  2022-06-21     (hobby, pubg)
1        A102  2022-06-22    (name, tarzan)
1        A102  2022-06-22  (status, single)

Then:

df["attribute"] = df["json_object"].apply(lambda x: "{}: {}".format(*x))
df = df.drop(columns="json_object")

print(df)

Prints:

  customer_id        date       attribute
0        A101  2022-06-21     name: james
0        A101  2022-06-21         age: 55
0        A101  2022-06-21     hobby: pubg
1        A102  2022-06-22    name: tarzan
1        A102  2022-06-22  status: single

回复收藏 0 原文

尐偏执 2025-02-16 12:08:21

这不是您要要求的数据结构，而取决于您的后续步骤，但是通常将一个值放入一个单元格中，不要在单列中混合不同的数据是一个好主意。因此，借助数据

data = {"customer_id": "A101 A102".split(), 
        "date": "2022-06-21 2022-06-22".split(), 
        "json_object": [{'name':['james'], 'age':[55], 'hobby':['pubg']}, {'name':['tarzan'],'status':[]}]}

df = pd.DataFrame(pd.DataFrame(data))

和DF，

  customer_id        date                                        json_object
0        A101  2022-06-21  {'name': ['james'], 'age': [55], 'hobby': ['pu...
1        A102  2022-06-22                 {'name': ['tarzan'], 'status': []}

您也可以做类似的事情

pd.concat([df, pd.DataFrame.from_records(df.json_object)], axis=1).drop("json_object", axis=1)

，这些操作将为您提供单独的列中的数据，（几乎）准备进一步处理。

  customer_id        date      name   age   hobby status
0        A101  2022-06-21   [james]  [55]  [pubg]    NaN
1        A102  2022-06-22  [tarzan]   NaN     NaN     []

This is not the data structure you asked for and it depends on your subsequent steps, but it is generally a good idea to put one value into one cell and don't mix different data in a single column. So with the data

data = {"customer_id": "A101 A102".split(), 
        "date": "2022-06-21 2022-06-22".split(), 
        "json_object": [{'name':['james'], 'age':[55], 'hobby':['pubg']}, {'name':['tarzan'],'status':[]}]}

df = pd.DataFrame(pd.DataFrame(data))

and the df

  customer_id        date                                        json_object
0        A101  2022-06-21  {'name': ['james'], 'age': [55], 'hobby': ['pu...
1        A102  2022-06-22                 {'name': ['tarzan'], 'status': []}

you could also do something like

pd.concat([df, pd.DataFrame.from_records(df.json_object)], axis=1).drop("json_object", axis=1)

which will give you the data in separate columns, (almost) ready for further processing.

  customer_id        date      name   age   hobby status
0        A101  2022-06-21   [james]  [55]  [pubg]    NaN
1        A102  2022-06-22  [tarzan]   NaN     NaN     []

回复收藏 0 原文

~没有更多了~