How do I turn this list into a DataFrame in Python?

Posted on 2025-02-11 18:26:37

I have a list in Python, shown below, that I would like to turn into a DataFrame. I tried pd.DataFrame(myList), but the 'origins' column then holds a list of dicts in each cell. I would like the origin and quantityLeads keys stored as columns in that same DataFrame.

myList = [
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "created_date":"2022-06-28"
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         },
         {
            "origin":"Facebook",
            "quantityLeads":"1"
         }
      ]
   },
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "inserted_date":"2022-06-28",
      "created_date":"2022-06-28",
      "channel":"Direct",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"Desconhecida",
            "quantityLeads":"2"
         }
      ]
   },
   {
      "id":2918513,
      "title":"Ebook Direct To Consumer",
      "offering":"Supply Chain",
      "created_date":"2022-06-28",
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         }
      ]
   }
]

如日中天 2025-02-18 18:26:37

If "origins" can hold more than one element, explode it first so that each origin gets its own row, then build the "origin" and "quantityLeads" columns from the resulting dicts and decide what to do with the rest of the DataFrame.

import pandas as pd

df = pd.DataFrame(myList)
# ignore_index=True (pandas >= 1.1) renumbers rows so the assignment below aligns.
df = df.explode('origins', ignore_index=True)
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist())
df.drop('origins', axis=1, inplace=True)

print(df):

        id                        title       offering created_date  \
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28
2  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28
3  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28

  inserted_date channel  start_date    end_date        origin quantityLeads
0    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
1    2022-06-28  Social  2022-06-28  2022-06-28      Facebook             1
2    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2
3    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
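
One detail of the explode step is worth illustrating: by default, explode repeats the source row's index label for every exploded row, and pandas aligns on those labels when assigning columns. The sketch below uses a hypothetical two-record sample (the `records` name and its values are made up for illustration, not the asker's data) and assumes pandas 1.1 or later for the `ignore_index` parameter.

```python
import pandas as pd

# Hypothetical two-record sample shaped like myList (not the asker's data).
records = [
    {"id": 1, "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                          {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 2, "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
]

# Default explode keeps the source row's label, so labels repeat: 0, 0, 1.
exploded = pd.DataFrame(records).explode("origins")
print(exploded.index.tolist())

# ignore_index=True relabels the rows 0..n-1, matching a DataFrame built
# from the list of dicts, so the column assignment lines up row by row.
flat = pd.DataFrame(records).explode("origins", ignore_index=True)
flat[["origin", "quantityLeads"]] = pd.DataFrame(flat["origins"].tolist())
flat = flat.drop(columns="origins")
print(flat)
```

Calling `reset_index(drop=True)` after a plain explode should have the same effect on older pandas versions.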

一影成城 2025-02-18 18:26:37

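For simplicity, you could flatten the dictionaries in place with something like the following. Note that it keeps only the first entry of each origins list:

import pandas as pd

for row in myList:
   # Promote the first origin's keys to the top level, then drop the nested list.
   row["origin"] = row["origins"][0]["origin"]
   row["quantityLeads"] = row["origins"][0]["quantityLeads"]
   del row["origins"]

df = pd.DataFrame(myList)
print(df)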

Output:

        id                        title       offering created_date inserted_date channel  start_date    end_date        origin quantityLeads
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2
2  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1

As a side note, the myList sample above is missing a comma after the first entry's created_date, which causes a SyntaxError.

EDIT: If the origins list holds a variable number of items, but every item has the same keys, we can iterate over the items as well and write each one to its own numbered columns (origin_0, quantityLeads_0, origin_1, and so on).

# Run this on a fresh myList: the earlier snippet already deleted "origins".
for row in myList:
   # enumerate supplies the running position of each origin in the list.
   for counter, item in enumerate(row["origins"]):
      row["origin_" + str(counter)] = item["origin"]
      row["quantityLeads_" + str(counter)] = item["quantityLeads"]

   del row["origins"]

df = pd.DataFrame(myList)
print(df)
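
To make the shape of that wide layout concrete, here is a minimal sketch of the same idea that builds new row dicts instead of modifying the input. The `records` sample and the `rows` and `wide` names are hypothetical, chosen only for illustration. Records with fewer origins than the longest list end up with missing values in the extra columns.

```python
import pandas as pd

# Hypothetical two-record sample shaped like myList (not the asker's data).
records = [
    {"id": 1, "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                          {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 2, "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
]

rows = []
for record in records:
    # Copy every key except "origins" so the input list is left unmodified.
    row = {key: value for key, value in record.items() if key != "origins"}
    # Write each origin to its own numbered pair of columns.
    for i, item in enumerate(record["origins"]):
        row[f"origin_{i}"] = item["origin"]
        row[f"quantityLeads_{i}"] = item["quantityLeads"]
    rows.append(row)

wide = pd.DataFrame(rows)
print(wide)
```

Whether one row per record (wide) or one row per origin (long, as with explode) is preferable depends on how the data will be aggregated afterwards.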

静谧 2025-02-18 18:26:37

This works for me: explode the origins column, then pull each key out of the resulting dicts with .str.get.

import pandas as pd

df = pd.DataFrame(myList)
df = df.explode('origins')
# .str.get looks up the given key in each dict of the exploded column.
df['origin'] = df.origins.str.get('origin')
df['quantityLeads'] = df.origins.str.get('quantityLeads')
df.drop('origins', axis=1, inplace=True)
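
As an alternative that does not appear in the answers above, the same one-row-per-origin result can probably be obtained in a single call with `pd.json_normalize`, which flattens a nested list through `record_path` and carries parent fields along through `meta`. The sketch below uses a hypothetical two-record sample (the `records` name and values are illustrative assumptions) and also converts quantityLeads, which the source data stores as strings, to numbers.

```python
import pandas as pd

# Hypothetical two-record sample shaped like myList (not the asker's data).
records = [
    {"id": 1, "channel": "Social",
     "origins": [{"origin": "LinkedIn", "quantityLeads": "1"},
                 {"origin": "Facebook", "quantityLeads": "1"}]},
    {"id": 2, "channel": "Direct",
     "origins": [{"origin": "Desconhecida", "quantityLeads": "2"}]},
]

# record_path names the nested list to expand into rows;
# meta lists the parent-level fields to repeat on each of those rows.
flat = pd.json_normalize(records, record_path="origins", meta=["id", "channel"])

# quantityLeads arrives as strings ("1", "2"); convert it for arithmetic.
flat["quantityLeads"] = pd.to_numeric(flat["quantityLeads"])
print(flat)
```

With the full myList, the `meta` list would need to name every parent field that should be kept (title, offering, the date fields, and so on).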
