How do I turn this list into a DataFrame in Python?

Posted on 2025-02-11 18:26:37

I have a list in Python, shown below, that I would like to turn into a DataFrame. I tried pd.DataFrame(myList), but the 'origins' column then stores a list. I would like the origin and quantityLeads keys to be stored as columns in that same DataFrame.

myList = [
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "created_date":"2022-06-28"
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         },
         {
            "origin":"Facebook",
            "quantityLeads":"1"
         }
      ]
   },
   {
      "id":3105052,
      "title":"Ebook Relatórios Gerenciais",
      "offering":"Institucional",
      "inserted_date":"2022-06-28",
      "created_date":"2022-06-28",
      "channel":"Direct",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"Desconhecida",
            "quantityLeads":"2"
         }
      ]
   },
   {
      "id":2918513,
      "title":"Ebook Direct To Consumer",
      "offering":"Supply Chain",
      "created_date":"2022-06-28",
      "inserted_date":"2022-06-28",
      "channel":"Social",
      "start_date":"2022-06-28",
      "end_date":"2022-06-28",
      "origins":[
         {
            "origin":"LinkedIn",
            "quantityLeads":"1"
         }
      ]
   }
]

Comments (3)

如日中天 2025-02-18 18:26:37

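If any "origins" list has more than one element, you can first explode the column so that each element gets its own row, then create the "origin" and "quantityLeads" columns, and then decide what to do with the rest of the DataFrame. Reset the index after exploding; otherwise the exploded rows share index labels and the new columns end up misaligned.

import pandas as pd

df = pd.DataFrame(myList)
# One row per element of each "origins" list, renumbered 0..n-1
df = df.explode('origins').reset_index(drop=True)
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist())
df.drop('origins', axis=1, inplace=True)

print(df) gives:

        id                        title       offering created_date  \
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28
2  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28
3  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28

  inserted_date channel  start_date    end_date        origin quantityLeads
0    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
1    2022-06-28  Social  2022-06-28  2022-06-28      Facebook             1
2    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2
3    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1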
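
As an alternative to exploding by hand, pd.json_normalize can expand a nested list and repeat the parent fields in a single call. The sketch below is only an illustration: `records` is a shortened, made-up stand-in for myList with fewer fields, and `flat` is a name chosen here.

```python
import pandas as pd

# Shortened stand-in for myList (two records, fewer fields), for illustration only
records = [
    {
        "id": 3105052,
        "channel": "Social",
        "origins": [
            {"origin": "LinkedIn", "quantityLeads": "1"},
            {"origin": "Facebook", "quantityLeads": "1"},
        ],
    },
    {
        "id": 2918513,
        "channel": "Social",
        "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}],
    },
]

# record_path: the nested list to expand into rows
# meta: top-level fields to repeat on every expanded row
flat = pd.json_normalize(records, record_path="origins", meta=["id", "channel"])
print(flat)
```

Note that the nested fields (origin, quantityLeads) come first in the resulting columns and the meta fields follow, so you may want to reorder the columns afterwards.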
一影成城 2025-02-18 18:26:37

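In the pursuit of simplicity you could just flatten the dictionary structures with something like the following. Note that it keeps only the first item of each origins list.

import pandas as pd

# Copy the first origin's fields up to the top level of each record
for row in myList:
    row["origin"] = row["origins"][0]["origin"]
    row["quantityLeads"] = row["origins"][0]["quantityLeads"]
    del row["origins"]  # drop the nested list once it has been flattened

df = pd.DataFrame(myList)
print(df)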

Output:

        id                        title       offering created_date inserted_date channel  start_date    end_date        origin quantityLeads
0  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1
1  3105052  Ebook Relatórios Gerenciais  Institucional   2022-06-28    2022-06-28  Direct  2022-06-28  2022-06-28  Desconhecida             2
2  2918513     Ebook Direct To Consumer   Supply Chain   2022-06-28    2022-06-28  Social  2022-06-28  2022-06-28      LinkedIn             1

As a side note, the myList sample above is missing a comma after the first entry's created_date, which causes a syntax error.

EDIT: If the origins list holds a variable number of items, but each item has the same keys, then we could iterate over those as well and give each item its own numbered pair of columns.

import pandas as pd

for row in myList:
    # pop() removes "origins" from the record and returns its list
    for i, item in enumerate(row.pop("origins")):
        row[f"origin_{i}"] = item["origin"]
        row[f"quantityLeads_{i}"] = item["quantityLeads"]

df = pd.DataFrame(myList)
print(df)
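
Both loops above modify myList in place, so they cannot be run twice on the same list. A minimal sketch of the same wide layout that leaves the input untouched could look like this; `widen_origins` and `sample` are hypothetical names used only for illustration, and `sample` is a shortened stand-in for myList.

```python
import pandas as pd


def widen_origins(records):
    """Return new flat dicts with origin_0, quantityLeads_0, origin_1, ... keys."""
    flat_rows = []
    for record in records:
        # Copy everything except the nested list, leaving the input untouched
        row = {key: value for key, value in record.items() if key != "origins"}
        for i, item in enumerate(record.get("origins", [])):
            row[f"origin_{i}"] = item["origin"]
            row[f"quantityLeads_{i}"] = item["quantityLeads"]
        flat_rows.append(row)
    return flat_rows


# Shortened stand-in for myList, for illustration only
sample = [
    {"id": 3105052, "origins": [
        {"origin": "LinkedIn", "quantityLeads": "1"},
        {"origin": "Facebook", "quantityLeads": "1"},
    ]},
    {"id": 2918513, "origins": [{"origin": "LinkedIn", "quantityLeads": "1"}]},
]

wide = pd.DataFrame(widen_origins(sample))
print(wide)
```

Records with fewer origins than the longest list get NaN in the columns they do not fill, which is the trade-off of the wide layout compared with one row per origin.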
静谧 2025-02-18 18:26:37

This works for me:

import pandas as pd

df = pd.DataFrame(myList)
df = df.explode('origins')
# .str.get also works on dict elements, so each key can be pulled out directly
df['origin'] = df.origins.str.get('origin')
df['quantityLeads'] = df.origins.str.get('quantityLeads')
df.drop('origins', axis=1, inplace=True)
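
For anyone who wants to try this without the full list, here is a small self-contained sketch of the same approach. `sample` is a shortened, made-up stand-in for myList. The numeric conversion is an optional extra that is not part of the original snippet, and `ignore_index=True` assumes pandas 1.1 or later.

```python
import pandas as pd

# Shortened stand-in for myList, for illustration only
sample = [
    {"id": 3105052, "channel": "Social", "origins": [
        {"origin": "LinkedIn", "quantityLeads": "1"},
        {"origin": "Facebook", "quantityLeads": "1"},
    ]},
    {"id": 3105052, "channel": "Direct", "origins": [
        {"origin": "Desconhecida", "quantityLeads": "2"},
    ]},
]

# ignore_index=True renumbers the exploded rows 0..n-1
df = pd.DataFrame(sample).explode("origins", ignore_index=True)
df["origin"] = df["origins"].str.get("origin")
# quantityLeads arrives as a string ("1", "2"), so convert it before doing arithmetic
df["quantityLeads"] = pd.to_numeric(df["origins"].str.get("quantityLeads"))
df = df.drop(columns="origins")

print(df)
```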