Group by a column with multiple values

Posted on 2025-02-02 00:48:31

I have a dataframe that looks like this one (one column holds multiple comma-separated values, the other holds just decimal numbers):

food number
apple,tomato,melon 897.0
apple,meat,banana 984.9
banana,tomato 340.8

I want to get the average number for every food. In the example that would be:

  • apple = (897.0 + 984.9)/2 = 940.95
  • banana = (984.9+340.8)/2 = 662.85

And so on, ending up with a new dataframe that contains just the foods and their average numbers.

food average
apple 940.95
banana 662.85

I tried my luck with groupby, but the result is all messed up:

import pandas as pd
from itertools import chain

# reshape data
df = pd.DataFrame({
    'food' : list(chain.from_iterable(df.food.tolist())), 
    'number' : df.number.repeat(df.food.str.len())
})
# groupby
df.groupby('food').number.apply(lambda x: x.unique().tolist())

I must say that the original dataframe has over 100k rows. Thanks.

Comments (2)

烟花易冷人易散 2025-02-09 00:48:32

Use DataFrame.explode(<column-name>) to expand the individual items in the lists into separate cells. They keep the original index, so the corresponding number gets filled in. From there, it's an easy group by, followed by a simple mean.

import pandas as pd

df = pd.DataFrame({'food': [['apple', 'tomato', 'melon'], 
                            ['apple','meat', 'banana'],
                            ['banana', 'tomato']], 
                   'number': [897, 984.9, 340.8]})

df.explode('food').groupby('food').mean()

results in

        number
food          
apple   940.95
banana  662.85
meat    984.90
melon   897.00
tomato  618.90
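
As a rough sketch of how this could be applied to the frame in the question, where `food` holds comma-separated strings rather than lists: split first, then explode. The column name `average` is taken from the desired output in the question; the variable names `exploded` and `result` are only illustrative.

```python
import pandas as pd

# Data as shown in the question: 'food' is a comma-separated string.
df = pd.DataFrame({
    'food': ['apple,tomato,melon', 'apple,meat,banana', 'banana,tomato'],
    'number': [897.0, 984.9, 340.8],
})

# Split each string into a list, then give every list item its own row.
# The original row index is repeated, so 'number' is carried along.
exploded = df.assign(food=df['food'].str.split(',')).explode('food')

# Average per food, returned as a flat two-column frame.
result = (
    exploded.groupby('food', as_index=False)['number']
    .mean()
    .rename(columns={'number': 'average'})
)
print(result)
```

Passing `as_index=False` keeps `food` as an ordinary column, which matches the two-column layout asked for in the question.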

守望孤独 2025-02-09 00:48:32

First, you have to convert the string column to a list in each cell. I've also included stripping of surrounding whitespace, if there is any. I modified the df created by @9769953.

import pandas as pd
df = pd.DataFrame({'food': ["apple,tomato, melon", 
                            "apple,meat,banana,melon",
                            "banana, tomato, melon"], 
                   'number': [897, 984.9, 340.8]})

df['food'] = df['food'].str.split(',').apply(lambda x: [e.strip() for e in x]).tolist()
df.explode('food').groupby('food').agg('mean')
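
A possible variation, sketched under the assumption that the whitespace only surrounds the items: explode first and strip afterwards with the vectorized `str.strip`, which avoids the per-row lambda. The names `exploded` and `means` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'food': ["apple,tomato, melon",
                            "apple,meat,banana,melon",
                            "banana, tomato, melon"],
                   'number': [897, 984.9, 340.8]})

# Split on commas and explode into one row per food item.
# The index is reset so every row has a unique label before reassigning.
exploded = (
    df.assign(food=df['food'].str.split(','))
      .explode('food')
      .reset_index(drop=True)
)

# Remove the leading/trailing spaces left over from the split.
exploded['food'] = exploded['food'].str.strip()

# Mean of 'number' per food.
means = exploded.groupby('food')['number'].mean()
print(means)
```

For this sample data the means should come out at roughly apple 940.95, banana 662.85, meat 984.90, melon 740.90 and tomato 618.90.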

If you would like more aggregations, you could use

df.explode('food').groupby('food').agg(['min', 'mean', 'max'])
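
Passing a list to `agg` on the whole frame produces two-level column labels such as `('number', 'mean')`. If flat column names are preferred, named aggregation is one option; the sketch below assumes pandas 0.25 or later, and the output names `minimum`, `average` and `maximum` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'food': ["apple,tomato, melon",
                            "apple,meat,banana,melon",
                            "banana, tomato, melon"],
                   'number': [897, 984.9, 340.8]})

# One row per food item, with stray whitespace removed.
exploded = (
    df.assign(food=df['food'].str.split(','))
      .explode('food')
      .reset_index(drop=True)
)
exploded['food'] = exploded['food'].str.strip()

# Named aggregation: keyword = (source column, aggregation function).
stats = exploded.groupby('food').agg(
    minimum=('number', 'min'),
    average=('number', 'mean'),
    maximum=('number', 'max'),
)
print(stats)
```

The result is indexed by `food` with three plain columns, which is usually easier to export or merge than a MultiIndex header.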
