Group a column with multiple values

Posted 2025-02-02 00:48:31

I have a DataFrame that looks like this (one column holds multiple comma-separated values, the other holds decimal numbers):

food number
apple,tomato,melon 897.0
apple,meat,banana 984.9
banana,tomato 340.8

I want to get the average number for each food. In the example that would be:

apple = (897.0 + 984.9)/2 = 940.95
banana = (984.9+340.8)/2 = 662.85

And so on, ending up with a new DataFrame that contains just the foods and their average numbers.

food average
apple 940.95
banana 662.85

I tried my luck with groupby, but the result is all messed up:

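import pandas as pd
from itertools import chain

# Reshape: intended to produce one row per (food, number) pair
df = pd.DataFrame({
    'food': list(chain.from_iterable(df.food.tolist())),
    'number': df.number.repeat(df.food.str.len())
})
# Group by food and collect the unique numbers for each one
df.groupby('food').number.apply(lambda x: x.unique().tolist())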

Note that the original DataFrame has over 100k rows. Thanks.

烟花易冷人易散 2025-02-09 00:48:32

Use DataFrame.explode(<column-name>) to expand the individual items in the lists into separate rows. Each new row keeps the original index, so the corresponding number is repeated alongside every item. From there, it's a simple groupby followed by a mean.

import pandas as pd

df = pd.DataFrame({'food': [['apple', 'tomato', 'melon'], 
                            ['apple','meat', 'banana'],
                            ['banana', 'tomato']], 
                   'number': [897, 984.9, 340.8]})

df.explode('food').groupby('food').mean()

This results in:

        number
food          
apple   940.95
banana  662.85
meat    984.90
melon   897.00
tomato  618.90
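
If the `food` column holds comma-separated strings, as in the question, rather than lists, it has to be split before `explode` can be applied. A minimal sketch, assuming the column names `food` and `number` from the question; the helper name `average_per_food` and the output column name `average` are illustrative choices, not part of the answer above:

```python
import pandas as pd


def average_per_food(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per food with the mean of `number` (hypothetical helper)."""
    # Turn "apple,tomato,melon" into ['apple', 'tomato', 'melon'].
    foods = df["food"].str.split(",")
    # One row per (food, number) pair; reset the duplicated index afterwards.
    exploded = df.assign(food=foods).explode("food").reset_index(drop=True)
    # Remove stray whitespace around each item, e.g. " melon" -> "melon".
    exploded["food"] = exploded["food"].str.strip()
    return (
        exploded.groupby("food", as_index=False)["number"]
        .mean()
        .rename(columns={"number": "average"})
    )


df = pd.DataFrame({
    "food": ["apple,tomato,melon", "apple,meat,banana", "banana,tomato"],
    "number": [897.0, 984.9, 340.8],
})
result = average_per_food(df)
print(result)
```

Splitting, exploding, and grouping are all vectorized pandas operations, so this approach should remain practical for the 100k-row DataFrame mentioned in the question.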

守望孤独 2025-02-09 00:48:32

First, convert the string column so that each cell holds a list. I've also stripped any whitespace around the items. The DataFrame is adapted from the one created by @9769953.

import pandas as pd
df = pd.DataFrame({'food': ["apple,tomato, melon", 
                            "apple,meat,banana,melon",
                            "banana, tomato, melon"], 
                   'number': [897, 984.9, 340.8]})

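# Split each string on commas and strip surrounding whitespace from every item
df['food'] = df['food'].str.split(',').apply(lambda x: [e.strip() for e in x]).tolist()
# One row per food, then average the numbers for each food
df.explode('food').groupby('food').agg('mean')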

Output: (screenshot omitted)

If you would like more aggregations, you could use:

df.explode('food').groupby('food').agg(['min', 'mean', 'max'])
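
Passing a list to `agg` produces a two-level column index (`number` over `min`, `mean`, `max`). If flat column names are preferred, named aggregation is one option. A sketch using the same sample data, already converted to lists; the output names `lowest`, `average`, and `highest` are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    "food": [["apple", "tomato", "melon"],
             ["apple", "meat", "banana", "melon"],
             ["banana", "tomato", "melon"]],
    "number": [897, 984.9, 340.8],
})

# Each keyword becomes an output column: name=(source column, aggregation).
summary = (
    df.explode("food")
      .groupby("food")
      .agg(lowest=("number", "min"),
           average=("number", "mean"),
           highest=("number", "max"))
)
print(summary)
```

The result is indexed by `food` and has one plain column per aggregation, which tends to be easier to export or merge than a MultiIndex.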
