Group by a column with multiple values
I have a DataFrame that looks like this (one column holds multiple comma-separated values, the other holds decimal numbers):
```
food                 number
apple,tomato,melon   897.0
apple,meat,banana    984.9
banana,tomato        340.8
```
I want the average number for every food. In the example, that would be:
- apple = (897.0 + 984.9)/2 = 940.95
- banana = (984.9 + 340.8)/2 = 662.85
And so on, ending up with a new DataFrame that contains just the foods and their average number:
```
food     average
apple    940.95
banana   662.85
```
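As a quick check of the expected averages, here is a plain-Python sketch (no pandas) that accumulates a sum and a count per food over the three sample rows. The `rows` list is just the sample data retyped; it is an illustration of the arithmetic, not a replacement for a DataFrame solution.

```python
from collections import defaultdict

# Sample rows from the question: (comma-separated foods, number)
rows = [
    ("apple,tomato,melon", 897.0),
    ("apple,meat,banana", 984.9),
    ("banana,tomato", 340.8),
]

totals = defaultdict(float)  # running sum of numbers per food
counts = defaultdict(int)    # how many times each food was seen

for foods, number in rows:
    for food in foods.split(","):
        totals[food] += number
        counts[food] += 1

# Average = sum / count for every food
averages = {food: totals[food] / counts[food] for food in totals}

for food in sorted(averages):
    print(food, round(averages[food], 2))
```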
I tried my luck with groupby, but the result is all messed up:
```python
import pandas as pd
from itertools import chain

# reshape data
df = pd.DataFrame({
    'food' : list(chain.from_iterable(df.food.tolist())),
    'number' : df.number.repeat(df.food.str.len())
})
# groupby
df.groupby('food').number.apply(lambda x: x.unique().tolist())
```
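The attempt above goes wrong mainly because `df.food` holds strings, not lists: `chain.from_iterable` then yields individual characters, and `str.len()` counts characters, so the groupby ends up grouping by letters. A minimal sketch of the same reshape-then-group idea, assuming the column is split on commas first and that the goal is the mean rather than the unique values:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "food": ["apple,tomato,melon", "apple,meat,banana", "banana,tomato"],
    "number": [897.0, 984.9, 340.8],
})

# Split the comma-separated strings into lists first; without this,
# chain.from_iterable would iterate over the characters of each string.
foods = df["food"].str.split(",")

long_df = pd.DataFrame({
    # Flatten the lists: one row per food item
    "food": list(chain.from_iterable(foods.tolist())),
    # Repeat each number once per item in its row's list
    "number": df["number"].repeat(foods.str.len().to_numpy()).to_numpy(),
})

# Average per food (instead of collecting the unique values)
result = long_df.groupby("food")["number"].mean().rename("average").reset_index()
print(result)
```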
I must say that the original dataframe has over 100k rows. Thanks.
Use `DataFrame.explode(<column-name>)` to expand the individual items in the lists into separate rows. The exploded rows keep their original index, so the corresponding number is carried along. From there, it's an easy group by, followed by a simple mean.
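A minimal sketch of this approach on the question's sample data, assuming the strings are first split into lists with `Series.str.split` (`DataFrame.explode` needs pandas 0.25 or later):

```python
import pandas as pd

df = pd.DataFrame({
    "food": ["apple,tomato,melon", "apple,meat,banana", "banana,tomato"],
    "number": [897.0, 984.9, 340.8],
})

result = (
    df.assign(food=df["food"].str.split(","))  # turn each string into a list
      .explode("food")                         # one row per list item, original index repeated
      .groupby("food")["number"]
      .mean()                                  # average number per food
)
print(result)
```

The result is a Series indexed by food; call `.reset_index(name="average")` on it if you want a two-column DataFrame like the one in the question.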
First, you have to convert the string column into a list in each cell. I've also included stripping of any whitespace around the items. I modified the df created by @9769953.
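A minimal sketch of that idea, using a hypothetical variant of the sample data with stray spaces around the commas. Stripping each item before exploding is one way to do it (an assumption, not necessarily the answerer's exact code), and it keeps the step independent of the duplicated index that `explode` produces:

```python
import pandas as pd

# Hypothetical sample with stray spaces around the commas
df = pd.DataFrame({
    "food": ["apple, tomato, melon", "apple,meat , banana", "banana,tomato"],
    "number": [897.0, 984.9, 340.8],
})

# Convert each string into a list of whitespace-stripped items
df = df.assign(
    food=df["food"].str.split(",").apply(lambda items: [s.strip() for s in items])
)

# One row per food item, then the mean per food as a two-column frame
long_df = df.explode("food")
result = (
    long_df.groupby("food", as_index=False)["number"]
           .mean()
           .rename(columns={"number": "average"})
)
print(result)
```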
If you would like more aggregations, you could use
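For example, a sketch using pandas named aggregation (available since pandas 0.25) on the exploded frame; the particular statistics and output column names are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    "food": ["apple,tomato,melon", "apple,meat,banana", "banana,tomato"],
    "number": [897.0, 984.9, 340.8],
})

long_df = df.assign(food=df["food"].str.split(",")).explode("food")

# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation function)
stats = long_df.groupby("food").agg(
    average=("number", "mean"),
    total=("number", "sum"),
    count=("number", "count"),
    minimum=("number", "min"),
    maximum=("number", "max"),
)
print(stats)
```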