How do I sum a column's rows by an index condition to create a percent-of-group column?

Posted on 2025-01-11 14:48:42

I have the following Pandas DataFrame:

# Create DataFrame
import pandas as pd
data = {'Menu Category': ['Appetizers', 'Appetizers', 'Appetizers', 'Mains', 'Mains', 
        'Mains', 'Desserts', 'Desserts', 'Desserts'], 
        'Menu Item': ['Fries', 'Poppers', 'Wings', 'Pasta', 'Burger', 'Pizza', 
                      'Ice Cream', 'Cake', 'Fruit'],
        'Sales Quantity': [100, 50, 40, 200, 400, 250, 100, 120, 50],
       }
df = pd.DataFrame(data)
df

I would like to add two columns: 1) one that shows the % Quantity of the Menu that each item represents (the entire menu being this dataset), and 2) one that shows the % Quantity of the Menu Category the item belongs to (for example, the percentage of the Appetizers group's Sales Quantity that Fries represents, i.e. (100/190) * 100).

I know how to get the first column mentioned:

# Add % Quantity of Menu Column
percent_menu_qty = []
for i in df['Sales Quantity']:
    i = round(i/df['Sales Quantity'].sum() * 100, 2)
    percent_menu_qty.append(i)
df['% Quantity of Menu'] = percent_menu_qty
df

What I am not sure how to do is the second one. I tried setting Menu Category as the index and doing the following:

# Add % Quantity of Menu Category Column
df = df.set_index('Menu Category')

lst = []
for index, x in df['Sales Quantity'].iteritems():
    if index == 'Appetizers':
        x = x/sum(x)
        lst.append(x)
    elif index == 'Mains':
        x = x/sum(x)
        lst.append(x)
    elif index == 'Desserts':
        x =x/sum(x)
        lst.append(x)
lst

I know I need to somehow set a condition for each Menu Category, so that if index == 'a certain menu category value' the quantity is divided by the sum of that menu category. So far I haven't been able to figure it out.


盛夏尉蓝 2025-01-18 14:48:42

First of all, I would like to compliment you on writing out explicit row-by-row loops. I still use them from time to time, because I consider loops easier for someone else to read and to understand the principle behind them without running the code.

That said, for this solution I have written a couple of one-liners. Let me explain what each one does.

df['% Quantity of Menu'] = ((df['Sales Quantity']/df['Sales Quantity'].sum())*100).round(2)

For your first problem, instead of looping row by row, this divides every value in the column by a scalar (the column total, df['Sales Quantity'].sum()), multiplies the ratio by 100 to get a percentage, and rounds the result to 2 decimal places.
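
As a quick check of that explanation, the sketch below rebuilds the question's DataFrame and compares the loop from the question with the vectorized expression. The variable names total, loop_result and vectorized are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Menu Category': ['Appetizers', 'Appetizers', 'Appetizers', 'Mains', 'Mains',
                      'Mains', 'Desserts', 'Desserts', 'Desserts'],
    'Menu Item': ['Fries', 'Poppers', 'Wings', 'Pasta', 'Burger', 'Pizza',
                  'Ice Cream', 'Cake', 'Fruit'],
    'Sales Quantity': [100, 50, 40, 200, 400, 250, 100, 120, 50],
})

# Scalar total of the whole column (the entire menu)
total = df['Sales Quantity'].sum()

# Loop version, equivalent to the one in the question
loop_result = [round(q / total * 100, 2) for q in df['Sales Quantity']]

# Vectorized version: the scalar is applied to every row at once
vectorized = (df['Sales Quantity'] / total * 100).round(2)

# Both approaches should agree row for row (small tolerance for float rounding)
same = all(abs(a - b) < 1e-9 for a, b in zip(loop_result, vectorized))
print(same)
```

The vectorized form also avoids recomputing df['Sales Quantity'].sum() on every iteration, which the original loop does.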

df['%Qty of Menu Category'] = ((df['Sales Quantity']/df.groupby(['Menu Category'])['Sales Quantity'].transform('sum'))*100).round(2)

For the second problem, we need to divide each value by the total of its own category rather than by the total of the whole column. df.groupby(['Menu Category'])['Sales Quantity'].transform('sum') gives that per-category total for every row, and the rest of the expression is the same as in the first one-liner.
Why use df.groupby(['Menu Category'])['Sales Quantity'].transform('sum') instead of df.groupby(['Menu Category'])['Sales Quantity'].sum()? Dividing one Series by another aligns on the index, so the divisor must be either a scalar or a Series with the same index as df['Sales Quantity']. transform('sum') returns one value per original row, with the original index, while .sum() returns one row per category, indexed by Menu Category. Compare the three outputs:

df['Sales Quantity']
0    100
1     50
2     40
3    200
4    400
5    250
6    100
7    120
8     50
Name: Sales Quantity, dtype: int64

df.groupby(['Menu Category'])['Sales Quantity'].transform('sum')
0    190
1    190
2    190
3    850
4    850
5    850
6    270
7    270
8    270
Name: Sales Quantity, dtype: int64

df.groupby(['Menu Category'])['Sales Quantity'].sum()
Menu Category
Appetizers    190
Desserts      270
Mains         850
Name: Sales Quantity, dtype: int64
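
To make the difference between the two results concrete, here is a minimal sketch, assuming the same data as in the question. It divides by each result in turn and also shows Series.map as one alternative way to broadcast the aggregated totals back onto the rows. The names per_category, per_row, misaligned and via_map are hypothetical and chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Menu Category': ['Appetizers', 'Appetizers', 'Appetizers', 'Mains', 'Mains',
                      'Mains', 'Desserts', 'Desserts', 'Desserts'],
    'Sales Quantity': [100, 50, 40, 200, 400, 250, 100, 120, 50],
})
qty = df['Sales Quantity']

# Aggregated: one row per category, indexed by the category name
per_category = df.groupby('Menu Category')['Sales Quantity'].sum()

# Transformed: one row per original row, indexed 0..8 like qty
per_row = df.groupby('Menu Category')['Sales Quantity'].transform('sum')

# Division aligns on index labels. 0..8 never matches 'Appetizers' etc.,
# so every entry of this result is NaN.
misaligned = qty / per_category

# Same index on both sides, so the division is row by row
aligned = (qty / per_row * 100).round(2)

# Alternative: look up each row's category in the aggregated totals
via_map = (qty / df['Menu Category'].map(per_category) * 100).round(2)

print(misaligned.isna().all())
print(aligned.equals(via_map))
```

Both aligned and via_map give the per-category percentages. transform('sum') is simply the shorter way to get the totals into the shape of the original column.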


看透却不说透 2025-01-18 14:48:42

I think you're looking for groupby + transform('sum') to get the "Category" sums, and then dividing each "Sales Quantity" by its "Category" sum. This gives the share of each menu item within its menu category.

You can also use the vectorized div method instead of a loop for the first column:

df['%Qty of Menu'] = df['Sales Quantity'].div(df['Sales Quantity'].sum()).mul(100).round(2)
df['%Qty of Menu Cat'] = df.groupby('Menu Category')['Sales Quantity'].transform('sum').rdiv(df['Sales Quantity']).mul(100).round(2)

Output:

  Menu Category  Menu Item  Sales Quantity  %Qty of Menu  %Qty of Menu Cat
0    Appetizers      Fries             100          7.63             52.63
1    Appetizers    Poppers              50          3.82             26.32
2    Appetizers      Wings              40          3.05             21.05
3         Mains      Pasta             200         15.27             23.53
4         Mains     Burger             400         30.53             47.06
5         Mains      Pizza             250         19.08             29.41
6      Desserts  Ice Cream             100          7.63             37.04
7      Desserts       Cake             120          9.16             44.44
8      Desserts      Fruit              50          3.82             18.52
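
One way to sanity-check the category column is to confirm that the percentages inside each category add up to roughly 100. The sketch below is a check under the same data as the question. It also shows that rdiv only reverses the operand order, so totals.rdiv(qty) is the same as qty.div(totals). The names group_total, via_rdiv and check are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Menu Category': ['Appetizers', 'Appetizers', 'Appetizers', 'Mains', 'Mains',
                      'Mains', 'Desserts', 'Desserts', 'Desserts'],
    'Menu Item': ['Fries', 'Poppers', 'Wings', 'Pasta', 'Burger', 'Pizza',
                  'Ice Cream', 'Cake', 'Fruit'],
    'Sales Quantity': [100, 50, 40, 200, 400, 250, 100, 120, 50],
})

# Per-row category totals (190, 850 or 270 depending on the row's category)
group_total = df.groupby('Menu Category')['Sales Quantity'].transform('sum')

# qty.div(total) and total.rdiv(qty) both compute qty / total
df['%Qty of Menu Cat'] = df['Sales Quantity'].div(group_total).mul(100).round(2)
via_rdiv = group_total.rdiv(df['Sales Quantity']).mul(100).round(2)

# Within each category the shares should sum to about 100
# (rounding to 2 decimals can leave a small remainder)
check = df.groupby('Menu Category')['%Qty of Menu Cat'].sum()
print(check)
```

If a category's total drifts noticeably from 100, the usual causes are grouping by the wrong column or dividing by the overall total instead of the group total.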
