How to create a 100% stacked bar plot from a categorical dataframe

Posted on 2025-02-12 18:00:32

I have a dataframe structured like this:

User   Food 1    Food 2    Food 3    Food 4
Steph  Onions    Tomatoes  Cabbages  Potatoes
Tom    Potatoes  Tomatoes  Potatoes  Potatoes
Fred   Carrots   Cabbages            Eggplant
Phil   Onions    Eggplant  Eggplant

I want to use the distinct values from across the food columns as categories. I then want to create a Seaborn plot so the % of each category for each column is plotted as a 100% horizontal stacked bar.

My attempt to do this:

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'User' : ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1' : ["Onions", "Potatoes", "Carrots", "Onions"],
    'Food 2' : ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3' : ["Cabbages", "Potatoes", "", "Eggplant"],
    'Food 4' : ['Potatoes', 'Potatoes', 'Eggplant', ''],    
}

df = pd.DataFrame(data)

x_ax = ["Onions", "Potatoes", "Carrots", "Onions", "", 'Eggplant', "Cabbages"]

df.plot(kind="barh", x=x_ax, y=["Food 1", "Food 2", "Food 3", "Food 4"], stacked=True, ax=axes[1])

plt.show()


北渚 2025-02-19 18:00:32
  1. Replace '' with np.nan, because empty strings would otherwise be counted as values.
  2. Use pandas.DataFrame.melt to convert the dataframe to long form.
  3. Use pandas.crosstab with the normalize parameter to calculate the percentage of each type within each 'Food' column.
  4. Plot the dataframe with pandas.DataFrame.plot and kind='barh'.
    • Putting the food names on the x-axis is not the correct way to create a 100% stacked bar plot. One axis must be numeric. The bars will be colored by food type.
  5. Annotate the bars based on this answer.
  6. Move the legend outside the plot based on this answer.
  • seaborn is a high-level API for matplotlib, and pandas uses matplotlib as its default plotting backend; it is easier to produce a stacked bar plot with pandas.
    • seaborn doesn't support stacked bar plots unless histplot is used in a hacked way, as shown in this answer, which would also require an extra step of melting percent.
  • Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1.
    • Assignment expressions (:=) require python >= 3.8. Otherwise, use [f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c].
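import pandas as pd
import numpy as np

# using the dataframe in the OP

# 1. treat empty strings as missing values so they are not counted as a category
df = df.replace('', np.nan)

# 2. convert to long form: one row per (User, Food column, food Type)
dfm = df.melt(id_vars='User', var_name='Food', value_name='Type')

# 3. percentage of each Type within each Food column (each row sums to 100)
percent = pd.crosstab(dfm.Food, dfm.Type, normalize='index').mul(100).round(2)

# 4. horizontal 100% stacked bars, one bar per Food column
ax = percent.plot(kind='barh', stacked=True, figsize=(8, 6))

# 5. annotate each bar segment with its percentage
for c in ax.containers:

    # leave the label blank when a segment has zero width
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c]

    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')

# 6. move the legend outside the plot area
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')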
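
For Python versions older than 3.8, the label logic from step 5 can be pulled into a small helper, which also makes it easy to check without drawing anything. This is only a sketch: the name segment_labels is hypothetical and not part of matplotlib or the answer above.

```python
def segment_labels(widths):
    """Return one percentage label per bar segment, blank for zero-width segments."""
    return [f'{w:.2f}%' if w > 0 else '' for w in widths]


# Example with the widths of three segments (0 means the category is absent).
print(segment_labels([0.0, 25.0, 33.33]))  # ['', '25.00%', '33.33%']

# Inside the loop from step 5 it could be used like this (assuming `ax` and `c` exist):
# ax.bar_label(c, labels=segment_labels(v.get_width() for v in c), label_type='center')
```

Keeping the formatting in one function avoids calling get_width() twice per segment while staying free of the := operator.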


DataFrame Views

dfm

     User    Food      Type
0   Steph  Food 1    Onions
1     Tom  Food 1  Potatoes
2    Fred  Food 1   Carrots
3    Phil  Food 1    Onions
4   Steph  Food 2  Tomatoes
5     Tom  Food 2  Tomatoes
6    Fred  Food 2  Cabbages
7    Phil  Food 2  Eggplant
8   Steph  Food 3  Cabbages
9     Tom  Food 3  Potatoes
10   Fred  Food 3       NaN
11   Phil  Food 3  Eggplant
12  Steph  Food 4  Potatoes
13    Tom  Food 4  Potatoes
14   Fred  Food 4  Eggplant
15   Phil  Food 4       NaN
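
To see why step 1 matters, the following sketch runs the same melt and crosstab on the question's data with and without replacing the empty strings. The helper name food_percent is an assumption made for this illustration only.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'User': ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1': ['Onions', 'Potatoes', 'Carrots', 'Onions'],
    'Food 2': ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3': ['Cabbages', 'Potatoes', '', 'Eggplant'],
    'Food 4': ['Potatoes', 'Potatoes', 'Eggplant', ''],
})


def food_percent(frame):
    """Melt to long form and return row-normalized percentages per Food column."""
    long = frame.melt(id_vars='User', var_name='Food', value_name='Type')
    return pd.crosstab(long['Food'], long['Type'], normalize='index').mul(100).round(2)


with_blanks = food_percent(df)                         # '' is counted as its own category
without_blanks = food_percent(df.replace('', np.nan))  # '' is treated as missing

print(with_blanks.loc['Food 3'])     # four categories (including '') at 25% each
print(without_blanks.loc['Food 3'])  # three categories at 33.33% each
```

Without the replacement, the blank entries would appear as an unnamed segment in the bar for Food 3 and Food 4 and shrink the other segments.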

percent

Type    Cabbages  Carrots  Eggplant  Onions  Potatoes  Tomatoes
Food                                                           
Food 1      0.00     25.0      0.00    50.0     25.00       0.0
Food 2     25.00      0.0     25.00     0.0      0.00      50.0
Food 3     33.33      0.0     33.33     0.0     33.33       0.0
Food 4      0.00      0.0     33.33     0.0     66.67       0.0
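
As a cross-check of the percent table, the normalize='index' option can be reproduced by hand: count each type per Food column, then divide every row by its own total. The sketch below rebuilds the long-form data from the dfm view (without the User column, which is not needed here) and compares both results; the variable names counts and manual are chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Long-form data as in the dfm view, with missing entries as NaN.
dfm = pd.DataFrame({
    'Food': ['Food 1'] * 4 + ['Food 2'] * 4 + ['Food 3'] * 4 + ['Food 4'] * 4,
    'Type': ['Onions', 'Potatoes', 'Carrots', 'Onions',
             'Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant',
             'Cabbages', 'Potatoes', np.nan, 'Eggplant',
             'Potatoes', 'Potatoes', 'Eggplant', np.nan],
})

# Raw counts per Food column; rows with a missing Type are not counted.
counts = pd.crosstab(dfm['Food'], dfm['Type'])

# Divide each row by its own total, then scale to percent.
manual = counts.div(counts.sum(axis=1), axis=0).mul(100).round(2)

# The one-step version used in the answer.
percent = pd.crosstab(dfm['Food'], dfm['Type'], normalize='index').mul(100).round(2)

print(counts.sum(axis=1))
print(np.allclose(manual, percent))
```

Food 3 and Food 4 each have only three non-missing values, which is why their shares are multiples of 33.33 rather than 25.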