How to add custom annotations to a stacked bar plot

Posted on 2025-01-24 20:35:38

I am trying to annotate a stacked histogram in Seaborn with the hue for each segment in the histogram for readability reasons. I've attached sample data below and what I'm currently doing:

Sample data: https://easyupload.io/as5uxs

Current code to organize and display the plot:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# create the dataframe - from sample data file
data = {'brand': ['Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'BMW', 'BMW', 'BMW', 'BMW', 'BMW', 'GM', 'GM', 'GM', 'GM', 'GM', 'GM', 'Toyota', 'Toyota'],
        'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5', 'M3', 'M3', 'M3', 'X1', 'X1', 'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy', 'Camry', 'Corolla']}

data = pd.DataFrame(data)

# make the column categorical, using the order of the 'value_counts'
data['brand'] = pd.Categorical(data['brand'], data['brand'].value_counts(sort=True).index)

# We want to sort the hue value (model) alphabetically
hue_order = data['Model'].unique()
hue_order.sort()

f, ax = plt.subplots(figsize=(10, 6))
sns.histplot(data, x="brand", hue="Model", multiple="stack", edgecolor=".3", linewidth=.5, hue_order=hue_order, ax=ax)

This generates a nice plot with an ordered legend and ordered bars. However, when I try to annotate it using a number of methods, I can't get it to work. What I am after is an annotation that shows the hue value (the model), followed by the height of the segment (the number of vehicles). For example, for the first bar, I would want the first grey shaded cell to display RS5x 4, indicating 4 vehicles of the RS5 model, and so on for each segment of the stacked histogram.

I've tried a lot of methods and am struggling to get this to work. I've tried using:

for i, rect in enumerate(ax.patches):
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()

    # The height of the bar is the count value and can be used as the label
    label_text = f'{height:.0f}'

    label_x = x + width / 2
    label_y = y + height / 2

    # don't include label if it's equivalently 0
    if height > 0.001:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)

Current result: [image]

But this only displays the height of the bar, which is great, but I am not sure how to get the correct hue text to display along with that height.

菩提树下叶撕阳。 2025-01-31 20:35:38
  • To create the desired annotation, it's necessary to know the order in which the bar sections are created, which is difficult since seaborn does a lot behind the scenes. As such, it is easier to plot a reshaped dataframe directly, because the column and row order is more explicit.
    • This is just a count plot, not a histogram, so it's easier to reshape the dataframe with pd.crosstab, which creates a wide dataframe with 'brand' as the index, 'Model' as the columns, and the counts as the values.
    • When the bar plot is created, all the values in each dataframe column are plotted in succession. Since we know the sequence of the columns, it's easy to extract the correct column name to add to the annotation. All of column 'A3' is plotted, then 'A5', etc.
    • Use enumerate(ax.containers), and then use i to index cols (e.g. cols[i]). There are 9 BarContainer objects, one for each column.
    • This implementation won't work with enumerate(ax.patches), because there are 36 patches (9 columns × 4 brands).
  • seaborn is a high-level API for matplotlib, and pandas uses matplotlib as the default plotting backend.
  • matplotlib >= 3.4.2 has .bar_label for annotations
    • See this answer for more information and examples.
  • Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1
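
The container and patch counts mentioned in the list above can be checked with a short sketch. It rebuilds the sample data in compact form and uses the non-interactive "Agg" backend; both choices are assumptions made only so the snippet runs on its own without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# same 22 rows as the sample data, written compactly
df = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3'] * 3 + ['A5'] * 2 + ['RS5'] * 4 + ['M3'] * 3 + ['X1'] * 2
             + ['Chevy'] * 4 + ['Caddy'] * 2 + ['Camry', 'Corolla'],
})

# wide table: one row per brand, one column per model
ct = pd.crosstab(df['brand'], df['Model'])
ax = ct.plot(kind='bar', stacked=True)

n_containers = len(ax.containers)  # one BarContainer per column (model)
n_patches = len(ax.patches)        # one rectangle per (brand, model) cell, zeros included
print(n_containers, n_patches)

plt.close(ax.figure)
```

Each container holds one rectangle per brand, which is why indexing the column names by container position lines up with the right model.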

Setup and Reshape

import pandas as pd
import matplotlib.pyplot as plt

# create the dataframe
data = {'brand': ['Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'Audi', 'BMW', 'BMW', 'BMW', 'BMW', 'BMW', 'GM', 'GM', 'GM', 'GM', 'GM', 'GM', 'Toyota', 'Toyota'],
        'Model': ['A3', 'A3', 'A3', 'A5', 'A5', 'RS5', 'RS5', 'RS5', 'RS5', 'M3', 'M3', 'M3', 'X1', 'X1', 'Chevy', 'Chevy', 'Chevy', 'Chevy', 'Caddy', 'Caddy', 'Camry', 'Corolla']}

df = pd.DataFrame(data)

# sort brand by value counts
df['brand'] = pd.Categorical(df['brand'], df['brand'].value_counts(sort=True).index)

# reshape the dataframe and get count of each model per brand
ct = pd.crosstab(df.brand, df.Model)

# create a variable for the column names
cols = ct.columns

# display(ct)
Model   A3  A5  Caddy  Camry  Chevy  Corolla  M3  RS5  X1
brand                                                    
Audi     3   2      0      0      0        0   0    4   0
GM       0   0      2      0      4        0   0    0   0
BMW      0   0      0      0      0        0   3    0   2
Toyota   0   0      0      1      0        1   0    0   0
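
To make clear what pd.crosstab computes here, the following sketch rebuilds the same table by hand with groupby, size, and unstack, then compares the two. The variable names manual and same are illustrative only, and the compact data construction is an assumption that reproduces the 22 sample rows.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3'] * 3 + ['A5'] * 2 + ['RS5'] * 4 + ['M3'] * 3 + ['X1'] * 2
             + ['Chevy'] * 4 + ['Caddy'] * 2 + ['Camry', 'Corolla'],
})

# sort brand by value counts, as in the setup code
df['brand'] = pd.Categorical(df['brand'], df['brand'].value_counts(sort=True).index)

ct = pd.crosstab(df['brand'], df['Model'])

# count rows per (brand, Model) pair, then pivot Model into columns
manual = (
    df.groupby(['brand', 'Model'], observed=False)
      .size()
      .unstack(fill_value=0)
      .reindex(index=ct.index, columns=ct.columns, fill_value=0)
)

# True when both tables hold identical counts in the same layout
same = bool((manual.to_numpy() == ct.to_numpy()).all())
print(same)
```

The crosstab call is shorter and states the intent directly, which is why it is the more convenient choice for this kind of count table.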

Plot and Annotate

# plot the dataframe, which uses matplotlib as the backend (seaborn is just an api for matplotlib)
ax = ct.plot(kind='bar', stacked=True, width=1, ec='k', figsize=(10, 6), rot=0)

# iterate through each container and add custom annotations
for i, c in enumerate(ax.containers):
    
    # build the labels, leaving the label empty when a bar section has zero height
    # the assignment expression (h := ...) requires python >= 3.8
    labels = [f'{cols[i]}: {h:0.0f}' if (h := v.get_height()) > 0 else '' for v in c]
    # without the assignment expression, v.get_height() must be called twice
    # labels = [f'{cols[i]}: {v.get_height():0.0f}' if v.get_height() > 0 else '' for v in c]
    
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center', fontsize=8)
    
plt.show()
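
As a variation, the labels can be built from the crosstab itself rather than from the bar heights, using the "RS5 x 4" style the question asks for. This is only a sketch: segment_labels and its fmt parameter are hypothetical names introduced here, and it relies on the behaviour described above, that pandas creates one container per column in column order.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd


def segment_labels(ct, fmt='{name} x {count}'):
    """Hypothetical helper: map each column name to one label per row ('' for zero counts)."""
    return {
        name: [fmt.format(name=name, count=int(n)) if n > 0 else '' for n in ct[name]]
        for name in ct.columns
    }


df = pd.DataFrame({
    'brand': ['Audi'] * 9 + ['BMW'] * 5 + ['GM'] * 6 + ['Toyota'] * 2,
    'Model': ['A3'] * 3 + ['A5'] * 2 + ['RS5'] * 4 + ['M3'] * 3 + ['X1'] * 2
             + ['Chevy'] * 4 + ['Caddy'] * 2 + ['Camry', 'Corolla'],
})
df['brand'] = pd.Categorical(df['brand'], df['brand'].value_counts(sort=True).index)
ct = pd.crosstab(df['brand'], df['Model'])

labels = segment_labels(ct)

ax = ct.plot(kind='bar', stacked=True, width=1, ec='k', figsize=(10, 6), rot=0)

# pair each column name with its container instead of indexing by position
for name, container in zip(ct.columns, ax.containers):
    ax.bar_label(container, labels=labels[name], label_type='center', fontsize=8)

plt.close(ax.figure)
```

Building the labels from ct keeps the text generation separate from the plotting, so the label format can be changed or checked without drawing anything.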
