Plotting bar charts of value_counts() for multiple columns in a pandas DataFrame

Posted on 2025-02-06 02:27:01 · 2020 characters · 1 view · 0 comments

I'm trying to draw bar charts with counts of unique values for all columns in a pandas DataFrame. This is similar to what df.hist() does for numerical columns, but I have categorical columns.

  • I'd prefer to use the object-oriented approach, because it feels more natural and explicit to me.
  • I'd like to have multiple Axes (subplots) within a single Figure, in a grid fashion (again like what df.hist() does).

My solution below does exactly what I want, but it feels cumbersome. I doubt whether I really need the direct dependency on Matplotlib (and all the code for creating the Figure, removing the unused Axes, etc.). I see that pandas.Series.plot has the parameters subplots and layout, which seem to point to what I want, but maybe I'm totally off here. I tried looping over the columns in my DataFrame and applying these parameters, but I cannot figure it out.

Does anyone know a more compact way to do what I'm trying to achieve?

import math

import matplotlib.pyplot as plt

# Defining the grid dimensions of the Axes in the Matplotlib Figure
nr_of_plots = len(ames_train_categorical.columns)
nr_of_plots_per_row = 4
nr_of_rows = math.ceil(nr_of_plots / nr_of_plots_per_row)

# Defining the Matplotlib Figure and Axes
# squeeze=False keeps axes two-dimensional even when there is only one row
figure, axes = plt.subplots(nrows=nr_of_rows, ncols=nr_of_plots_per_row, figsize=(25, 50), squeeze=False)
figure.subplots_adjust(hspace=0.5)

# Plotting on the Axes (columns with more than 30 unique values are skipped)
i, j = 0, 0
for column_name in ames_train_categorical:
    if ames_train_categorical[column_name].nunique() <= 30:
        axes[i][j].set_title(column_name)
        ames_train_categorical[column_name].value_counts().plot(kind='bar', ax=axes[i][j])
        j += 1
        if j % nr_of_plots_per_row == 0:
            i += 1
            j = 0

# Cleaning up unused Axes
# plt.subplots creates a rectangular grid of Axes. Not every Axes is necessarily used (for example on the last row). Unused Axes are removed here.
axes_flattened = axes.flatten()
for ax in axes_flattened:
    if not ax.has_data():
        ax.remove()

Edit: alternative idea
Using the pyplot/state-machine way of working, you could do it like this with very few lines of code. The downside is that every graph gets its own figure, so they're not nicely arranged in a grid.

for column_name in ames_train_categorical:
    ames_train_categorical[column_name].value_counts().plot(kind='bar')
    plt.show()
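
For comparison, the subplots and layout parameters mentioned in the question can be applied to a DataFrame of counts rather than to each Series in a loop. The following is only a rough sketch under assumptions: `toy` is a small made-up DataFrame, and `counts` is a name chosen here for illustration. Because all columns end up sharing one index (the union of all category values), every subplot reserves a slot for every category, including those that do not occur in that column, so the result is not identical to the per-column loop.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up example data (an assumption, not the asker's dataset).
toy = pd.DataFrame(
    {
        "Street": ["Pave", "Pave", "Pave", "Grvl", "Pave", "Pave"],
        "Alley": ["Grvl", "Grvl", "Grvl", "Grvl", "Pave", "Pave"],
        "Land Slope": ["Gtl", "Mod", "Sev", "Mod", "Sev", "Sev"],
    }
)

# One value_counts() per column; the index becomes the union of all values,
# with NaN where a value does not occur in a given column.
counts = toy.apply(pd.Series.value_counts)

# DataFrame.plot draws one subplot per column and arranges them in the grid
# given by layout. sharex=False gives each subplot its own tick labels.
axes = counts.plot(
    kind="bar",
    subplots=True,
    layout=(2, 2),
    sharex=False,
    legend=False,
    rot=0,
    title=list(counts.columns),
    figsize=(8, 6),
)
fig = axes[0, 0].figure
fig.tight_layout()
```

This keeps the Matplotlib code to a minimum, at the cost of the empty category slots described above.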

Desired output
[Image: desired output]

Comments (1)

知你几分 2025-02-13 02:27:02

With the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "MS Zoning": ["RL", "FV", "RL", "RH", "RL", "RL"],
        "Street": ["Pave", "Pave", "Pave", "Grvl", "Pave", "Pave"],
        "Alley": ["Grvl", "Grvl", "Grvl", "Grvl", "Pave", "Pave"],
        "Utilities": ["AllPub", "NoSewr", "AllPub", "AllPub", "NoSewr", "AllPub"],
        "Land Slope": ["Gtl", "Mod", "Sev", "Mod", "Sev", "Sev"],
    }
)

Here is a slightly more idiomatic way to do it:

import math
from matplotlib import pyplot as plt

# Smallest square grid that fits one subplot per column
size = math.ceil(math.sqrt(df.shape[1]))
fig = plt.figure()

for i, col in enumerate(df.columns):
    # add_subplot returns the new Axes, so it can be passed to pandas directly
    ax = fig.add_subplot(size, size, i + 1)
    df[col].value_counts().plot(kind="bar", ax=ax, title=col, rot=0)

fig.tight_layout()

[Image: output of the code above]
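
If a fixed number of plots per row is preferred over a square grid, as in the question, the same idea could be wrapped in a small helper. This is a minimal sketch, not a pandas or Matplotlib API: the function name `plot_value_counts_grid` and its parameters `ncols` and `max_unique` are hypothetical names introduced here, and `toy` repeats the toy data from above.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_value_counts_grid(df, ncols=4, max_unique=30, figsize=None):
    """Draw one value_counts() bar chart per column on a single Figure.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Keep only the columns with few enough distinct values to stay readable.
    columns = [c for c in df.columns if df[c].nunique() <= max_unique]
    if not columns:
        raise ValueError("no column has max_unique or fewer distinct values")

    nrows = math.ceil(len(columns) / ncols)
    # squeeze=False always returns a 2D array, even for a single row or column.
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=figsize, squeeze=False)
    flat_axes = axes.flatten()

    for ax, column in zip(flat_axes, columns):
        df[column].value_counts().plot(kind="bar", ax=ax, title=column, rot=0)

    # Remove the Axes that were created but not used.
    for ax in flat_axes[len(columns):]:
        ax.remove()

    fig.tight_layout()
    return fig


toy = pd.DataFrame(
    {
        "MS Zoning": ["RL", "FV", "RL", "RH", "RL", "RL"],
        "Street": ["Pave", "Pave", "Pave", "Grvl", "Pave", "Pave"],
        "Alley": ["Grvl", "Grvl", "Grvl", "Grvl", "Pave", "Pave"],
        "Utilities": ["AllPub", "NoSewr", "AllPub", "AllPub", "NoSewr", "AllPub"],
        "Land Slope": ["Gtl", "Mod", "Sev", "Mod", "Sev", "Sev"],
    }
)
fig = plot_value_counts_grid(toy, ncols=3)
```

Counting the plotted columns up front, instead of tracking row and column indices by hand, means the grid is sized only for the columns that pass the filter, and the leftover Axes can be removed by position rather than by checking has_data().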
