Plotting bar charts with value_counts() for multiple columns in a pandas DataFrame
I'm trying to draw bar charts with counts of unique values for all columns in a pandas DataFrame. This is roughly what df.hist() does for numerical columns, but I have categorical columns.
- I'd prefer to use the object-oriented approach, because it feels more natural and explicit to me.
- I'd like to have multiple Axes (subplots) within a single Figure, in a grid fashion (again like what df.hist() does).
My solution below does exactly what I want, but it feels cumbersome. I doubt whether I really need the direct dependency on Matplotlib (and all the code for creating the Figure, removing the unused Axes etc.). I see that pandas.Series.plot has parameters subplots and layout which seem to point to what I want, but maybe I'm totally off here. I tried looping over the columns in my DataFrame and applying these parameters, but I cannot figure it out.
Does anyone know a more compact way to do what I'm trying to achieve?
import math
import matplotlib.pyplot as plt

# Defining the grid-dimensions of the Axes in the Matplotlib Figure
nr_of_plots = len(ames_train_categorical.columns)
nr_of_plots_per_row = 4
nr_of_rows = math.ceil(nr_of_plots / nr_of_plots_per_row)
# Defining the Matplotlib Figure and Axes (squeeze=False keeps axes 2-D even with a single row)
figure, axes = plt.subplots(nrows=nr_of_rows, ncols=nr_of_plots_per_row, figsize=(25, 50), squeeze=False)
figure.subplots_adjust(hspace=0.5)
# Plotting on the Axes; only columns with at most 30 unique values are drawn
i, j = 0, 0
for column_name in ames_train_categorical:
    if ames_train_categorical[column_name].nunique() <= 30:
        axes[i][j].set_title(column_name)
        ames_train_categorical[column_name].value_counts().plot(kind='bar', ax=axes[i][j])
        # Move to the next grid position, wrapping to a new row when the current one is full
        j += 1
        if j % nr_of_plots_per_row == 0:
            i += 1
            j = 0
# Cleaning up unused Axes
# plt.subplots creates a rectangular grid of Axes. On the last row, not all Axes will always be used. Unused Axes are removed here.
axes_flattened = axes.flatten()
for ax in axes_flattened:
    if not ax.has_data():
        ax.remove()
Edit: alternative idea
Using the pyplot/state-machine way of working, you could do it like this with very few lines of code. But this has the downside that every graph gets its own figure, so they're not nicely arranged in a grid.
for column_name in ames_train_categorical:
    ames_train_categorical[column_name].value_counts().plot(kind='bar')
    plt.show()
Comments (1)
With the following toy dataframe:
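The toy data itself is not shown here, so the following is only a hypothetical stand-in: a small DataFrame with three categorical columns. The name `toy_df`, the column names and the values are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data; the column names and values are assumptions.
toy_df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "M", "L", "S", "M"],
    "shape": ["circle", "square", "circle", "circle", "square", "triangle"],
})

# value_counts() counts how often each unique value occurs in a column,
# sorted from most to least frequent.
color_counts = toy_df["color"].value_counts()
print(color_counts)
```

Each column's value_counts() result is a Series indexed by the unique values, which is what gets drawn as one bar chart per column.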
Here is a bit more idiomatic way to do it:
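The answer's code is not shown here, so what follows is a sketch of one possible approach, not necessarily the one the answer used. It still relies on Matplotlib for the grid, but it pairs flattened Axes with column names via zip, which removes the manual i/j bookkeeping. The helper name `plot_value_counts`, its parameters and the example data are assumptions made for illustration.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def plot_value_counts(df, ncols=4, max_unique=30, figsize=None):
    """Draw one value_counts() bar chart per column on a grid of Axes.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Keep only columns with a manageable number of distinct values.
    columns = [c for c in df.columns if df[c].nunique() <= max_unique]
    if not columns:
        raise ValueError("no column has few enough unique values to plot")

    nrows = math.ceil(len(columns) / ncols)
    if figsize is None:
        figsize = (4 * ncols, 3 * nrows)

    # squeeze=False guarantees a 2-D array of Axes, even for a single row.
    fig, axes = plt.subplots(nrows, ncols, figsize=figsize, squeeze=False)
    flat_axes = axes.ravel()  # row-by-row view of the grid

    # Pair each selected column with the next free Axes.
    for ax, name in zip(flat_axes, columns):
        df[name].value_counts().plot(kind="bar", ax=ax)
        ax.set_title(str(name))

    # Remove the Axes that were not needed (typically in the last row).
    for ax in flat_axes[len(columns):]:
        ax.remove()

    fig.tight_layout()
    return fig


# Example usage with made-up data (three categorical columns).
example_df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size": ["S", "M", "M", "L", "S", "M"],
    "shape": ["circle", "square", "circle", "circle", "square", "triangle"],
})
fig = plot_value_counts(example_df, ncols=2)
# Call plt.show() or fig.savefig(...) to display or save the result.
```

Because the columns are filtered before the grid is sized, the number of rows matches the number of charts actually drawn, and only the trailing Axes in the last row need to be removed.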