Plotting a bar chart from a groupby result with Plotly and Streamlit

Posted on 2025-02-10 03:13:33

I am trying to plot a bar chart from the result of a groupby, but the app crashes with the error below.

The error appears when the user selects 3 items in the multiselect widget:

ValueError: All arguments should have the same length. The length of
argument color is 3, whereas the length of previously-processed
arguments ['gender', 'count'] is 95

Code:

some_columns_df = df.loc[:, ['gender', 'country', 'city', 'hoby', 'company', 'status']]
some_collumns = some_columns_df.columns.tolist()

select_box_var = st.selectbox("Choose X Column", some_collumns)
multiselect_var = st.multiselect("Select Columns To GroupBy", some_collumns)

test_g3 = df.groupby([select_box_var] + multiselect_var).size().reset_index(name='count')
fig = px.histogram(test_g3, x=select_box_var, y='count', color=multiselect_var, barmode='group', text_auto=True)

I know the error comes from the color parameter of px.histogram.


Comments (1)

后知后觉 2025-02-17 03:13:33

The reason is that color accepts only one category: a single column name or a single array-like, not a list of column names.

color=['column_a','column_b']

Would cause

ValueError: All arguments should have the same length. The length of argument color is 2, whereas the length of previously-processed arguments ['total_bill'] is 244

2 is the length of the list ['column_a','column_b'], while 244 is the number of rows in the DataFrame.

According to the documentation:

color (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.

Therefore, we either pass a single column name or a Series.
Here's my approach:

import plotly.express as px
df = px.data.tips() # a data set from plotly
df.head()

Output

  total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Columns:
sex, with unique values Female and Male

time, with unique values Dinner and Lunch

I chose these two columns because it is easy to see that there are only 4 combinations.

We create a Series that concatenates the sex and time columns:

categories = df[['sex','time']].agg(', '.join, axis=1)
print(categories)

Output

0      Female, Dinner
1        Male, Dinner
2        Male, Dinner
3        Male, Dinner
4      Female, Dinner
            ...      
239      Male, Dinner
240    Female, Dinner
241      Male, Dinner
242      Male, Dinner
243    Female, Dinner
Length: 244, dtype: object

Use this Series as the color argument:

fig = px.histogram(df, x="total_bill", color=categories)
fig.show()


If the ', '.join approach does not work for you,

categories = df[['sex','time']].agg(', '.join, axis=1)

then try another way (note that this joins the values without a separator):

categories = df['sex'] + df['time']
