Using Grouper to group rows by datetime frequency and plot counts of another column (with reproducible code)

Posted 2025-01-16 08:02:06 · 1084 characters · 1 view · 0 comments

For simplicity, let's say I have a DataFrame arranged as follows:

import numpy as np
import pandas as pd

def random_dates(start, end, n, unit='D', seed=None):
    """Return n random timestamps between start and end (end day included)."""
    # Re-seed NumPy only when a seed is explicitly passed.
    if seed is not None:
        np.random.seed(seed)
    ndays = (end - start).days + 1
    return pd.to_timedelta(np.random.rand(n) * ndays, unit=unit) + start

np.random.seed(0)
start = pd.to_datetime('2015-01-01')
end = pd.to_datetime('2018-01-01')
date = random_dates(start, end, 1000)

# 0 and 1 encode the two genders of the callers
gender = np.random.randint(0, 2, (1000,))
DF = pd.DataFrame({'datetime_of_call': date, 'gender_of_caller': gender})

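I want to plot the distribution of male and female callers as a function of year, month, or hour in total, each taken separately (for now, just month).

For example, I want to see visually that, irrespective of year, January has a high fraction of female callers. Another example: if the frequency is hourly, I want to know the male/female distribution across all years by hour of day only.

I used Grouper to group by month: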

DF.groupby(pd.Grouper(key='datetime_of_call',freq='M'))

Now I'm not sure how to continue. I tried the following:

pd.crosstab(DF.groupby(pd.Grouper(key='datetime_of_call',freq='M')),DF.gender_of_caller).plot.bar(stacked=True)

But I got an error:

ValueError: Shape of passed values is (37, 2), indices imply (1000, 2)


阪姬 2025-01-23 08:02:06

I think you can achieve this with groupby and to_period:

gb = DF.groupby(['gender_of_caller', DF.datetime_of_call.dt.to_period('M')]).size()
df = gb.unstack()

This gives you a DataFrame with one row per gender and one column per month.
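As a minimal sketch of that shape, here is the same groupby and unstack applied to a hypothetical six-row frame (the values in `sample` are made up for illustration and are not the question's random data):

```python
import pandas as pd

# Hypothetical six-row sample, not the question's random data.
sample = pd.DataFrame({
    "datetime_of_call": pd.to_datetime([
        "2015-01-03", "2015-01-17", "2015-01-25",
        "2015-02-02", "2015-02-14", "2015-02-20",
    ]),
    "gender_of_caller": [0, 1, 1, 0, 0, 1],
})

# Count rows per (gender, month) pair ...
gb = sample.groupby(
    ["gender_of_caller", sample.datetime_of_call.dt.to_period("M")]
).size()

# ... then move the month level into the columns.
counts = gb.unstack()

print(counts)
```

Here `counts` has the genders (0 and 1) as its index and the two monthly periods as its columns, with each cell holding the number of calls.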

Then you can take its transpose with

df = df.T

And then you can plot this as a bar chart or something similar:

df.plot(kind='bar')
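The crosstab attempt from the question can probably be made to work as well. pd.crosstab expects one label per row, while a GroupBy object holds one entry per group, which would be consistent with the reported mismatch (37 groups versus 1000 rows). A sketch under that assumption, using stand-in data with the question's column names and a hypothetical helper `plot_stacked`:

```python
import numpy as np
import pandas as pd

# Stand-in data with the question's column names (values are illustrative).
rng = np.random.default_rng(0)
n = 1000
offsets = pd.to_timedelta(rng.random(n) * 1096, unit="D")
DF = pd.DataFrame({
    "datetime_of_call": pd.Timestamp("2015-01-01") + offsets,
    "gender_of_caller": rng.integers(0, 2, n),
})

# crosstab needs one label per row, so derive the month from each timestamp
# instead of passing the GroupBy object.
month = DF["datetime_of_call"].dt.to_period("M")
table = pd.crosstab(month, DF["gender_of_caller"])


def plot_stacked(counts):
    """Draw stacked bars; matplotlib is imported only when plotting is requested."""
    import matplotlib.pyplot as plt

    ax = counts.plot.bar(stacked=True)
    ax.set_xlabel("month")
    ax.set_ylabel("number of calls")
    plt.tight_layout()
    return ax
```

Calling `plot_stacked(table)` should then give one stacked bar per month, with the two genders as the stacked segments.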

EDIT: If you want to group by month regardless of year, change the groupby line to:

gb = DF.groupby(['gender_of_caller', DF.datetime_of_call.dt.month]).size()

The final chart then has one group of bars per calendar month (1 to 12), with one bar per gender.
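Since the question asks about the fraction of callers rather than raw counts, and also about an hourly view, one possible extension is to normalize each row. This is only a sketch: `gender_share_by` is a hypothetical helper, and the data is a stand-in with the question's column names.

```python
import numpy as np
import pandas as pd

# Stand-in data with the question's column names (values are illustrative).
rng = np.random.default_rng(0)
n = 1000
DF = pd.DataFrame({
    "datetime_of_call": pd.Timestamp("2015-01-01")
    + pd.to_timedelta(rng.random(n) * 1096, unit="D"),
    "gender_of_caller": rng.integers(0, 2, n),
})


def gender_share_by(df, part):
    """Return the fraction of calls per gender for each value of a datetime part.

    `part` is the name of a .dt attribute such as "month" or "hour".
    """
    key = getattr(df["datetime_of_call"].dt, part).rename(part)
    # normalize="index" divides each row by its total, so every row sums to 1.
    return pd.crosstab(key, df["gender_of_caller"], normalize="index")


by_month = gender_share_by(DF, "month")  # one row per calendar month, years pooled
by_hour = gender_share_by(DF, "hour")    # one row per hour of day, years pooled
```

Either table can be passed to `.plot.bar(stacked=True)` to compare the male/female split per month or per hour, independent of the year.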
