Using Grouper to group rows by datetime frequency and plot counts of another column (with reproducible code)
For simplicity, let's say I have a dataframe with the following arrangement:
    import numpy as np
    import pandas as pd

    def random_dates(start, end, n, unit='D', seed=None):
        # Seed NumPy's global RNG so the example is reproducible
        # (falls back to 0 when no seed is given).
        np.random.seed(0 if seed is None else seed)
        ndays = (end - start).days + 1
        return pd.to_timedelta(np.random.rand(n) * ndays, unit=unit) + start

    np.random.seed(0)
    start = pd.to_datetime('2015-01-01')
    end = pd.to_datetime('2018-01-01')
    date = random_dates(start, end, 1000)
    #%%
    gender = np.random.randint(0, 2, (1000,))
    DF = pd.DataFrame({'datetime_of_call': date, 'gender_of_caller': gender})
I want to plot the distribution of male and female callers as some kind of line, as a function of year/month/hour in total (each separately; let's say just month for now).
For example, I want to see visually that, irrespective of year, January specifically has a high fraction of female callers. Another example: if I want the frequency to be per hour, I want to see the distribution of male/female callers by hour of day across all years.
I used Grouper to group by month:
    DF.groupby(pd.Grouper(key='datetime_of_call', freq='M'))
Now I'm not sure how to continue. I tried the following:
    pd.crosstab(DF.groupby(pd.Grouper(key='datetime_of_call', freq='M')), DF.gender_of_caller).plot.bar(stacked=True)
but got an error:
    ValueError: Shape of passed values is (37, 2), indices imply (1000, 2)
Comments (1)
I think you can achieve this with groupby and to_period, which gets you a dataframe like:
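The code and output that went with this step are missing from this copy of the answer. As a minimal sketch of one way to combine groupby with to_period (not necessarily the answerer's exact line), the following rebuilds a frame shaped like the question's DF with a fixed seed and counts calls per gender and year-month. The names rng, period and counts are my own, and the random values will not match the question's data exactly.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's DF (random data, fixed seed).
rng = np.random.default_rng(0)
start = pd.Timestamp("2015-01-01")
ndays = (pd.Timestamp("2018-01-01") - start).days + 1
DF = pd.DataFrame({
    "datetime_of_call": start + pd.to_timedelta(rng.random(1000) * ndays, unit="D"),
    "gender_of_caller": rng.integers(0, 2, size=1000),
})

# Map each timestamp to its year-month period, count rows per
# (gender, period) pair, and spread the periods across the columns.
period = DF["datetime_of_call"].dt.to_period("M")
counts = (
    DF.groupby(["gender_of_caller", period])
      .size()
      .unstack(fill_value=0)
)
print(counts.iloc[:, :4])  # first few months: one row per gender
```

Grouping on the period series directly keeps one row per caller going in, which avoids passing a GroupBy object to crosstab as the question's attempt did.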
Then you can take its transpose with
And then you can plot this as a bar chart or something:
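The transpose and plotting snippets are also missing here. Assuming the grouped frame has one row per gender and one column per month (my guess at the layout, given the transpose step), a sketch of both steps could look like this. The names counts, by_month and ax are hypothetical, and the Agg backend line is only there so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Same kind of random frame as in the question (fixed seed).
rng = np.random.default_rng(0)
start = pd.Timestamp("2015-01-01")
DF = pd.DataFrame({
    "datetime_of_call": start + pd.to_timedelta(rng.random(1000) * 1097, unit="D"),
    "gender_of_caller": rng.integers(0, 2, size=1000),
})

# Counts with genders as rows and year-month periods as columns.
counts = (
    DF.groupby(["gender_of_caller", DF["datetime_of_call"].dt.to_period("M")])
      .size()
      .unstack(fill_value=0)
)

# Transpose so each month is a row (one bar position on the x axis)
# and each gender is a column (one coloured segment per bar).
by_month = counts.T
ax = by_month.plot.bar(stacked=True, figsize=(12, 4))
ax.set_xlabel("month of call")
ax.set_ylabel("number of calls")
ax.figure.tight_layout()
```

pandas draws one bar per index row and one series per column, which is why the months need to end up on the index before plotting.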
EDIT: If you want the group by to be the month regardless of the year, you can change the groupby line to:
Now the final graph will look like:
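Neither the changed groupby line nor the final graph survived in this copy. One way to group by calendar month regardless of year, as a sketch rather than the answerer's confirmed code, is to group on the dt.month accessor instead of a year-month period. The names by_month, frac and ax are hypothetical; the fraction step is my addition to show the "high fraction of female callers in January" kind of comparison the question asks about.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import numpy as np
import pandas as pd

# Same kind of random frame as in the question (fixed seed).
rng = np.random.default_rng(0)
start = pd.Timestamp("2015-01-01")
DF = pd.DataFrame({
    "datetime_of_call": start + pd.to_timedelta(rng.random(1000) * 1097, unit="D"),
    "gender_of_caller": rng.integers(0, 2, size=1000),
})

# dt.month yields 1..12, so calls from January 2015, 2016 and 2017
# all land in the same group.
month = DF["datetime_of_call"].dt.month.rename("month")
by_month = (
    DF.groupby([month, "gender_of_caller"])
      .size()
      .unstack(fill_value=0)
)

# Share of each gender within every calendar month (each row sums to 1).
frac = by_month.div(by_month.sum(axis=1), axis=0)

ax = frac.plot.bar(stacked=True)
ax.set_ylabel("fraction of calls")
ax.figure.tight_layout()
```

Swapping dt.month for dt.hour should give the per-hour view across all years in the same way.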