Trying to understand when a groupby object is created and when a non-groupby object is created
I have a dataframe called 'dft' containing details of various Netflix TV shows and movies, from which I extract the subset where the country is either India or Spain. This subset is then grouped by country, and I extract the column "listed_in", which contains the genre categories for the TV show or movie in each row.
dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in']
Now this is a groupby object:
<pandas.core.groupby.generic.SeriesGroupBy object at 0x7fce4b821ac0>
Now, I check how many of these have the category "Documentaries" as one of the genres.
dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
lambda x: x.str.contains('Documentaries'))
Out[8]:
4 False
24 False
39 False
50 False
66 False
69 False
105 False
109 False
114 False
116 False
Name: listed_in, dtype: bool
Now this is an ordinary non-groupby Series: it's just a list of boolean results for the whole "listed_in" column, without any slicing by country.
pandas.core.series.Series
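For reference, here is a minimal sketch on made-up data (the five-row frame below is hypothetical, not the real Netflix dataset) of one way to get per-row booleans that stay tied to their country: compute the mask on the plain column, then group the mask itself. Whether the apply result above carries the country in its index seems to depend on the pandas version and the group_keys setting, so this sketch avoids relying on that.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dft
dft = pd.DataFrame({
    "country": ["India", "Spain", "India", "Spain", "India"],
    "listed_in": [
        "Documentaries",
        "Dramas",
        "Comedies, Documentaries",
        "Documentaries",
        "Dramas",
    ],
})

subset = dft[dft["country"].isin(["India", "Spain"])]

# Element-wise mask: one boolean per row, same index as subset
mask = subset["listed_in"].str.contains("Documentaries")

# Put the booleans next to their country for inspection
labelled = subset.assign(is_documentary=mask)[["country", "is_documentary"]]
print(labelled)

# Group the mask by the country column to count True values per country
counts = mask.groupby(subset["country"]).sum()
print(counts)
```

Grouping the mask by subset["country"] works because both Series share the same index, so each boolean is matched to the country of its own row.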
But then if I ask for value_counts() or sum() inside the apply function, the result is presented differently.
dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
lambda x: x.str.contains('Documentaries').sum())
country
India 19
Spain 17
Name: listed_in, dtype: int64
Now this is also shown as a non-groupby Series. However, I am wondering why applying just the str.contains() filter gives a series presented without any country differentiation. If I add a .sum() outside the apply function's parentheses, like this:
dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
lambda x: x.str.contains('Documentaries')).sum()
I get 36, the total number of True values across both India and Spain. But when I apply sum() or value_counts() inside the apply parentheses, I get a result that is separated by country, India (19) and Spain (17), as shown above.
I am trying to understand why this is so. Why doesn't just applying str.contains() produce a series of boolean values differentiated by country? And if it doesn't, what difference does adding sum() make? Should that prove above my station right now, I at least want to understand when this happens so I can keep it in mind for data analysis.
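The two behaviours described above can be reproduced on a small made-up frame. As a hedged reading rather than a definitive account of pandas internals: apply seems to combine results according to what the function returns for each group. A scalar per group yields one row per country, while a Series as long as the group yields one row per original row. The data and variable names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dft
dft = pd.DataFrame({
    "country": ["India", "Spain", "India", "Spain", "India"],
    "listed_in": [
        "Documentaries",
        "Dramas",
        "Comedies, Documentaries",
        "Documentaries",
        "Dramas",
    ],
})

subset = dft[(dft["country"] == "India") | (dft["country"] == "Spain")]
grouped = subset.groupby("country")["listed_in"]

# The function returns a Series the same length as each group,
# so the combined result has one boolean per original row.
per_row = grouped.apply(lambda s: s.str.contains("Documentaries"))

# The function returns a single number per group,
# so the combined result has one entry per country.
per_group = grouped.apply(lambda s: s.str.contains("Documentaries").sum())

# Summing outside apply adds up every boolean, ignoring the groups.
total = per_row.sum()

print(len(per_row))   # one entry per row of subset
print(per_group)      # one entry per country
print(total)          # a single overall count
```

The exact index of per_row (original row labels only, or country plus row label) may differ between pandas versions, so the sketch only relies on its length and its sum.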