Trying to understand when a groupby object is created versus a non-groupby object

Published 2025-02-04 15:39:47

I have a dataframe called 'dft' containing details of various Netflix TV shows and movies, from which I extract the subset where the country is either India or Spain. This subset is then grouped by country, and I extract the column "listed_in", which contains the genre categories for the TV show/movie in each row.

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in']

Now this is a groupby object:

pandas.core.groupby.generic.SeriesGroupBy object at 0x7fce4b821ac0
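(The behaviour can be reproduced on a tiny stand-in dataframe; the column names below mirror the real 'dft', but the rows are invented for illustration.)

```python
import pandas as pd

# Hypothetical miniature stand-in for 'dft' -- only the two columns
# used in the question, with made-up rows.
dft = pd.DataFrame({
    'country':   ['India', 'Spain', 'India', 'France', 'Spain'],
    'listed_in': ['Documentaries, Dramas', 'Comedies', 'Dramas',
                  'Documentaries', 'Documentaries, Thrillers'],
})

# Selecting a column of a groupby yields a SeriesGroupBy: nothing has
# been computed yet, it only records how the rows will be split.
g = (dft[(dft['country'] == 'India') | (dft['country'] == 'Spain')]
     .groupby('country')['listed_in'])
print(type(g).__name__)  # SeriesGroupBy
```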

Now, I check how many of these have "Documentaries" as one of the genres.

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
    lambda x: x.str.contains('Documentaries'))
Out[8]: 
4      False
24     False
39     False
50     False
66     False
69     False
105    False
109    False
114    False
116    False
Name: listed_in, dtype: bool

Now this is a routine non-groupby series, where it's just a list of boolean results over the whole "listed_in" column, without any split by country.

pandas.core.series.Series
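(As far as I can tell, what decides the shape is what the lambda returns per group: here it returns a boolean Series aligned with each group's own rows, so apply() glues the per-group pieces back together on the original row index and the country labels are not shown. Note this is version-dependent: newer pandas, from 2.0 on, prepends the group keys to the index of such transform-like results unless `group_keys=False` is passed. A sketch on invented rows:)

```python
import pandas as pd

dft = pd.DataFrame({
    'country':   ['India', 'Spain', 'India', 'Spain'],
    'listed_in': ['Documentaries', 'Comedies', 'Dramas', 'Documentaries'],
})
g = dft.groupby('country')['listed_in']

# The lambda returns a Series (one boolean per row of each group), so
# apply() concatenates the per-group pieces rather than reducing them;
# depending on pandas version the result carries the original row index
# (pandas 1.x) or a (country, row) MultiIndex (pandas >= 2.0).
per_row = g.apply(lambda x: x.str.contains('Documentaries'))
print(per_row)
```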

But then if I ask for value_counts() or sum() inside the apply function, the result is presented differently.

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
    lambda x: x.str.contains('Documentaries').sum())
country
India    19
Spain    17
Name: listed_in, dtype: int64

Now this is also shown as a non-groupby Series; however, I am wondering why just applying the str.contains() filter gives a series presented without country differentiation. If I add a .sum() outside the apply bracket like this:

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
    lambda x: x.str.contains('Documentaries')).sum()

I get 36, the total number of True values across both India and Spain. But when I apply sum() or value_counts() inside the apply bracket I get a result that is separated for India (19) and Spain (17) as shown above.
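(If I am reading the docs right, the rule seems to be: a lambda that reduces each group to a scalar makes apply() return one value per group, labelled by the group keys; a lambda that returns a Series makes apply() glue the per-group pieces back together, and a .sum() outside then collapses them all into one grand total. A sketch on the same invented rows as above:)

```python
import pandas as pd

dft = pd.DataFrame({
    'country':   ['India', 'Spain', 'India', 'Spain'],
    'listed_in': ['Documentaries', 'Comedies', 'Dramas', 'Documentaries'],
})
g = dft.groupby('country')['listed_in']

# sum() inside apply: each group is reduced to a single number, so the
# result has one entry per group, indexed by the group keys.
inside = g.apply(lambda x: x.str.contains('Documentaries').sum())
print(inside['India'], inside['Spain'])  # 1 1

# sum() outside apply: first build the per-row booleans, then collapse
# everything into one grand total across both countries.
outside = g.apply(lambda x: x.str.contains('Documentaries')).sum()
print(int(outside))  # 2
```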

I am trying to understand why this is so. Why doesn't just applying str.contains() produce a series of boolean values differentiated by country? And if it doesn't, what difference does adding `sum()` make? Should that prove above my station right now, I at least want to understand when this happens so I can keep it in mind for data analysis.
