Trying to understand when a groupby object is created and when a non-groupby object is created

Posted on 2025-02-04 15:39:47

I have a dataframe called 'dft' containing details of various Netflix TV shows and movies, from which I extract the subset where the country is either India or Spain. This subset is then grouped by country, and I extract the column "listed_in", which contains the genre categories for the TV show or movie in each row.

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in']

Now this is a groupby object:

pandas.core.groupby.generic.SeriesGroupBy object at 0x7fce4b821ac0

Now, I check how many of these have the category "Documentaries" as one of the genres.

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
    lambda x: x.str.contains('Documentaries'))

Out[8]: 
4      False
24     False
39     False
50     False
66     False
69     False
105    False
109    False
114    False
116    False
Name: listed_in, dtype: bool

Now this is a routine non-groupby Series: it's just a list of boolean results on the whole "listed_in" column, without slicing by country.

pandas.core.series.Series

But then if I ask for value_counts() or sum() inside the apply function, the result is presented differently.

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
    lambda x: x.str.contains('Documentaries').sum())

country
India    19
Spain    17
Name: listed_in, dtype: int64

Now this is also being shown as a non-groupby Series; however, I am wondering why just applying the str.contains() filter gives a Series presented without country differentiation. If I add a .sum() outside the apply function bracket like this:

dft[(dft['country']=='India') | (dft['country'] == 'Spain')].groupby('country')['listed_in'].apply(
    lambda x: x.str.contains('Documentaries')).sum()

I get 36, the total number of True values across both India and Spain. But when I apply sum() or value_counts() inside the apply bracket I get a result that is separated for India (19) and Spain (17) as shown above.
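For anyone who wants to reproduce the three cases above without the full Netflix data, here is a minimal sketch. The tiny frame below is a made-up stand-in for the real `dft` (its rows and the names `subset`, `grouped`, `per_row`, `per_group` and `total` are my own assumptions), so the counts differ from the 19/17/36 reported above. Depending on the pandas version, the row-level result may or may not carry the country in its index, so the sketch only relies on its length and sum.

```python
import pandas as pd

# Hypothetical stand-in for the real `dft` Netflix frame; the rows are made up.
dft = pd.DataFrame({
    "country": ["India", "Spain", "India", "Spain", "India", "Japan"],
    "listed_in": [
        "Documentaries, International Movies",
        "Dramas",
        "Comedies",
        "Documentaries",
        "Documentaries, Music",
        "Documentaries",
    ],
})

# Same filter as in the question, written with isin() for brevity.
subset = dft[dft["country"].isin(["India", "Spain"])]
grouped = subset.groupby("country")["listed_in"]

# The lambda returns one boolean per row, so apply() stitches the
# per-group pieces back together into a row-level Series.
per_row = grouped.apply(lambda x: x.str.contains("Documentaries"))

# The lambda returns one scalar per group, so the result is indexed by country.
per_group = grouped.apply(lambda x: x.str.contains("Documentaries").sum())

# Summing the row-level Series collapses it into a single grand total.
total = per_row.sum()

print(per_row)
print(per_group)
print(total)
```

In this toy data the shape of what the lambda returns (a Series per group versus a single number per group) is the only thing that changes between the first two calls.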

I am trying to understand why this is so. Why doesn't just applying str.contains() produce a Series of boolean values differentiated by country? And if it doesn't, what difference does adding `sum()` make? Should that prove above my station right now, I at least want to understand when this happens so I can keep it in mind for data analysis.
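To make concrete what I mean by "boolean values differentiated by country", here is one possible sketch of the output I had in mind. It is not taken from any answer; it simply labels each row-level boolean with its country by moving `country` into the index, again on made-up data (the frame and the names `flags` and `counts` are hypothetical).

```python
import pandas as pd

# Hypothetical toy data standing in for the real `dft`.
dft = pd.DataFrame({
    "country": ["India", "Spain", "India", "Spain"],
    "listed_in": ["Documentaries", "Dramas", "Comedies", "Documentaries, Music"],
})
subset = dft[dft["country"].isin(["India", "Spain"])]

# Row-level booleans, labelled by country instead of by row number.
flags = subset.set_index("country")["listed_in"].str.contains("Documentaries")

# Per-country counts of True values, computed without apply().
counts = flags.groupby(level="country").sum()

print(flags)
print(counts)
```

Here `flags` keeps one entry per row but shows the country next to each value, while `counts` reduces each country to a single number.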
