Pandas: Group Years by Decade

Posted on 2025-01-21 18:14:08 | Words: 2131 | Views: 0 | Comments: 0

I have data in a CSV file. Here is my code.

import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrap is needed
data = pd.read_csv('cast.csv')
print(data)

The result looks like this.

                          title  year                        name     type  \
0                Closet Monster  2015                    Buffy #1    actor   
1               Suuri illusioni  1985                      Homo $    actor   
2           Battle of the Sexes  2017                     $hutter    actor   
3          Secret in Their Eyes  2015                     $hutter    actor   
4                    Steve Jobs  2015                     $hutter    actor   
...                         ...   ...                         ...      ...   
74996  Mia fora kai ena... moro  2011     Penelope Anastasopoulou  actress   
74997         The Magician King  2004       Tiannah Anastassiades  actress   
74998        Festival of Lights  2010             Zoe Anastassiou  actress   
74999                Toxic Tutu  2016             Zoe Anastassiou  actress   
75000           Fugitive Pieces  2007  Anastassia Anastassopoulou  actress   

                     character     n  
0                      Buffy 4  31.0  
1                       Guests  22.0  
2              Bobby Riggs Fan  10.0  
3              2002 Dodger Fan   NaN  
4      1988 Opera House Patron   NaN  
...                        ...   ...  
74996       Popi voulkanizater  11.0  
74997  Unicycle Race Attendant   NaN  
74998       Guidance Counselor  20.0  
74999        Demon of Toxicity   NaN  
75000             Laundry Girl  25.0  

[75001 rows x 6 columns]

I want to group the data by year and type, and then get the number of rows for each type in each year. Here is my code.

grouped = data.groupby(['year', 'type']).size()
print(grouped)

The result looks like this.

year  type   
1912  actor       1
      actress     2
1913  actor       9
      actress     1
1914  actor      38
                 ..
2019  actress     3
2020  actor       3
      actress     1
2023  actor       1
      actress     2
Length: 220, dtype: int64

The problem: how can I get these counts per decade instead, from 1910 through 2020 in steps of 10? The year index should then be 1910, 1920, 1930, 1940, and so on up to 2020.

Comments (1)

草莓酥 2025-01-28 18:14:08

I see two simple options.

1- round the years down to the nearest multiple of 10:

# floor each year to the start of its decade
# (.round(-1) would round to the nearest 10 instead, e.g. 1919 -> 1920)
group = data['year'] // 10 * 10
grouped = data.groupby([group, 'type']).size()

2- use pandas.cut:

years = list(range(1910,2031,10))
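# right=False makes the bins [1910, 1920), [1920, 1930), ... so that 1920 is labeled 1920
group = pd.cut(data['year'], bins=years, labels=years[:-1], right=False)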
grouped = data.groupby([group, 'type']).size()
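
To check that both options give the same counts, here is a minimal sketch on a small made-up DataFrame. The sample rows, the variable names (`decade_floor`, `decade_cut`, `edges`, `table`) and the final `unstack`/`reindex` step are assumptions added for illustration and are not part of the original question or answer. The last step is one possible way to get every decade from 1910 to 2020 in the index, including decades with no rows.

```python
import pandas as pd

# Hypothetical miniature stand-in for cast.csv (only the columns used here).
data = pd.DataFrame({
    "year": [1912, 1919, 1920, 1925, 2015, 2020],
    "type": ["actor", "actress", "actor", "actor", "actress", "actor"],
})

# Option 1: floor each year to the start of its decade.
decade_floor = data["year"] // 10 * 10
by_floor = data.groupby([decade_floor, "type"]).size()

# Option 2: half-open bins [1910, 1920), [1920, 1930), ..., [2020, 2030).
edges = list(range(1910, 2031, 10))
decade_cut = pd.cut(data["year"], bins=edges, labels=edges[:-1], right=False)
# observed=True keeps only the decade/type combinations that actually occur.
by_cut = data.groupby([decade_cut, "type"], observed=True).size()

# Decades as rows, types as columns; decades or types with no rows become 0.
table = by_floor.unstack(fill_value=0).reindex(edges[:-1], fill_value=0)
print(table)
```

With the default `right=True`, `pd.cut` would build bins of the form (1910, 1920], which leaves the year 1910 unassigned and places 1920 under the 1910 label, so `right=False` is the variant that matches the floor-division result.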