Pandas: group year by decade

Posted on 2025-01-21 18:14:08

I have data in a CSV file. Here is my code.

import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() call is needed
data = pd.read_csv('cast.csv')
print(data)

The result looks like this.

                          title  year                        name     type  \
0                Closet Monster  2015                    Buffy #1    actor   
1               Suuri illusioni  1985                      Homo $    actor   
2           Battle of the Sexes  2017                     $hutter    actor   
3          Secret in Their Eyes  2015                     $hutter    actor   
4                    Steve Jobs  2015                     $hutter    actor   
...                         ...   ...                         ...      ...   
74996  Mia fora kai ena... moro  2011     Penelope Anastasopoulou  actress   
74997         The Magician King  2004       Tiannah Anastassiades  actress   
74998        Festival of Lights  2010             Zoe Anastassiou  actress   
74999                Toxic Tutu  2016             Zoe Anastassiou  actress   
75000           Fugitive Pieces  2007  Anastassia Anastassopoulou  actress   

                     character     n  
0                      Buffy 4  31.0  
1                       Guests  22.0  
2              Bobby Riggs Fan  10.0  
3              2002 Dodger Fan   NaN  
4      1988 Opera House Patron   NaN  
...                        ...   ...  
74996       Popi voulkanizater  11.0  
74997  Unicycle Race Attendant   NaN  
74998       Guidance Counselor  20.0  
74999        Demon of Toxicity   NaN  
75000             Laundry Girl  25.0  

[75001 rows x 6 columns]

I want to group the data by year and type, and then get the number of rows of each type in each year. Here is my code.

grouped = data.groupby(['year', 'type']).size()
print(grouped)

The result looks like this.

year  type   
1912  actor       1
      actress     2
1913  actor       9
      actress     1
1914  actor      38
                 ..
2019  actress     3
2020  actor       3
      actress     1
2023  actor       1
      actress     2
Length: 220, dtype: int64

The problem is: how can I get the size data from 1910 through 2020 in steps of 10 years (per decade)? The year index should then be 1910, 1920, 1930, 1940, and so on up to 2020.

草莓酥 2025-01-28 18:14:08

I see two simple options.

1- Round the years down to the nearest multiple of 10:

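# floor each year to the start of its decade, e.g. 1915 -> 1910
# (data['year'].round(-1) rounds to the nearest 10 instead, so 1916 -> 1920)
group = data['year'] // 10 * 10
grouped = data.groupby([group, 'type']).size()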

2- Use pandas.cut:

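# bin edges 1910, 1920, ..., 2030; each bin is labeled with its starting year
years = list(range(1910, 2031, 10))
# right=False makes the bins left-closed, [1910, 1920), so 1920 lands in the 1920 bin
group = pd.cut(data['year'], bins=years, labels=years[:-1], right=False)
grouped = data.groupby([group, 'type']).size()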
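
To make the two options concrete, here is a minimal sketch that runs both on a small made-up DataFrame standing in for cast.csv. The sample rows and the names `decade`, `edges`, `binned`, and `table` are illustrative assumptions, not part of the original data or answer. The last step, which reindexes so that every decade from 1910 to 2020 appears even when it has no rows, is an extra suggestion aimed at the index the question asks for.

```python
import pandas as pd

# Hypothetical miniature stand-in for cast.csv (only the columns used here).
data = pd.DataFrame({
    "year": [1912, 1915, 1920, 1929, 2015, 2017, 2023],
    "type": ["actor", "actress", "actor", "actor", "actor", "actress", "actress"],
})

# Option 1: floor each year to the start of its decade.
decade = (data["year"] // 10 * 10).rename("decade")
by_floor = data.groupby([decade, "type"]).size()

# Option 2: left-closed bins [1910, 1920), [1920, 1930), ..., [2020, 2030),
# each labeled with its starting year.
edges = list(range(1910, 2031, 10))
binned = pd.cut(
    data["year"], bins=edges, labels=edges[:-1], right=False
).rename("decade")
# observed=True keeps only the decades that actually occur in the data.
by_cut = data.groupby([binned, "type"], observed=True).size()

# Pivot to one row per decade, then force every decade from 1910 to 2020
# to appear, using 0 for decades that have no rows.
table = by_floor.unstack("type", fill_value=0).reindex(edges[:-1], fill_value=0)
print(table)
```

Both options should give the same counts here. Floor division is the shorter route when the bins are regular decades, while pandas.cut is likely the better fit if the bin edges ever need to be irregular.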
