How to implement a rank function for the nearest values of a column in a dataframe?
df.head():
   run_time                    match_datetime  country         league           home_team             away_team
0  2021-08-07 00:04:36.326391  2021-08-06      Russia          FNL 2 - Group 2  Yenisey 2             Lokomotiv-Kazanka
1  2021-08-07 00:04:36.326391  2021-08-07      Russia          Youth League     Ural U19              Krylya Sovetov Samara U19
2  2021-08-07 00:04:36.326391  2021-08-08      World           Club Friendly    Alaves                Al Nasr
3  2021-08-07 00:04:36.326391  2021-08-09      China           Jia League       Chengdu Rongcheng     Shenyang Urban FC
4  2021-08-06 00:04:36.326391  2021-08-06      China           Super League     Wuhan FC              Tianjin Jinmen Tiger
5  2021-08-06 00:04:36.326391  2021-08-07      Czech Republic  U19 League       Sigma Olomouc U19     Karvina U19
6  2021-08-06 00:04:36.326391  2021-08-08      Russia          Youth League     Konoplev Academy U19  Rubin Kazan U19
7  2021-08-06 00:04:36.326391  2021-08-09      World           Club Friendly    Real Sociedad         Eibar
Desired df:
   run_time                    match_datetime  country         league           home_team             away_team
0  2021-08-07 00:04:36.326391  2021-08-06      Russia          FNL 2 - Group 2  Yenisey 2             Lokomotiv-Kazanka
1  2021-08-07 00:04:36.326391  2021-08-07      Russia          Youth League     Ural U19              Krylya Sovetov Samara U19
4  2021-08-06 00:04:36.326391  2021-08-06      China           Super League     Wuhan FC              Tianjin Jinmen Tiger
5  2021-08-06 00:04:36.326391  2021-08-07      Czech Republic  U19 League       Sigma Olomouc U19     Karvina U19
How do I use the rank function to filter only the 2 nearest match_datetime dates for every run_time value? I.e., the desired dataframe will be a filtered dataframe that has the 2 nearest match_datetime values for every run_time.
Update: using rank instead of head.
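A minimal sketch of what a rank-based filter could look like, not necessarily the code this answer originally showed. It assumes "nearest" means the smallest absolute gap in whole days between the run date and the match date, with ties broken by row order. The names gap_days, rank_in_group and nearest are made up for illustration, and the sample frame keeps only the two date columns from the question.

```python
import pandas as pd

# Reduced copy of the question's sample data (only the two date columns).
df = pd.DataFrame({
    "run_time": pd.to_datetime(
        ["2021-08-07 00:04:36.326391"] * 4 + ["2021-08-06 00:04:36.326391"] * 4
    ),
    "match_datetime": pd.to_datetime(
        ["2021-08-06", "2021-08-07", "2021-08-08", "2021-08-09"] * 2
    ),
})

# Assumption: distance is the absolute gap in whole days between the
# run date (time of day dropped) and the match date.
gap_days = (df["match_datetime"] - df["run_time"].dt.normalize()).abs().dt.days

# Rank the gaps inside each run_time group. method="first" breaks ties
# by row order, so every group gets the distinct ranks 1, 2, 3, ...
rank_in_group = gap_days.groupby(df["run_time"]).rank(method="first")

# Keep the two best-ranked rows of every group.
nearest = df[rank_in_group <= 2]
print(nearest)
```

On the sample data this keeps rows 0, 1, 4 and 5, which matches the desired frame. If the time of day in run_time were kept instead of dropped, 2021-08-08 would be closer to the first run_time than 2021-08-06, so the day-level comparison is what reproduces the question's expected result.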
Alternative
You can use:
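One possibility, sketched here as an assumption rather than the answerer's own snippet, is to sort the rows by their gap to the run date and take the first two rows of each group with groupby(...).head(2). The gap_days column is a hypothetical helper, and "nearest" is again taken to mean the smallest absolute gap in whole days.

```python
import pandas as pd

# Reduced copy of the question's sample data (only the two date columns).
df = pd.DataFrame({
    "run_time": pd.to_datetime(
        ["2021-08-07 00:04:36.326391"] * 4 + ["2021-08-06 00:04:36.326391"] * 4
    ),
    "match_datetime": pd.to_datetime(
        ["2021-08-06", "2021-08-07", "2021-08-08", "2021-08-09"] * 2
    ),
})

# Hypothetical helper: absolute gap in whole days per row.
gap_days = (df["match_datetime"] - df["run_time"].dt.normalize()).abs().dt.days

nearest = (
    df.assign(gap_days=gap_days)
      .sort_values("gap_days", kind="stable")  # stable sort keeps row order on ties
      .groupby("run_time")
      .head(2)                                 # first two rows of each group
      .sort_index()                            # restore the original row order
      .drop(columns="gap_days")
)
print(nearest)
```

The stable sort matters only for ties: it keeps the earlier row when two matches are equally far from the run date.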
I am somewhat afraid that the pandas.DataFrame.rank method can't do this. But pandas.DataFrame.groupby can, if you use head with it.

Assume you have a pandas.DataFrame df and want to keep max_num_per_example = 2 representatives of each unique value in the column df['a']. Calling head(max_num_per_example) on the groupby result yields the same rows you would get with the naive approach, which underlines the power of pandas =)
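The idea can be illustrated with a small sketch. The frame below is hypothetical (column b and all values are invented for illustration), and the loop is only one guess at what the "naive approach" means: filter the frame once per unique value and keep the first rows of each slice.

```python
import pandas as pd

# Hypothetical example data; only the column name 'a' comes from the answer.
df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 3], "b": list("uvwxyz")})
max_num_per_example = 2

# groupby + head: keep the first max_num_per_example rows of every group.
kept = df.groupby("a").head(max_num_per_example)

# Assumed "naive approach": slice per unique value, then concatenate.
naive = pd.concat(
    [df[df["a"] == value].iloc[:max_num_per_example] for value in df["a"].unique()]
)

print(kept)
print(kept.sort_index().equals(naive.sort_index()))
```

To apply this to the question, the frame would first need to be sorted so that the nearest match_datetime values come first within each run_time, since head simply keeps the leading rows of each group.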