How to match pairs of values and then subtract the values of a column?
I am currently working with a dataset containing all Grand Slam tennis matches between 2010 and 2019. The data frame contains two rows per match: one with information about one player (the winner) and one with information about the other player (the loser). The two rows of each pair share the same match_id value.
I would like to create a new variable called rank difference. The idea is to have, on each row, the difference in ATP rank between the winner and the loser of that match.
Here is what a subset of the data frame I am working with looks like:
# A tibble: 9,290 x 5
# Groups: player_id [444]
match_id player_id rank winner full_name
<chr> <chr> <dbl> <fct> <chr>
1 m_2019_A_0 atp_104731 6 True Kevin Anderson
2 m_2019_A_1 atp_105932 20 True Nikoloz Basilashvili
3 m_2019_A_2 atp_105430 98 True Radu Albot
4 m_2019_A_3 atp_105882 137 True Stefano Travaglia
5 m_2019_A_4 atp_104269 28 True Fernando Verdasco
6 m_2019_A_5 atp_104655 94 True Pablo Cuevas
7 m_2019_A_7 atp_126774 15 True Stefanos Tsitsipas
8 m_2019_A_8 atp_105777 21 True Grigor Dimitrov
9 m_2019_A_9 atp_126207 39 True Frances Tiafoe
10 m_2019_A_10 atp_104745 2 True Rafael Nadal
# ... with 9,280 more rows
Here is what I tried, but it did not work:
final_match_with_player %>%
  group_by(match_id) %>%
  mutate(diff_rank = rank[winner == 'True'] - rank[winner == 'False'])
Do you have any idea how I could do that?
Thank you very much in advance!
Comments (1)
Does this get what you want?
Resulting in:
Edit, following further information:
It might be easier just to arrange by match_id and rank, then mutate using the lead and lag functions in a case_when conditional:
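The code from the original answer is not preserved here, so the following is only a minimal sketch of the approach just described, not the answerer's own code. It assumes dplyr is installed, that each match has exactly two rows, and that winner holds the values "True" and "False". The toy data, the object name matches, and the helper column opponent_rank are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data in the same shape as the question's tibble:
# two rows per match, one with winner == "True" and one with "False".
matches <- tibble(
  match_id  = c("m_2019_A_0", "m_2019_A_0", "m_2019_A_1", "m_2019_A_1"),
  player_id = c("atp_A", "atp_B", "atp_C", "atp_D"),  # made-up ids
  rank      = c(6, 50, 20, 12),
  winner    = factor(c("True", "False", "True", "False"))
)

result <- matches %>%
  ungroup() %>%                  # drop any earlier grouping, e.g. by player_id
  arrange(match_id, rank) %>%    # the two rows of a match become adjacent
  group_by(match_id) %>%         # keep lead()/lag() inside a single match
  mutate(
    # Rank of the other player in the same match.
    opponent_rank = case_when(
      row_number() == 1 ~ lead(rank),  # first row: opponent is the next row
      TRUE              ~ lag(rank)    # second row: opponent is the previous row
    ),
    # Winner's rank minus loser's rank, repeated on both rows of the match.
    diff_rank = case_when(
      winner == "True" ~ rank - opponent_rank,
      TRUE             ~ opponent_rank - rank
    )
  ) %>%
  ungroup()

print(result)
```

Grouping by match_id before calling lead() and lag() is a choice made in this sketch: a match that has only one row in the data then gets NA in diff_rank instead of silently borrowing a rank from a neighbouring match.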
Giving: