Loop within group_by and dplyr in R

Posted 2025-02-08 10:06:27

I have the following dataframe

my_df <- data.frame(Municipality=c('a', 'a', 'a', 'a', 'b', 'b', 'c','c','c','d','d'),
                    state=c('ac', 'ac', 'ac', 'ac', 'pb', 'pb', 'am','am','am','pi','pi'),
                    votes=c(541, 463, 246, 49, 2443, 2287, 1035,3530,9999,666,3809))

I would like to calculate the vote shares within each "Municipality" and the difference ("margin of victory") between each of them and the highest vote share by state. I tried the following code:

actual_df<-my_df %>%
  group_by(Municipality,state) %>% 
  mutate(
    share_vote = votes / sum(votes), # calculate vote shares
    margin_victory = (max(share_vote)-(max( share_vote[share_vote!=max(share_vote)]))),
  ) %>% 
  ungroup()

This code calculates the vote share correctly, as expected. However, the "margin of victory" is correct only when there are two Municipalities. Below is what I would like to have:

desired_df <- data.frame(Municipality=c('a', 'a', 'a', 'a', 'b', 'b', 'c','c','c','d','d'),
                    state=c('ac', 'ac', 'ac', 'ac', 'pb', 'pb', 'am','am','am','pi','pi'),
                    votes=c(541, 463, 246, 49, 2443, 2287, 1035,3530,9999,666,3809),
                    margin_victory= c(0.06004619,-0.06004619,0.2270978, 0.3787529,
                                      0.03298097,-0.03298097,
                                      -0.6154902,-0.44417742,0.44417742,
                                      -0.70234637,0.70234637))

I tried to replace margin_victory in the actual_df code with margin_victory = for (i in share_vote) {max(share_vote)-share_vote}, but without success.


Anonymous user 2025-02-15 10:06:27

Are you sure about the signs of your desired result? If not, I would have suggested the following:

library(tidyverse)

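my_df %>% group_by(Municipality, state) %>%
  mutate(
    share_vote = votes / sum(votes),  # each row's share of its group's total
    # top row: lead over the runner-up; every other row: deficit to the top row
    mar = ifelse(votes == max(votes),
                 votes - max(votes[votes != max(votes)]),
                 (votes - max(votes))) / sum(votes)) %>%
  ungroup()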
#> # A tibble: 11 × 5
#>    Municipality state votes share_vote     mar
#>    <chr>        <chr> <dbl>      <dbl>   <dbl>
#>  1 a            ac      541     0.416   0.0600
#>  2 a            ac      463     0.356  -0.0600
#>  3 a            ac      246     0.189  -0.227 
#>  4 a            ac       49     0.0377 -0.379 
#>  5 b            pb     2443     0.516   0.0330
#>  6 b            pb     2287     0.484  -0.0330
#>  7 c            am     1035     0.0711 -0.615 
#>  8 c            am     3530     0.242  -0.444 
#>  9 c            am     9999     0.687   0.444 
#> 10 d            pi      666     0.149  -0.702 
#> 11 d            pi     3809     0.851   0.702

Created on 2022-06-17 by the reprex package (v2.0.1)
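
The approach above relies on max(votes[votes != max(votes)]), which has two edge cases: a group with a single row has no runner-up (R returns -Inf with a warning), and a tie for first place removes every tied row, so the tied leaders get compared with the third-placed row. A minimal sketch of one way to handle both, assuming a tie should give a margin of 0 and a single-row group should give NA, is below. The helper name margin_to_rival is hypothetical and not part of the original answer; it uses base R only and is applied per group with ave().

```r
# Hypothetical helper (an assumption, not from the original answer):
# signed margin of each row against its strongest rival within one group,
# expressed as a share of the group's total votes.
margin_to_rival <- function(votes) {
  n <- length(votes)
  if (n < 2) return(rep(NA_real_, n))   # no rival to compare against
  ord    <- order(votes, decreasing = TRUE)
  top    <- votes[ord[1]]               # highest vote count
  second <- votes[ord[2]]               # runner-up (equals top on a tie)
  rival  <- rep(top, n)                 # most rows are compared with the top row
  rival[ord[1]] <- second               # the top row is compared with the runner-up
  (votes - rival) / sum(votes)
}

my_df <- data.frame(Municipality = c('a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd'),
                    state = c('ac', 'ac', 'ac', 'ac', 'pb', 'pb', 'am', 'am', 'am', 'pi', 'pi'),
                    votes = c(541, 463, 246, 49, 2443, 2287, 1035, 3530, 9999, 666, 3809))

# ave() applies the helper within each Municipality/state combination
# and returns the results in the original row order.
my_df$mar <- ave(my_df$votes, my_df$Municipality, my_df$state,
                 FUN = margin_to_rival)
```

On the example data this should agree with the mar column shown above, since no group there has ties or a single row. Inside the dplyr pipeline the same helper could be used as mutate(mar = margin_to_rival(votes)) after group_by(Municipality, state).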
