How to drop rows with nearly identical column values in R?

Posted on 2025-02-13 13:05:39

I have a dataset with a column of names. I would like to drop rows with the lesser "P" value if there exists one with a higher value. For example, in the dataset below, I would like to drop the rows with IDs 3 and 5, since there exist a 'Texas P5' and a 'North Dakota P9'. What is the best way to do this? Thanks in advance!

ID  Name             Score
1   Minnesota P2     342
2   Vermont P7       342
3   Texas P4         65
4   New Mexico       643
5   North Dakota P8  78
6   North Dakota P9  245
7   Texas P5         856
8   Minnesota LP     342

Comments (4)

狼亦尘 2025-02-20 13:05:41

Here is a base R way. Use ave to split the data by Name excluding the numbers, and check which group element is equal to its greatest element. ave returns a vector of the same class as its input, in this case character. So coerce to logical and subset the original data frame.

x<-"
ID  Name    Score
1   'Minnesota P2'  342
2   'Vermont P7'    342
3   'Texas P4'  65
4   'New Mexico'    643
5   'North Dakota P8'   78
6   'North Dakota P9'   245
7   'Texas P5'  856
8   'Minnesota LP'  342"
df1 <- read.table(textConnection(x), header = TRUE)


i <- with(df1, ave(Name, sub("\\d+", "", Name), FUN = \(x){
  x == tail(sort(x), 1)
}))
df1[as.logical(i),]
#>   ID            Name Score
#> 1  1    Minnesota P2   342
#> 2  2      Vermont P7   342
#> 4  4      New Mexico   643
#> 6  6 North Dakota P9   245
#> 7  7        Texas P5   856
#> 8  8    Minnesota LP   342

Created on 2022-07-06 by the reprex package (v2.0.1)
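
One thing to keep in mind with the string comparison above: sorting character values puts "Texas P10" before "Texas P9", so a two-digit suffix would lose against a one-digit one. The following is only a sketch of a variant that compares the suffix as a number. The helper name keep_highest_p is hypothetical, and it assumes the suffix always has the form " P<digits>" at the end of Name and that a name without such a suffix counts as P = 0.

```r
# Sample data from the question
df1 <- data.frame(
  ID = 1:8,
  Name = c("Minnesota P2", "Vermont P7", "Texas P4", "New Mexico",
           "North Dakota P8", "North Dakota P9", "Texas P5", "Minnesota LP"),
  Score = c(342, 342, 65, 643, 78, 245, 856, 342)
)

# Hypothetical helper: per base name, keep only the row(s) with the largest P number
keep_highest_p <- function(d) {
  nm <- as.character(d$Name)
  has_p <- grepl(" P[0-9]+$", nm)

  # Base name: Name with the trailing " P<digits>" removed (unchanged if absent)
  base <- nm
  base[has_p] <- sub(" P[0-9]+$", "", nm[has_p])

  # Numeric P value; rows without a P suffix are treated as 0 (assumption)
  p <- integer(length(nm))
  p[has_p] <- as.integer(sub("^.* P([0-9]+)$", "\\1", nm[has_p]))

  # ave() returns 1/0 here because the logical result is stored in an integer vector
  keep <- ave(p, base, FUN = function(z) z == max(z)) == 1
  d[keep, ]
}

keep_highest_p(df1)$ID
# 1 2 4 6 7 8
```

Ties are kept on purpose: if two rows share the same base name and the same highest P, both survive.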

痞味浪人 2025-02-20 13:05:41

Another base option:

dat <- cbind(dat, strcapture("(.*[^ ]) *P([0-9]+)$", dat$Name, list(Name2 = "", P = 0L)))
isna <- is.na(dat$P)
dat$P[isna] <- 0L; dat$Name2[isna] <- dat$Name[isna]
dat
#   ID            Name Score        Name2 P
# 1  1    Minnesota P2   342    Minnesota 2
# 2  2      Vermont P7   342      Vermont 7
# 3  3        Texas P4    65        Texas 4
# 4  4      New Mexico   643   New Mexico 0
# 5  5 North Dakota P8    78 North Dakota 8
# 6  6 North Dakota P9   245 North Dakota 9
# 7  7        Texas P5   856        Texas 5
# 8  8    Minnesota LP   342 Minnesota LP 0

dat[ave(dat$P, dat$Name2, FUN = function(z) seq_along(z) == which.max(z)) > 0,]
#   ID            Name Score        Name2 P
# 1  1    Minnesota P2   342    Minnesota 2
# 2  2      Vermont P7   342      Vermont 7
# 4  4      New Mexico   643   New Mexico 0
# 6  6 North Dakota P9   245 North Dakota 9
# 7  7        Texas P5   856        Texas 5
# 8  8    Minnesota LP   342 Minnesota LP 0
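
To see why the isna step above is needed, it may help to look at what strcapture returns on its own. This is a small illustration on two of the question's names, not part of the original answer: a string that does not match the pattern yields NA in every captured column, and those NAs are what the answer then fills in with 0 and the full name.

```r
# Two names from the question: one with a "P<digits>" suffix, one without
nm <- c("Texas P4", "New Mexico")

# Same pattern and prototype as in the answer above
parts <- strcapture("(.*[^ ]) *P([0-9]+)$", nm, list(Name2 = "", P = 0L))

parts$Name2
# "Texas" NA
parts$P
# 4 NA
```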
怎言笑 2025-02-20 13:05:41

Try this solution, which presupposes that the parts by which the Name values differ are always 2 characters long, preceded by whitespace, and positioned at the end of the string:

library(dplyr)
df %>%
  # create a dummy var without the variable part in `Name`:
  mutate(Names_dum = sub("\\s.{2}$","",Name)) %>%
  # for each `Names_dum`...:
  group_by(Names_dum) %>%
  # now filter for the maximum value:
  filter(Score == max(Score)) %>%
  ungroup() %>%
  # remove the dummy:
  select(-Names_dum)
# A tibble: 6 × 2
  Name            Score
  <chr>           <dbl>
1 Minnesota P2      342
2 Vermont P7        342
3 New Mexico        643
4 North Dakota P9   245
5 Texas P5          856
6 Minnesota LP      342

Data:

df <- data.frame(Name = c("Minnesota P2","Vermont P7", "Texas P4", "New Mexico", "North Dakota P8", "North Dakota P9", "Texas P5", "Minnesota LP"), 
                 Score = c(342,342,65,643,78,245,856,342))
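
To make the two-character assumption concrete, here is a quick base R check of the grouping key that the sub() call produces. The name "Texas P10" is a hypothetical addition that is not in the question's data; it is included only to show what happens when the suffix is longer than two characters.

```r
# Names from the question plus a hypothetical three-character suffix
nm <- c("Minnesota P2", "Minnesota LP", "North Dakota P8", "New Mexico", "Texas P10")

# Same substitution as in the answer above
keys <- sub("\\s.{2}$", "", nm)
keys
# "Minnesota" "Minnesota" "North Dakota" "New Mexico" "Texas P10"
```

Two things stand out: "Minnesota P2" and "Minnesota LP" end up in the same group, and "Texas P10" keeps its suffix and so would not be grouped with the other Texas rows. Note also that the answer filters on the highest Score within each group rather than on the P number, which gives the desired rows for this sample data.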
心头的小情儿 2025-02-20 13:05:41

Using a data.table approach:

library(data.table)

dt[, Score1 := max(Score), .(gsub(" [A-Z0-9]+$", "", Name))][
  Score >= Score1, .(Name, Score)]

#>               Name Score
#> 1:    Minnesota P2   342
#> 2:      Vermont P7   342
#> 3:      New Mexico   643
#> 4: North Dakota P9   245
#> 5:        Texas P5   856
#> 6:    Minnesota LP   342