How do I drop rows with nearly identical column values in R?

Posted on 2025-02-13 13:05:39

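I have a dataset with a column of names. I would like to drop the row with the lesser "P" value whenever a row with a higher value exists for the same name. For example, in the dataset below, I would like to drop the rows with IDs 3 and 5, since there are also a "Texas P5" and a "North Dakota P9". What is the best way to do this? Thanks in advance!

ID	Name	Score
1	Minnesota P2	342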
2	Vermont P7	342
3	Texas P4	65
4	New Mexico	643
5	North Dakota P8	78
6	North Dakota P9	245
7	Texas P5	856
8	Minnesota LP	342

狼亦尘 2025-02-20 13:05:41

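Here is a base R way. Use ave to split the data by Name with the digits removed, and check which element of each group is equal to the group's greatest element. ave returns a vector of the same class as its input, in this case character, so coerce the result to logical and use it to subset the original data frame.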

x<-"
ID  Name    Score
1   'Minnesota P2'  342
2   'Vermont P7'    342
3   'Texas P4'  65
4   'New Mexico'    643
5   'North Dakota P8'   78
6   'North Dakota P9'   245
7   'Texas P5'  856
8   'Minnesota LP'  342"
df1 <- read.table(textConnection(x), header = TRUE)


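# Group by Name with the digits removed; within each group, flag the element
# that sorts last. ave() returns character here, so i holds "TRUE"/"FALSE".
i <- with(df1, ave(Name, sub("\\d+", "", Name), FUN = \(x) {
  x == tail(sort(x), 1)
}))
# Coerce the flags back to logical and keep the flagged rows.
df1[as.logical(i), ]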
#>   ID            Name Score
#> 1  1    Minnesota P2   342
#> 2  2      Vermont P7   342
#> 4  4      New Mexico   643
#> 6  6 North Dakota P9   245
#> 7  7        Texas P5   856
#> 8  8    Minnesota LP   342

Created on 2022-07-06 by the reprex package (v2.0.1)
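One caveat with the approach above: it compares the names as strings, so a two-digit value such as "P10" would typically sort before "P9". The following is a minimal sketch of a numeric comparison instead. It is not part of the original answer, and the data frame df2 and the helper vectors has_p, key and p are hypothetical names introduced only for illustration.

```r
# Hypothetical data in the question's layout, with a two-digit P value added.
df2 <- data.frame(
  ID = 1:4,
  Name = c("Texas P9", "Texas P10", "New Mexico", "Minnesota LP"),
  Score = c(65, 856, 643, 342)
)

# Rows whose Name ends in " P<digits>".
has_p <- grepl(" P[0-9]+$", df2$Name)

# Grouping key: the name with any trailing P number removed.
key <- sub(" P[0-9]+$", "", df2$Name)

# Numeric P value; names without one keep the default of 0.
p <- integer(nrow(df2))
p[has_p] <- as.integer(sub(".* P([0-9]+)$", "\\1", df2$Name[has_p]))

# Keep the rows whose P equals the maximum P within their group.
keep <- p == ave(p, key, FUN = max)
df2[keep, ]
# keeps "Texas P10", "New Mexico" and "Minnesota LP"
```

Rows that tie for the maximum P within a group are all kept; whether that is desirable depends on the data.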


痞味浪人 2025-02-20 13:05:41

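Another base R option, assuming the data frame is named dat: strcapture splits Name into a base name (Name2) and a numeric P value, rows without a P value get P = 0, and ave then keeps the row with the largest P in each Name2 group.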

dat <- cbind(dat, strcapture("(.*[^ ]) *P([0-9]+)$", dat$Name, list(Name2 = "", P = 0L)))
isna <- is.na(dat$P)
dat$P[isna] <- 0L; dat$Name2[isna] <- dat$Name[isna]
dat
#   ID            Name Score        Name2 P
# 1  1    Minnesota P2   342    Minnesota 2
# 2  2      Vermont P7   342      Vermont 7
# 3  3        Texas P4    65        Texas 4
# 4  4      New Mexico   643   New Mexico 0
# 5  5 North Dakota P8    78 North Dakota 8
# 6  6 North Dakota P9   245 North Dakota 9
# 7  7        Texas P5   856        Texas 5
# 8  8    Minnesota LP   342 Minnesota LP 0

dat[ave(dat$P, dat$Name2, FUN = function(z) seq_along(z) == which.max(z)) > 0,]
#   ID            Name Score        Name2 P
# 1  1    Minnesota P2   342    Minnesota 2
# 2  2      Vermont P7   342      Vermont 7
# 4  4      New Mexico   643   New Mexico 0
# 6  6 North Dakota P9   245 North Dakota 9
# 7  7        Texas P5   856        Texas 5
# 8  8    Minnesota LP   342 Minnesota LP 0
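To see what the strcapture step does in isolation, here is a small sketch on two names from the question. The vector nm and the result name parts are hypothetical, and the prototype is written as an empty data frame, which is the form described in the strcapture documentation.

```r
# Two names from the question: one with a trailing P number, one without.
nm <- c("Texas P4", "Minnesota LP")

# The prototype fixes the column names and types of the result.
parts <- strcapture("(.*[^ ]) *P([0-9]+)$", nm,
                    proto = data.frame(Name2 = character(), P = integer()))
parts
# "Texas P4" yields Name2 = "Texas", P = 4;
# "Minnesota LP" does not match, so both columns are NA for that row.
```

The NA rows are why the answer fills P with 0 and copies Name into Name2 before grouping.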


怎言笑 2025-02-20 13:05:41

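Try this solution, which presupposes that the parts by which the Name values differ are always 2 characters long, preceded by whitespace, and positioned at the end of the string. Note that it keeps the row with the highest Score in each group rather than the highest P value; the two coincide in the sample data.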

library(dplyr)
df %>%
  # create a dummy var without the variable part in `Name`:
  mutate(Names_dum = sub("\\s.{2}$","",Name)) %>%
  # for each `Names_dum`...:
  group_by(Names_dum) %>%
  # now filter for the maximum value:
  filter(Score == max(Score)) %>%
  ungroup() %>%
  # remove the dummy:
  select(-Names_dum)
# A tibble: 6 × 2
  Name            Score
  <chr>           <dbl>
1 Minnesota P2      342
2 Vermont P7        342
3 New Mexico        643
4 North Dakota P9   245
5 Texas P5          856
6 Minnesota LP      342

Data:

df <- data.frame(Name = c("Minnesota P2","Vermont P7", "Texas P4", "New Mexico", "North Dakota P8", "North Dakota P9", "Texas P5", "Minnesota LP"), 
                 Score = c(342,342,65,643,78,245,856,342))
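The grouping pattern used in this answer can be checked on its own. The sketch below applies it to a few names from the question; the vector names nm and key are hypothetical and only serve the illustration.

```r
# A few names from the question's data.
nm <- c("Minnesota P2", "Minnesota LP", "North Dakota P8", "New Mexico")

# Strip a whitespace character plus exactly two trailing characters.
key <- sub("\\s.{2}$", "", nm)
key
# "Minnesota" "Minnesota" "North Dakota" "New Mexico"
```

"Minnesota P2" and "Minnesota LP" end up in the same group. Both appear in the output above because their scores tie at 342, so a dataset without that tie would drop one of them.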


心头的小情儿 2025-02-20 13:05:41

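Using a data.table approach. Here dt is assumed to be the question's data as a data.table, for example as.data.table(df); like the dplyr answer, this keeps the row with the highest Score in each group: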

library(data.table)

dt[, Score1 := max(Score), .(gsub(" [A-Z0-9]+$", "", Name))][
  Score >= Score1, .(Name, Score)]

#>               Name Score
#> 1:    Minnesota P2   342
#> 2:      Vermont P7   342
#> 3:      New Mexico   643
#> 4: North Dakota P9   245
#> 5:        Texas P5   856
#> 6:    Minnesota LP   342
