Is there a way to use seq() and grep() together with dplyr?

Posted on 2025-01-22 21:11:32


Apologies if this is obvious; I don't have much experience with R. I have a function contains_leap_year(date1, date2) that I want to pass in as a condition to dplyr::if_else().

My for-loop implementation looks like this:

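contains_leap_year <- logical(nrow(df))  # preallocate instead of growing with append()
for (i in seq_len(nrow(df))) {
    # only build a day-by-day sequence when both dates are present
    if (!is.na(df$date1[i]) && !is.na(df$date2[i])) {
        seq_str <- seq(df$date1[i], df$date2[i], by = "day")
        # grep() coerces the Dates to "YYYY-MM-DD" strings
        contains_leap_year[i] <- length(grep("-02-29", seq_str)) > 0
    } else {
        contains_leap_year[i] <- FALSE
    }
}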

Then I append this column to my data frame and do something like:

dplyr::mutate(
    res = dplyr::if_else(contains_leap_year == TRUE, action1, action2)
)

But this is rather slow. Ideally, I'd like to stay within dplyr the whole time, like so:

dplyr::mutate(
    res = dplyr::if_else(length(grep("-02-29", seq(date1, date2, by = "day"))) > 0, action1, action2)
)

However, this throws the error 'from' must be of length 1, which I believe is because date1 and date2 are whole column vectors inside mutate(), while seq() can only build a sequence from a single start date and a single end date.
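A minimal sketch of one way around that error, assuming date1 and date2 are Date columns: keep the seq()/grep() logic but call it on one pair of dates at a time, so the result is a logical vector that dplyr::if_else() can consume. The helper names contains_leap_day_one() and contains_leap_day_rowwise() are hypothetical, and treating missing or reversed ranges as FALSE is an assumption.

```r
# Hypothetical helper: does the closed range [x, y] (two single Dates) include a 29 February?
contains_leap_day_one <- function(x, y) {
  # Assumption: missing dates or a reversed range count as FALSE
  if (is.na(x) || is.na(y) || x > y) return(FALSE)
  days <- seq(x, y, by = "day")
  # as.character() on a Date gives "YYYY-MM-DD", so "-02-29" matches leap days
  length(grep("-02-29", as.character(days))) > 0
}

# Hypothetical wrapper: apply the scalar helper once per row, returning a logical vector
contains_leap_day_rowwise <- function(date1, date2) {
  vapply(seq_along(date1),
         function(i) contains_leap_day_one(date1[i], date2[i]),
         logical(1))
}

# Inside a pipeline (requires dplyr; action1/action2 as in the question):
# df |>
#   dplyr::mutate(res = dplyr::if_else(contains_leap_day_rowwise(date1, date2), action1, action2))
```

purrr::map2_lgl(date1, date2, contains_leap_day_one) is a common alternative to the vapply() wrapper. Either way this still builds one sequence per row, so it mainly moves the loop inside mutate() rather than making it much faster.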

If this isn't possible, is there an alternative method that is faster than just a for loop?
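One alternative that avoids generating every day in the range (a swapped-in technique, not the method used elsewhere in this thread): count how many 29 Februaries fall on or before a given date using the Gregorian leap-year rule, then check whether that count increases between date1 - 1 and date2. This is a sketch in base R that works on whole columns at once; it assumes date1 and date2 are Date vectors, and the function names leap_days_upto() and contains_leap_day_vec() are hypothetical.

```r
# Hypothetical helper: number of 29 February dates from year 1 up to and
# including each date in `d` (proleptic Gregorian calendar).
leap_days_upto <- function(d) {
  lt <- as.POSIXlt(d)
  y <- lt$year + 1900L        # calendar year
  m <- lt$mon + 1L            # month, 1-12
  md <- lt$mday               # day of month
  prev <- y - 1L
  # Leap years in 1..(y - 1): divisible by 4, except centuries not divisible by 400
  n <- prev %/% 4L - prev %/% 100L + prev %/% 400L
  is_leap <- (y %% 4L == 0L & y %% 100L != 0L) | y %% 400L == 0L
  # Also count this year's 29 February if `d` is on or after it
  n + (is_leap & (m > 2L | (m == 2L & md == 29L)))
}

# Hypothetical vectorised check: TRUE when [date1, date2] contains at least one
# 29 February. A reversed range gives a difference <= 0, so it comes out FALSE.
contains_leap_day_vec <- function(date1, date2) {
  res <- leap_days_upto(date2) - leap_days_upto(date1 - 1) > 0
  res[is.na(res)] <- FALSE   # assumption: rows with a missing date count as FALSE
  res
}
```

In a pipeline it drops straight into the condition, e.g. df |> dplyr::mutate(res = dplyr::if_else(contains_leap_day_vec(date1, date2), action1, action2)). Because each row costs a fixed amount of arithmetic instead of one seq() call over the whole range, it should scale better for long ranges, though it has not been benchmarked here.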


星 2025-01-29 21:11:32

While not ideal, I've settled (for now) on looping over the vector, but using furrr::future_map2 to do the looping. I don't have any rigorous benchmarks, but on my dataset it's about 2.5x faster than purrr::map2 and roughly 10x faster than a for loop.

Example function

contains_leap_day <- function(x, y) {
    date_seqs <- format(seq(x, y, by = "day"))
    res <- (length(stringr::str_which(date_seqs, "-02-29")) > 0)
    
    return(res)
}

future::plan(future::multisession)
df |>
    dplyr::mutate(
        # the _lgl variant returns a logical vector rather than a list column
        has_leap_day = furrr::future_map2_lgl(year1, year2, contains_leap_day, .progress = TRUE)
    )
