Is there a way to use seq() and grep() together with dplyr?

Posted 2025-01-22 21:11:32

Apologies if this is obvious, I don't have much experience with R. I have a check, contains_leap_year(date1, date2), whose result I want to pass as the condition to dplyr::if_else().

My for loop implementation looks like this:

contains_leap_year <- c()
for (i in seq_len(nrow(df))) {
    # Only build the sequence when both dates are present
    if (!is.na(df$date1[i]) && !is.na(df$date2[i])) {
        seq_str <- seq(df$date1[i], df$date2[i], by = "day")
        # TRUE if any date in the range is 29 February
        res <- (length(grep("-02-29", seq_str)) > 0)
    } else {
        res <- FALSE
    }

    contains_leap_year <- append(contains_leap_year, res)
}

Then I would append this column to my data frame and do something like:

dplyr::mutate(
    res = dplyr::if_else(contains_leap_year == TRUE, action1, action2)
)

But this is rather slow. Ideally, I'd like to work within dplyr the whole time, like so:

dplyr::mutate(
    res = dplyr::if_else(length(grep("-02-29", seq(date1, date2, by = "day"))) > 0, action1, action2)
)

However, just doing this throws a "'from' must be of length 1" error, which I believe is because date1 and date2 are vectors, so seq() cannot construct a single sequence from them.

If this isn't possible, is there an alternative method that is faster than just a for loop?
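
To make the error described above concrete, here is a minimal base R sketch. The two-row sample dates and the helper name has_leap_day_pair are made up for illustration and are not from the original data. It shows seq() rejecting whole vectors and the same check succeeding when it is applied to one pair of dates at a time.

```r
# Hypothetical sample data: two date ranges, only the first spans 2020-02-29
date1 <- as.Date(c("2019-12-30", "2021-01-01"))
date2 <- as.Date(c("2020-03-02", "2021-03-01"))

# seq() expects a single start and a single end, so whole columns fail;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(
    seq(date1, date2, by = "day"),
    error = function(e) conditionMessage(e)
)

# Hypothetical helper: checks ONE (start, end) pair for a 29 February
has_leap_day_pair <- function(start, end) {
    any(format(seq(start, end, by = "day"), "%m-%d") == "02-29")
}

# Apply the helper row by row; vapply() guarantees a logical vector
has_leap_day <- vapply(
    seq_along(date1),
    function(i) has_leap_day_pair(date1[i], date2[i]),
    logical(1)
)
```

Inside dplyr::mutate(), the same per-row call could presumably be written with purrr::map2_lgl(date1, date2, has_leap_day_pair), assuming neither column contains NA. This is still a loop under the hood, so it fixes the error rather than the speed.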

Comments (1)

2025-01-29 21:11:32

While not ideal, I've settled (for now) on just looping over the vectors, but using furrr::future_map2 to do so. I don't have any rigorous benchmarks, but on my dataset it's about 2.5x faster than purrr::map2 and roughly 10x faster than a for loop.

Example function:

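contains_leap_day <- function(x, y) {
    # Format every day from x to y (inclusive) as "YYYY-MM-DD"
    date_seqs <- format(seq(x, y, by = "day"))
    # TRUE if at least one formatted date is a 29 February
    res <- (length(stringr::str_which(date_seqs, "-02-29")) > 0)

    return(res)
}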

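library(dplyr)  # provides %>%

# Run the per-row calls in parallel background R sessions
future::plan(future::multisession)
df %>%
    dplyr::mutate(
        # The _lgl variant returns a logical vector rather than a list column
        has_leap_day = furrr::future_map2_lgl(date1, date2, contains_leap_day, .progress = TRUE)
    )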
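
As a possible alternative to looping at all, the check can be done with plain date arithmetic, which is vectorised and never builds the day-by-day sequences. The sketch below is not the approach used above. The function names count_leap_days and contains_leap_day_vec are hypothetical, and it assumes both inputs are Date vectors of equal length in the Gregorian calendar. The idea is to count how many 29 Februaries fall on or before each end date and on or before the day before each start date, then compare the two counts.

```r
# Hypothetical helper: number of 29 Februaries on or before each date
count_leap_days <- function(d) {
    lt <- as.POSIXlt(d)
    year <- lt$year + 1900
    # lt$mon is 0-based, so 1 is February and anything above 1 is March onward
    reached_feb29 <- lt$mon > 1 | (lt$mon == 1 & lt$mday == 29)
    # If this year's 29 February (if any) has not been reached, count up to last year
    last_year <- ifelse(reached_feb29, year, year - 1)
    # Gregorian rule: every 4th year, except centuries not divisible by 400
    last_year %/% 4 - last_year %/% 100 + last_year %/% 400
}

# Hypothetical vectorised check: does [start, end] include a 29 February?
contains_leap_day_vec <- function(start, end) {
    res <- count_leap_days(end) > count_leap_days(start - 1)
    # Treat rows with a missing date as FALSE, like the original loop
    !is.na(res) & res
}

# Made-up example ranges
start <- as.Date(c("2020-01-01", "2021-01-01", "2020-03-01", NA))
end   <- as.Date(c("2020-12-31", "2021-12-31", "2024-02-28", "2020-03-01"))
result <- contains_leap_day_vec(start, end)
```

Because every step works on whole vectors, this could be called directly inside dplyr::mutate(), for example as has_leap_day = contains_leap_day_vec(date1, date2), without any map or parallel backend. Rows where the start date is after the end date come out as FALSE rather than raising an error.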