Birthday paradox function

Posted on 2025-02-09 17:52:09

I'm a beginner in R and am trying to write a birthday paradox function. I have got this far, and the result is approximately 0.5, as expected.

k <- 23
sims <- 1000
event <- 0
for (i in 1:sims) {
  days <- sample(1:365, k, replace = TRUE)
  days.unique <- unique(days)
  if (length(days.unique) < k) {
    event <- event + 1
  }
  answer <- event/sims
}
answer

However, when I tried to put that into a function, the result was always 0.001. Here is the code:

bdayfunction<- function(k){
 sims <- 1000 
 event <- 0 
 for (i in 1:sims) {
  days <- sample(1:365, k, replace = TRUE)
  days.unique <- unique(days) 
  if (length(days.unique) < k) {
    event <- event + 1 } 
 answer <- event/sims
 return (answer)
 }
}

What have I done wrong?

Comments (1)

离不开的别离 2025-02-16 17:52:09

Your return is in the wrong place: it is inside the loop, so the function exits during the first iteration and can only give 0 or 0.001 (the same goes for your answer calculation, by the way).

This works:

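bdayfunction <- function(k) {
  sims <- 1000
  event <- 0
  for (i in 1:sims) {
    days <- sample(1:365, k, replace = TRUE)
    days.unique <- unique(days)
    # Fewer unique days than people means at least two share a birthday
    if (length(days.unique) < k) {
      event <- event + 1
    }
  }
  # Computed and returned once, after all simulations have run
  answer <- event/sims
  return(answer)
}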
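
To see why the position of return matters, here is a minimal sketch. The two helper names are made up for illustration and are not part of the question's code. Calling return() inside a for loop leaves the function during the first iteration, which is why the original function could only ever give 0/1000 or 1/1000.

```r
# Hypothetical helpers, for illustration only.
count_with_return_inside <- function(n) {
  total <- 0
  for (i in 1:n) {
    total <- total + 1
    return(total)  # exits the function during the first iteration
  }
}

count_with_return_after <- function(n) {
  total <- 0
  for (i in 1:n) {
    total <- total + 1
  }
  total  # reached only after all n iterations
}

count_with_return_inside(1000)  # 1
count_with_return_after(1000)   # 1000
```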

In R, you can use libraries that let you do grouped operations. The two main ones are data.table and dplyr. Here, instead of looping, you could create one long data.frame holding all your simulations, compute the number of unique days per simulation, and then count the simulations where that number is below k. With dplyr:

library(dplyr)

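bdayfunction_dplyr <- function(k) {
  sims <- 1000  # defined locally so the function does not depend on a global variable
  # One row per simulated birthday: sims groups of k draws each
  df <- data.frame(sim = rep(1:sims, each = k),
                   days = sample(1:365, k * sims, replace = TRUE))
  return(
    df %>%
      group_by(sim) %>%
      summarise(plouf = length(unique(days)) < k) %>%  # TRUE if the group shares a birthday
      summarise(out = sum(plouf) / sims) %>%
      pull(out)
  )
}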

In data.table:

library(data.table)

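bdayfunction_data.table <- function(k) {
  sims <- 1000  # defined locally so the function does not depend on a global variable
  dt <- data.table(sim = rep(1:sims, each = k),
                   days = sample(1:365, k * sims, replace = TRUE))

  # Unique days per simulation (column V1), then the share of simulations with a repeat
  return(dt[, length(unique(days)), sim][V1 < k, .N / sims])
}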

You can test that they provide the same result:

set.seed(123)
bdayfunction(23)
[1] 0.515

set.seed(123)
bdayfunction_dplyr(23)
[1] 0.515

set.seed(123)
bdayfunction_data.table(23)
[1] 0.515
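
As a sanity check on the simulated value, the exact probability can be computed directly, assuming 365 equally likely birthdays (the same assumption the simulation makes). The variable names below are illustrative. With 1000 simulations the standard error is roughly 0.016, so an estimate of 0.515 is consistent with the exact figure.

```r
k <- 23

# Probability that all k birthdays differ: (365/365) * (364/365) * ... * ((365 - k + 1)/365)
p_all_distinct <- prod((365 - 0:(k - 1)) / 365)

# Probability that at least two people share a birthday
p_shared <- 1 - p_all_distinct
p_shared       # approximately 0.507

# pbirthday() from the stats package computes the same probability
pbirthday(k)
```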

Now let's compare the speed:

library(microbenchmark)

microbenchmark(initial = bdayfunction(23),
               dplyr = bdayfunction_dplyr(23),
               data.table = bdayfunction_data.table(23))

Unit: milliseconds
       expr     min       lq      mean  median       uq      max neval cld
    initial  7.3252  7.56900  8.435564  7.7441  8.15995  24.7681   100  a 
      dplyr 12.3488 12.96285 16.846118 13.3777 14.71370 295.6716   100   b
 data.table  5.9186  6.24115  6.540183  6.4494  6.75640   8.1466   100  a 

You can see that data.table is slightly faster than your initial loop, and shorter to write.
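
If you would rather avoid extra packages, the loop can also be written more compactly in base R. This is only a sketch: the function name and the sims argument are my own additions, and it has not been included in the benchmark above.

```r
bdayfunction_base <- function(k, sims = 1000) {
  # One TRUE/FALSE per simulation: does the sample contain a repeated day?
  # anyDuplicated() returns 0 when there is no duplicate.
  shared <- replicate(sims, anyDuplicated(sample(1:365, k, replace = TRUE)) > 0)
  # The mean of a logical vector is the proportion of TRUE values
  mean(shared)
}

set.seed(123)
bdayfunction_base(23)
```

Making sims an argument lets you increase the number of simulations when you need a more precise estimate.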
