Birthday paradox function

Posted 2025-02-09 17:52:09

I'm a beginner in R and am trying to write a birthday paradox function. I got as far as the code below, and the result is approximately 0.5, as expected.

k <- 23 
sims <- 1000 
event <- 0 
for (i in 1:sims) {
  days <- sample(1:365, k, replace = TRUE)
  days.unique <- unique(days) 
  if (length(days.unique) < k) {
    event <- event + 1 } 
  answer <- event/sims}
  answer

However, when I tried to put that into a function, the result was always 0.001. Here is the code:

bdayfunction<- function(k){
 sims <- 1000 
 event <- 0 
 for (i in 1:sims) {
  days <- sample(1:365, k, replace = TRUE)
  days.unique <- unique(days) 
  if (length(days.unique) < k) {
    event <- event + 1 } 
 answer <- event/sims
 return (answer)
 }
}

What have I done wrong?

离不开的别离 2025-02-16 17:52:09

Your return is not in the right place: it is inside the loop, so the function exits during the first iteration (the same holds for your answer calculation, by the way).

This works:

bdayfunction<- function(k){
  sims <- 1000 
  event <- 0 
  for (i in 1:sims) {
    days <- sample(1:365, k, replace = TRUE)
    days.unique <- unique(days) 
    if (length(days.unique) < k) {
      event <- event + 1 }   
  }
  answer <- event/sims
  return (answer)
}
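
To see why the original version misbehaves, here is a minimal sketch (the name first_only is made up for illustration and is not part of the question). A return() inside a for loop ends the function during the first iteration, so only one simulation is ever counted and the result can only be 0/1000 or 1/1000.

```r
# Hypothetical helper, not from the question: return() inside the loop
# exits the function on the first pass, so later iterations never run.
first_only <- function(n) {
  count <- 0
  for (i in 1:n) {
    count <- count + 1
    return(count / n)  # reached when i == 1; the loop stops here
  }
}

first_only(1000)  # 0.001
```

Moving both the division and the return() below the loop's closing brace, as in the corrected function above, lets all iterations run before the proportion is computed.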

In R, you can also use libraries that support grouped operations. The two main ones are data.table and dplyr. Here, instead of looping, you could create one long data.frame holding all your simulations, count the unique days per simulation, and then count the simulations where that number is below k. With dplyr:

library(dplyr)

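bdayfunction_dplyr <- function(k){  
  sims <- 1000  # defined locally so the function does not depend on a global variable
  # one row per person, k rows per simulation
  df <- data.frame(sim = rep(1:sims,each = k),
                   days = sample(1:365, k*sims, replace = TRUE))
  return(
    df %>%
    group_by(sim) %>%
    summarise(plouf = length(unique(days))< k) %>%  # TRUE if a birthday is shared
    summarise(out = sum(plouf)/sims) %>%
    pull(out)
    )  
}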

In data.table:

library(data.table)

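bdayfunction_data.table <- function(k){
  sims <- 1000  # defined locally so the function does not depend on a global variable
  dt <- data.table(sim = rep(1:sims,each = k),
                   days = sample(1:365, k*sims, replace = TRUE))

  # unique days per simulation, then the share of simulations with fewer than k
  return(dt[,length(unique(days)),sim][V1<k,.N/sims])
}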

You can test that they provide the same result:

set.seed(123)
bdayfunction(23)
[1] 0.515

set.seed(123)
bdayfunction_dplyr(23)
[1] 0.515

set.seed(123)
bdayfunction_data.table(23)
[1] 0.515
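
As a further sanity check, the simulated value can be compared with the exact probability. The sketch below assumes 365 equally likely days, the same assumption the simulation makes; bday_exact is a made-up name, while pbirthday() is the built-in function from the stats package.

```r
# Exact probability that at least two of k people share a birthday,
# assuming 365 equally likely days (same assumption as the simulation).
bday_exact <- function(k) {
  # probability that all k birthdays differ: 365/365 * 364/365 * ...
  p_all_distinct <- prod((365 - (seq_len(k) - 1)) / 365)
  1 - p_all_distinct
}

bday_exact(23)  # about 0.507
pbirthday(23)   # built-in equivalent from the stats package
```

With 1000 simulations, an estimate such as 0.515 is within the sampling noise you would expect around this exact value.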

Now let's compare the speed:

library(microbenchmark)

microbenchmark(initial = bdayfunction(23),
               dplyr = bdayfunction_dplyr(23),
               data.table = bdayfunction_data.table(23))

Unit: milliseconds
       expr     min       lq      mean  median       uq      max neval cld
    initial  7.3252  7.56900  8.435564  7.7441  8.15995  24.7681   100  a 
      dplyr 12.3488 12.96285 16.846118 13.3777 14.71370 295.6716   100   b
 data.table  5.9186  6.24115  6.540183  6.4494  6.75640   8.1466   100  a

As you can see, data.table is slightly faster than your initial loop in this benchmark, and shorter to write.
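
If you would rather not load a package, a loop-free version is also possible in base R. This is only a sketch: bdayfunction_base and its sims argument are made-up names, and it was not part of the benchmark above, so no claim is made about its speed.

```r
# Sketch of a loop-free base R version; bdayfunction_base is a made-up name.
bdayfunction_base <- function(k, sims = 1000) {
  # anyDuplicated() returns 0 when all values are distinct,
  # so "> 0" is TRUE when at least one birthday is shared
  shared <- replicate(sims, anyDuplicated(sample(1:365, k, replace = TRUE)) > 0)
  mean(shared)  # proportion of simulations with a shared birthday
}

set.seed(123)
bdayfunction_base(23)
```

Taking the mean of a logical vector gives the proportion of TRUE values, which replaces the manual event counter.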
