How can I create a random sample in R that is proportional to a vector?

Posted 2025-01-11 22:50:37

Assume the following data frame:

df <- data.frame(id = 1:6, value=c(10,20,10,20,30,10))

df

  id value
1  1    10
2  2    20
3  3    10
4  4    20
5  5    30
6  6    10

I want to randomly assign every individual to one of three groups (A,B,C). I want to achieve given proportions of 30% to be in group A, 50% to be in group B, 20% to be in group C. But I want to do this assignment based on the value column. In other words, I want to achieve something like the following:

  id value group
1  1    10     A
2  2    20     A
3  3    10     C
4  4    20     B
5  5    30     B
6  6    10     C

or...

  id value group
1  1    10     A
2  2    20     B
3  3    10     A
4  4    20     C
5  5    30     B
6  6    10     A

Of course, in this example, these are perfect solutions. But the random assignment should approach a group assignment as close to the given proportions as possible. So another example would be the following:

df <- data.frame(id = 1:6, value=c(112,56,53,13,80,120))

df

  id value
1  1   112
2  2    56
3  3    53
4  4    13
5  5    80
6  6   120

One possible assignment could be:

  id value group
1  1   112     B
2  2    56     A
3  3    53     C
4  4    13     C
5  5    80     A
6  6   120     B

In this case, the assignment wouldn't be perfect, but it would be close to the desired proportions (group A: 31.3%, group B: 53.5%, group C: 15.2%).
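
The group shares quoted above can be checked with a short base R snippet. This is only a sketch: the `group` vector is copied from the example table, and `shares` is a name introduced here for illustration.

```r
# Example data and the assignment shown in the table above
df <- data.frame(id = 1:6, value = c(112, 56, 53, 13, 80, 120))
df$group <- c("B", "A", "C", "C", "A", "B")

# Share of the total `value` that falls into each group
shares <- tapply(df$value, df$group, sum) / sum(df$value)

# Shares as percentages, rounded to one decimal place
round(100 * shares, 1)
```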

Is there any way to achieve this in R? Thanks!

瑾夏年华 2025-01-18 22:50:37

As I understand it, after group assignment you want sum(value[group == "A"]) / sum(value) to be approximately 0.3, and likewise for "B" (0.5) and "C" (0.2). If that's the case, all you have to do is assign groups with those probability weights, without doing anything special to take value into account. The sums of value will, on average, come out as you want as a natural consequence of the randomization. Look:

library(tidyverse)
set.seed(1)

# 100-row example dataframe
df <- tibble(
  id = 1:100,
  value = sample(1:200, 100, replace = TRUE)
)

# simulate 100 sets of group assignments
sims <- map_dfr(
  1:100,                              # iterate 100x
  ~ df %>% 
    mutate(group = sample(
      c("A", "B", "C"), 
      size = 100, 
      replace = TRUE, 
      prob = c(.3, .5, .2))           # probability weights
    ) %>% 
    group_by(group) %>%
    summarize(prop = sum(value)) %>%  # compute `value` proportion
    mutate(prop = prop / sum(prop))   # within each group
)

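# central tendency & dispersion across simulations
# (named functions give readable column names such as prop_mean, prop_q25)
sims %>% 
  group_by(group) %>% 
  summarize(across(
    prop, 
    list(
      mean = mean, sd = sd, median = median,
      q25 = ~ quantile(.x, .25), q75 = ~ quantile(.x, .75)
    )
  ))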

Group	Mean (SD)	Median (IQR)
A	.29 (.06)	.29 (.25 - .32)
B	.51 (.06)	.50 (.47 - .55)
C	.21 (.04)	.21 (.18 - .23)

# distribution of proportions across sims by groups
sims %>% 
  ggplot(aes(prop)) +
  geom_density(aes(fill = group), alpha = .75) +
  scale_fill_brewer(palette = "Dark2")
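
With only a handful of rows, as in the question's six-row examples, a single weighted draw can land far from the target shares. One possible workaround, offered as a sketch rather than as part of the answer above, is to draw many random assignments and keep the one whose value shares are closest to the target. The helper names `value_shares` and `best_assignment`, the `n_draws` parameter, and the sum-of-absolute-deviations criterion are all assumptions made for illustration.

```r
set.seed(1)

df <- data.frame(id = 1:6, value = c(112, 56, 53, 13, 80, 120))
target <- c(A = 0.3, B = 0.5, C = 0.2)

# Share of total `value` that lands in each group (0 for an empty group).
value_shares <- function(value, group, levels) {
  sums <- tapply(value, factor(group, levels = levels), sum)
  sums[is.na(sums)] <- 0
  sums / sum(value)
}

# Draw `n_draws` weighted random assignments and keep the one whose
# shares deviate least from `target` (sum of absolute deviations).
best_assignment <- function(value, target, n_draws = 1000) {
  best <- NULL
  best_dev <- Inf
  for (i in seq_len(n_draws)) {
    g <- sample(names(target), length(value), replace = TRUE, prob = target)
    dev <- sum(abs(value_shares(value, g, names(target)) - target))
    if (dev < best_dev) {
      best <- g
      best_dev <- dev
    }
  }
  list(group = best, deviation = best_dev)
}

res <- best_assignment(df$value, target)
df$group <- res$group
df
```

The result is still random, since it depends on the seed and on which candidates happen to be drawn, but it is no longer a plain weighted sample: picking the best of many draws trades some randomness for closeness to the target.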
