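How to easily generate/simulate example data with different groups for modelling
How can I easily generate/simulate meaningful example data for modelling? For example, I would like to say: give me n rows of data for 2 groups, whose sex distributions and mean ages differ by X and Y units, respectively. Is there a simple way to do this automatically? Are there any packages for it?
For example, what would be the simplest way to generate data like this?
- groups: two groups: A, B
- sex: different sex distributions: A 30%, B 70%
- age: different mean ages: A 50, B 70
PS! Tidyverse solutions are especially welcome.
My best attempt so far is still quite a lot of code: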
library(tidyverse)  # provides tibble(), bind_rows() and %>%

n <- 100
d <- bind_rows(
  # group A females
  tibble(group = "A", sex = "Female", age = rnorm(n * 0.4, 50, 4)),
  # group B females
  tibble(group = "B", sex = "Female", age = rnorm(n * 0.3, 45, 4)),
  # group A males
  tibble(group = "A", sex = "Male", age = rnorm(n * 0.2, 60, 6)),
  # group B males
  tibble(group = "B", sex = "Male", age = rnorm(n * 0.1, 55, 4))
)
# count rows and average age within each group/sex combination
d %>%
  group_by(group, sex) %>%
  summarise(n = n(), mean_age = mean(age), .groups = "drop")
Comments (1)
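There are lots of ways to sample from vectors and to draw from random distributions in R. For example, the data set you requested could be created like this: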
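The code that followed this sentence is not preserved on this page. As a minimal sketch of one way to do it (not necessarily the approach the answer used), the block below keeps the per-group targets in a small parameter table and draws sex and age from it. The even split between groups, the reading of 30%/70% as the share of females, the standard deviation of 5, and the names `params`, `p_female`, and `sd_age` are all assumptions made for illustration.

```r
library(dplyr)
library(tibble)

set.seed(42)  # arbitrary seed, only so the draw is reproducible

n <- 100  # total rows; assumed to be split evenly between the two groups

# Hypothetical per-group targets taken from the question's example
params <- tibble(
  group    = c("A", "B"),
  p_female = c(0.3, 0.7),  # assumption: the percentages are shares of females
  mean_age = c(50, 70),
  sd_age   = c(5, 5)       # assumption: the question does not give a spread
)

d <- tibble(group = rep(c("A", "B"), each = n / 2)) %>%
  left_join(params, by = "group") %>%
  mutate(
    # Bernoulli draw per row using that row's group-specific probability
    sex = if_else(runif(n()) < p_female, "Female", "Male"),
    # rnorm() is vectorised over mean and sd, so each row uses its group's values
    age = rnorm(n(), mean = mean_age, sd = sd_age)
  ) %>%
  select(group, sex, age)
```

Keeping the targets in one table means the proportions and means can be changed without touching the sampling code. Because sex is drawn at random, the observed shares will only approximate 30% and 70% in a sample of this size.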
And we can use the tidyverse to show it does what was expected:
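The check that followed is likewise missing from this page. A self-contained sketch of such a check, under the same assumptions as the question's example (30%/70% read as the share of females, a standard deviation of 5 that the question does not specify), might look like the following. It simulates a larger sample so the summaries land close to the targets, then computes the per-group share of females and mean age; `check` and `pct_female` are illustrative names.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the draw is reproducible

n <- 1000  # larger than in the question so the summaries sit near the targets

# Compact simulation: later columns may refer to earlier ones inside tibble()
d <- tibble(
  group = sample(c("A", "B"), n, replace = TRUE),
  sex   = if_else(runif(n) < if_else(group == "A", 0.3, 0.7), "Female", "Male"),
  age   = rnorm(n, mean = if_else(group == "A", 50, 70), sd = 5)  # sd is assumed
)

# Per-group row count, share of females, and mean age
check <- d %>%
  group_by(group) %>%
  summarise(
    n          = n(),
    pct_female = mean(sex == "Female"),
    mean_age   = mean(age),
    .groups    = "drop"
  )

print(check)
```

If the simulation behaves as intended, `pct_female` should come out near 0.3 and 0.7 and `mean_age` near 50 and 70, with the gap to the targets shrinking as `n` grows.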