Random and non-random sampling in a single sample
Is there a way to sample X number of random rows and X non-random rows in a single sample?
For example, I want to get 1,000 samples of 4 rows of iris. I want to randomly sample 3 rows of iris, and the fourth row will be the same one in each sample (this is to mimic a hybrid sampling design).
I can sample 3 random rows 1000x and the fixed row 1000x and then merge the two data frames together, but for a few reasons this is not an ideal situation. The code to do that looks something like the following:
df <- iris
fixed_sample <- iris[7, ]
random <- list()
fixed <- list()
counter <- 0
for (i in 1:1000) {
  # draw 3 randomly selected rows plus the fixed row, 1000 times
  tempsample_random <- df[sample(1:nrow(df), 3, replace = FALSE), ]
  tempsample_fixed <- fixed_sample[sample(1:nrow(fixed_sample), 1, replace = FALSE), ]
  random[[i]] <- tempsample_random
  fixed[[i]] <- tempsample_fixed
  # progress counter
  counter <- counter + 1
  print(counter)
}
# stack the 1000 draws of each kind into one data frame apiece
random_results <- do.call(rbind, random)
fixed_results <- do.call(rbind, fixed)
From here I would make a new column as a grouping variable and then merge them together based on that group. So each block of four rows in the final data frame consists of 3 random rows plus row number 7 (fixed_sample).
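The combining step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the column name sample_id is hypothetical, the number of samples is cut to 5 to keep the result small, and lapply stands in for the for loop.

```r
set.seed(1)  # only so the sketch is reproducible

n_samples <- 5   # assumption: use 1000 for the full run
n_random  <- 3

# Draw the random rows for every sample and tag each draw with its sample number.
# sample_id is a hypothetical name for the grouping column.
random_results <- do.call(rbind, lapply(seq_len(n_samples), function(i) {
  cbind(sample_id = i, iris[sample(nrow(iris), n_random), ])
}))

# Repeat the fixed row (row 7) once per sample, with the same tag.
fixed_results <- cbind(sample_id = seq_len(n_samples),
                       iris[rep(7, n_samples), ])

# Stack both pieces and sort so each sample's four rows sit together.
combined <- rbind(random_results, fixed_results)
combined <- combined[order(combined$sample_id), ]
```

As in the loop version, the random draw is taken from all of iris, so row 7 can occasionally show up among the random rows as well.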
I've looked into using splitstackshape::stratified, but haven't gotten it to work the way I need it to. I'll be doing this over several levels of sampling effort (sample 2, 3, 4, 5 rows, etc. 1,000x each) so it would be ideal to be able to pull the fixed and random rows in the same sample from the beginning.
Any help would be greatly appreciated.
Comments (3)
I think you can do this in a single line using lapply. In this case we will draw 3 samples, but you can change seq(3) to seq(1000) to get your 1000 samples. I have followed your example and selected row 7 as the fixed row.
Created on 2022-05-18 by the reprex package (v2.0.1)
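The code for this answer is not shown here. A minimal sketch of what a single lapply call along these lines might look like, assuming each sample is built by binding 3 random rows of iris to row 7 (the name samples is made up for illustration):

```r
set.seed(42)  # only so the sketch is reproducible

# Each list element is one sample: 3 random rows followed by the fixed row 7.
# Change seq(3) to seq(1000) to get 1000 samples.
samples <- lapply(seq(3), function(i) rbind(iris[sample(nrow(iris), 3), ], iris[7, ]))
```

The result is a list of 4-row data frames. Note that sample(nrow(iris), 3) draws from every row, so row 7 can occasionally be picked as one of the random rows too.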
Here's a method:
The intent is that we sample all rows except the fixed row that you intend to include in all samples, then prepend it to the list of row indices. Using the premise of setdiff(.., fixed_row) allows you to use arbitrary sets here, so it would be feasible for fixed_row to have zero or more row indices with the desired end result.
(Note that the use of set.seed is just for reproducibility here on StackOverflow; you should likely not use that in production.)
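The code for this answer is not shown here either. A rough sketch of the setdiff idea, under the assumption that the row indices are drawn first and the data frames are built from them afterwards (the names candidates, index_list, n_random and n_samples are illustrative, not taken from the answer):

```r
set.seed(2022)  # only so the sketch is reproducible

fixed_row <- 7      # could be a vector of zero or more row indices
n_random  <- 3
n_samples <- 1000

# Candidate rows for the random part: everything except the fixed row(s).
candidates <- setdiff(seq_len(nrow(iris)), fixed_row)

# One vector of row indices per sample, with the fixed row(s) prepended.
index_list <- replicate(
  n_samples,
  c(fixed_row, sample(candidates, n_random)),
  simplify = FALSE
)

# Turn each index vector into a data frame of n_random + 1 rows.
samples <- lapply(index_list, function(idx) iris[idx, ])
```

Because the fixed row is excluded from the candidates, it appears exactly once per sample. For several levels of sampling effort, the same steps could be repeated with n_random set to 1, 2, 3, 4 and so on.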