Two random unique samples from the same pool
I am trying to get two samples with unique elements in each sample. That is, the strings in the "first" vector cannot be in the "second" vector. Unfortunately, I always get repeated strings and I can't seem to find a way of solving this. I tried to solve it using if-else, but with no success.
Edit: the final output should be pairs. The same numbers in first should be in second. The only thing that will vary is the letters. Each letter has to appear exactly three times. The reason I don't want repeated elements is that when I create the pairs, I get pairs such as 1_W and 1_W. That cannot happen.
The output should be something like:
first: 12_U, 23_U, 6_U, 8_T, 24_T, 22_T, 7_S, 10_S, 19_S, 21_W, 14_W, 2_W
second: 12_W, 23_W, 6_W, 8_S, 24_S, 22_S, 7_T, 10_T, 19_T, 21_U, 14_U, 2_U
Edit 2:
I did a terrible job of explaining what I need. This code is going to be used to select headlines for a study in which I'm going to collect data.
Each theme represents a headline about a specific topic, such as global warming. There are 24 themes. Each version (U, T, S, W) represents a variation of a true headline (T).
I have a headline bank with a total of 96 headlines that vary in terms of themes and versions. 1_U is the U version of theme 1. I want to check which versions participants will choose for each pair.
What I need is
- to select 12 themes;
- to create pairs within the same theme so participants can choose between two versions of the same headline;
- participants always need to see 12 pairs (2 versions of the same theme);
- I also need to guarantee that they will see equal proportions of each version. That's why I created vector "first" and vector "second", which meet this criterion.
However, I am getting pairs with repeated versions. Some of the pairs I get are 12_S and 12_S, when they should be 12_S and any other version (12_U, 12_T or 12_W), because it does not make sense for a participant to choose between the S version of theme 12 and the S version of theme 12.
By creating two vectors I was able to get exactly what I wanted, except for the fact that some pairs contain the same headline.
themes <- c(1:24)
set.seed(1)
twelve <- sample(themes, 12)
versions <- c('U', 'T', 'S', 'W')
set.seed(14)
first <- sample(paste(sample(twelve), rep(versions, 3), sep='_'))
second <- sample(paste(sample(twelve), rep(versions, 3), sep='_'))
repeated <- first[first %in% second]
if (is.null(repeated)) {
  print(second) # if there are no elements in the vector "repeated", then print second
} else {
  x <- sample(paste(sample(twelve), rep(versions, 3), sep='_')) # otherwise, pick another sample
}
Comments (3)
To make sure you get two vectors, first and second, where the themes in first do not exist in second, you either need repeated themes within a vector, or you must use sampling to split the themes up. Splitting creates two unique samples, which can be verified with a membership check such as first %in% second; it should return a logical vector with only FALSE entries.
If you only want 3 letters in the final two vectors, I suggest pasting the themes and the versions together directly. The usage of rep(versions, 3) is unnecessary, as R automatically recycles the shorter vector. To get new vectors with changing themes that preserve these properties, you must split the themes again into two sets.
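A minimal sketch of the splitting idea, which may differ from the code originally posted with this answer. It assumes the 24 themes are split into two disjoint sets of 12; the names themes_first and themes_second and the seed are illustrative.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
themes   <- 1:24
versions <- c("U", "T", "S", "W")

# Split the themes into two disjoint sets of 12 (assumed set size).
themes_first  <- sample(themes, 12)
themes_second <- setdiff(themes, themes_first)

# paste() recycles the 4 versions across the 12 themes,
# so each letter appears exactly 3 times without rep(versions, 3).
first  <- paste(themes_first,  versions, sep = "_")
second <- paste(themes_second, versions, sep = "_")

# Membership check: every entry should be FALSE.
overlap <- first %in% second
print(overlap)
```

Because the two theme sets are disjoint, no string can appear in both vectors, whatever letters are attached.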
Edit 1: In response to the updated question.
To generate one sample of themes, draw 12 of the 24 themes once and use them for both vectors.
To get the versions to be random and different between the two vectors, the following "hacky" solution came to mind: create one sample of versions, then keep recreating a second sample until the versions are no longer repeated elementwise.
All that is left is to paste the themes and versions together to get the final vectors, as required.
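The retry idea can be sketched as follows. This is a reconstruction under assumptions, not the answer's original code: the names v_first and v_second and the seed are illustrative.

```r
set.seed(1)                           # arbitrary seed
versions <- c("U", "T", "S", "W")
themes   <- sample(1:24, 12)          # one sample of 12 themes, shared by both vectors

# Version labels for the first vector: each letter 3 times, shuffled.
v_first <- sample(rep(versions, 3))

# Redraw the second set of labels until no position matches the first.
repeat {
  v_second <- sample(rep(versions, 3))
  if (!any(v_first == v_second)) break
}

first  <- paste(themes, v_first,  sep = "_")
second <- paste(themes, v_second, sep = "_")
```

Each position pairs two different versions of the same theme, and each letter still appears three times per vector. The loop is rejection sampling, so the number of redraws varies from run to run.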
I think you make your life easier if you sample your pairs of versions (with no duplicates) and then paste them with the theme value. So we first sample 12 themes, then apply over that list and paste each theme with its pair of versions. You get a matrix with 2 rows holding your pairs.
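A minimal sketch of this approach, assuming sapply over the sampled themes; the name pairs and the seed are illustrative and the original code may have looked different.

```r
set.seed(1)                           # arbitrary seed
versions <- c("U", "T", "S", "W")
twelve   <- sample(1:24, 12)

# For each theme, draw 2 distinct versions and paste them onto the theme.
# sapply() binds the length-2 results into a 2 x 12 character matrix.
pairs <- sapply(twelve, function(th) paste(th, sample(versions, 2), sep = "_"))

print(pairs)
```

Each column is one pair. Note that drawing the versions independently per theme does not by itself guarantee that every letter appears equally often, so that requirement would need an extra check or a retry.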
Here is a brute-force approach. I would create two samples of themes for the 12 participants to choose from, and sample the versions in the same way. repeat until there is no dupe for each participant in both (i.e. in each row of the resulting matrices). Next, copy the rows of samp_vs two times each and paste both together using Map. Wrap it in a function samp_fun.
Calling samp_fun gives a list with the two groups.
If you want first and second in the workspace, use list2env.
Note: R >= 4.1 used.
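The original samp_fun is not shown here, so the following is only a simplified sketch of the same brute-force idea. The function body, its arguments and the internal names th and vs are assumptions; only samp_fun, repeat, Map and list2env come from the description above.

```r
# Hypothetical reconstruction: draw themes once, redraw versions until
# no row of the two-column version matrix holds the same letter twice.
samp_fun <- function(themes = 1:24, versions = c("U", "T", "S", "W"), n = 12) {
  th <- sample(themes, n)
  repeat {
    # n x 2 matrix: column 1 for "first", column 2 for "second"
    vs <- replicate(2, sample(rep(versions, length.out = n)))
    if (all(vs[, 1] != vs[, 2])) break
  }
  # Paste the shared themes onto each column of versions.
  out <- Map(function(v) paste(th, v, sep = "_"), list(vs[, 1], vs[, 2]))
  setNames(out, c("first", "second"))
}

set.seed(42)                          # arbitrary seed
res <- samp_fun()
str(res)

# Optionally copy both vectors into the current environment.
list2env(res, envir = environment())
```

Returning a named list keeps the two groups together, and list2env is only needed if separate first and second objects are more convenient.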