Two random unique samples from the same pool

Posted on 2025-01-23 00:38:43

I am trying to get two samples with unique elements in each sample. That is, the strings in the "first" vector cannot be in the "second" vector. Unfortunately, I always get repeated strings and I can't seem to find a way of solving this. I tried to solve it using if-else, but with no success.

Edit: the final output should be pairs. The same numbers in first should be in second; only the letters vary. Each letter has to appear exactly three times. The reason I don't want repeated elements is that, when I create the pairs, I get pairs such as 1_W and 1_W. That cannot happen.

The output should be something like:

first: 12_U, 23_U, 6_U, 8_T, 24_T, 22_T, 7_S, 10_S, 19_S, 21_W, 14_W, 2_W

second: 12_W, 23_W, 6_W, 8_S, 24_S, 22_S, 7_T, 10_T, 19_T, 21_U, 14_U, 2_U

Edit 2:

I did a terrible job of explaining what I need. This code is going to be used to select headlines for a study in which I'm going to collect data.

Each theme represents a headline about a specific topic, such as global warming. There are 24 themes. Each version (U, T, S, W) represents variations of a true headline (T).

I have a headline bank with a total of 96 headlines that vary in terms of themes and versions. 1_U is the U version of theme 1. I want to check which versions participants will choose for each pair.

What I need is

  1. to select 12 themes;
  2. to create pairs within the same theme, so participants can choose between two versions of the same headline;
  3. participants always need to see 12 pairs (2 versions of the same theme each);
  4. to guarantee that they see equal proportions of each version. That's why I created the vectors "first" and "second", which meet this criterion.

However, I am getting pairs with repeated versions. Some of the pairs I get are 12_S and 12_S, when they should be 12_S and any other version (12_U, 12_T or 12_W), because it does not make sense for a participant to choose between the S version of theme 12 and the S version of theme 12.

By creating two vectors I was able to get exactly what I wanted except for the fact that some pairs contain the same headline.

themes <- c(1:24)
set.seed(1)
twelve <- sample(themes, 12)
versions <- c('U', 'T', 'S', 'W')

set.seed(14) 
first <- sample(paste(sample(twelve), rep(versions, 3), sep='_'))
second <- sample(paste(sample(twelve), rep(versions, 3), sep='_'))

repeated <- first[first %in% second]

if (is.null(repeated)) {
  print(second) # if the vector "repeated" has no elements, print "second"
} else {
  x <- sample(paste(sample(twelve), rep(versions, 3), sep='_')) #otherwise, pick another sample
}


小巷里的女流氓 2025-01-30 00:38:43

To make sure you get two vectors, first and second, where the themes in first do not occur in second, you either need repeated themes within a vector or you must use sampling to split the themes up.

set.seed(1)
themes <- 1:24
versions <- c('U', 'T', 'S', 'W')
split_idx <- sample(length(themes), 0.5*length(themes))
set_1 <- themes[split_idx]
set_2 <- themes[-split_idx]

This creates two disjoint samples, which you can verify with

set_1 %in% set_2

This should return a logical vector with only FALSE entries.

If you only want 3 letters in the final 2 vectors I suggest the following:

first <- paste(sample(set_1), sample(versions, 3), sep = "_")
second <- paste(sample(set_2), sample(versions, 3), sep = "_")

Using rep(versions, 3) is unnecessary, as R automatically recycles the shorter vector when the two vectors differ in length.
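
To illustrate the recycling point, here is a minimal sketch. The theme numbers in themes_demo are made up for the example and are not from the question.

```r
# Sketch only: themes_demo holds hypothetical theme numbers.
themes_demo <- c(3, 8, 11, 20, 5, 17)
versions_demo <- c("U", "T", "S")

# paste() recycles the shorter vector to the length of the longer one ...
recycled <- paste(themes_demo, versions_demo, sep = "_")

# ... so spelling out the repetition with rep() gives the same result.
explicit <- paste(themes_demo, rep(versions_demo, 2), sep = "_")

identical(recycled, explicit)
# [1] TRUE
recycled
# [1] "3_U"  "8_T"  "11_S" "20_U" "5_T"  "17_S"
```

Recycling is silent here because 6 is a multiple of 3, just as 12 themes is a multiple of the number of versions in the code above.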

To get new vectors with changing themes that preserve these properties, you must split themes again into 2 sets.

Edit 1: In response to the updated question.

To generate one sample of themes:

set.seed(1)
themes <- 1:24
versions <- c('U', 'T', 'S', 'W')
theme_sample <- sample(themes, 12)

To get the versions to be random and different between the two vectors, the following "hacky" solution came to mind.

first_versions <- sample(versions)
# Redraw second_versions until no position holds the same version in both vectors.
while (any((second_versions <- sample(versions)) == first_versions)) {}

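The above creates one sample, then keeps redrawing a second sample until no version is repeated at the same position in both.
All that is left is to build the final vectors. paste() recycles the 4 versions across the 12 themes, so each version appears exactly three times in each vector.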

first <- paste(theme_sample, first_versions, sep = "_")
second <- paste(theme_sample, second_versions, sep = "_")

As required.
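
If it helps, the steps above can be wrapped in one function and the requirements checked afterwards. This is only a sketch: make_pairs and its n_themes argument are names I made up, and it assumes the number of themes drawn is a multiple of the number of versions.

```r
# Hypothetical helper (not part of the original answer): bundles the steps above.
make_pairs <- function(themes, versions, n_themes = 12) {
  # Assumption: n_themes is a multiple of length(versions), so recycling is even.
  stopifnot(n_themes %% length(versions) == 0)
  theme_sample <- sample(themes, n_themes)
  first_versions <- sample(versions)
  repeat {
    second_versions <- sample(versions)
    # Stop once no position holds the same version in both vectors.
    if (!any(second_versions == first_versions)) break
  }
  # paste() recycles the versions across the sampled themes.
  list(
    first  = paste(theme_sample, first_versions,  sep = "_"),
    second = paste(theme_sample, second_versions, sep = "_")
  )
}

set.seed(1)
res <- make_pairs(1:24, c("U", "T", "S", "W"))

# Check 1: no pair contains the same headline twice.
no_identical_pairs <- !any(res$first == res$second)
# Check 2: each version appears exactly three times in each vector.
balanced <- all(table(sub(".*_", "", res$first)) == 3) &&
  all(table(sub(".*_", "", res$second)) == 3)
# Check 3: both vectors use the same themes in the same order.
same_themes <- identical(sub("_.*", "", res$first), sub("_.*", "", res$second))

c(no_identical_pairs, balanced, same_themes)
```

All three checks should come back TRUE for any seed, since they follow from the construction rather than from the particular draw.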

山人契 2025-01-30 00:38:43

I think you would make your life easier by sampling each pair of versions (without duplicates) and then pasting it onto the theme value. So we first sample 12 themes, then apply over that sample and paste each theme onto a pair of versions. You get a matrix with 2 rows, where each column is one pair.

set.seed(1)

themes <- 1:24
versions <- c("U", "T", "S", "W")

pairs <- sapply(sample(themes, 12), FUN = function(x) paste(x, sample(versions, 2), sep = "_"))

pairs
#      [,1]  [,2]  [,3]  [,4]  [,5]   [,6]   [,7]   [,8]   [,9]  [,10]  [,11]  [,12]
# [1,] "4_T" "7_S" "1_S" "2_U" "11_U" "14_U" "18_T" "22_T" "5_W" "16_U" "10_T" "6_T"
# [2,] "4_W" "7_U" "1_U" "2_W" "11_T" "14_W" "18_W" "22_U" "5_S" "16_S" "10_W" "6_W"

first <- pairs[1, ]
# [1] "4_T"  "7_S"  "1_S"  "2_U"  "11_U" "14_U" "18_T" "22_T" "5_W"  "16_U" "10_T" "6_T" 

second <- pairs[2, ]
# [1] "4_W"  "7_U"  "1_U"  "2_W"  "11_T" "14_W" "18_W" "22_U" "5_S"  "16_S" "10_W" "6_W"
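
One thing worth checking with this approach is requirement 4 in the question, since each theme draws its two versions independently. In the output printed above, for instance, T appears five times in the first row and W only once. A small sketch for counting versions per row follows; version_of is a helper name I made up.

```r
set.seed(1)
themes <- 1:24
versions <- c("U", "T", "S", "W")
pairs <- sapply(sample(themes, 12),
                function(x) paste(x, sample(versions, 2), sep = "_"))

# Hypothetical helper: strip the theme prefix and keep the version letter.
version_of <- function(x) factor(sub(".*_", "", x), levels = versions)

# Counts per version in each row; these are not forced to be 3 each.
table(version_of(pairs[1, ]))
table(version_of(pairs[2, ]))

# What the approach does guarantee: a pair never holds the same headline twice.
any(pairs[1, ] == pairs[2, ])
# [1] FALSE
```

If equal proportions matter, the counts would have to be checked (and the draw repeated) or the versions assigned in a balanced way instead.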

猫腻 2025-01-30 00:38:43

Here is a brute-force approach. I would create two samples of the 12 chosen themes and sample the versions in the same way, then repeat until no row of either resulting matrix contains a duplicate. Next, repeat each row of samp_vs three times and paste both matrices together column by column using Map. Wrap it all in a function, samp_fun.

samp_fun <- \(themes, versions) {
  themes_12 <- sample(themes, 12)
  repeat {
    samp_th <- replicate(2, sample(themes_12))
    samp_vs <- replicate(2, sample(versions))
    if (!any(apply(samp_th, 1, duplicated)) &&
        !any(apply(samp_vs, 1, duplicated))) break
  }
  samp_vs <- samp_vs[rep(seq_len(nrow(samp_vs)), each=3), ]
  Map(\(...) paste(..., sep='_'),
      as.data.frame(samp_th), as.data.frame(samp_vs)) |>
    setNames(c('first', 'second'))
}
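
To show what the break condition tests, here is a minimal sketch on two hand-made matrices. The matrices and the has_row_dupes helper are made up for illustration.

```r
# Hypothetical helper: TRUE if any row of m contains the same value twice.
has_row_dupes <- function(m) any(apply(m, 1, duplicated))

# Two columns play the role of "first" and "second".
m_ok  <- cbind(c("U", "T", "S", "W"), c("T", "U", "W", "S"))
m_bad <- cbind(c("U", "T", "S", "W"), c("U", "S", "T", "W"))

has_row_dupes(m_ok)
# [1] FALSE
has_row_dupes(m_bad)   # rows 1 and 4 repeat a version
# [1] TRUE
```

The repeat loop in samp_fun keeps redrawing until this kind of check is FALSE for both the theme matrix and the version matrix.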

Usage

themes <- 1:24
versions <- c('U', 'T', 'S', 'W')

set.seed(42)
res <- samp_fun(themes, versions)

Result

Gives a list with the two groups.

res$first
# [1] "4_S"  "15_S" "9_S"  "18_T" "5_T"  "20_T"
# [7] "17_W" "24_W" "8_W"  "7_U"  "1_U"  "10_U"

res$second
# [1] "15_U" "4_U"  "10_U" "8_W"  "7_W"  "24_W"
# [7] "5_S"  "18_S" "1_S"  "17_T" "9_T"  "20_T"

If you want first and second in your workspace, use list2env.

list2env(res, .GlobalEnv)
first
second

Note: requires R >= 4.1, since the code uses the \(x) function shorthand and the native pipe |>.
