Resampling multilevel data by group

Posted 2025-01-12 04:25:02

I am trying to write a function that resamples names nested in groups. My function works for resampling without respect to groups, but I don't want to create samples of names that aren't in the same group.

Here's the function, where x is a vector of all names (some repeated), a is a vector of unique name observations, and b is a vector of unique names in randomized order.

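    # Note: this definition masks base R's rep().
    rep <- function(x, a, b) {
      for (i in 1:length(a)) {
        x1 <- x                        # resets x1 on every pass, so only the last replacement survives
        x1[which(x == a[i])] <- b[i]   # replace each occurrence of a[i] with b[i]
      }
      x1
    }

    x <- c("Smith", "Jones", "Washington", "Miller", "Wells", "Smith", "Smith", "Miller")
    a <- sort(unique(x))
    b <- sample(a, length(a))

    dat <- rep(x, a, b)
    View(dat)
    "Smith"      "Jones"      "Washington" "Miller"     "Jones"      "Smith"      "Smith"      "Miller"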

However, each name is nested in a group, so I need to avoid creating samples of names that are not in the same group. For example:

x         groupid
Smith       A1
Jones       B1
Washington  C1
Miller      A2
Wells       B1
Smith       A2
Smith       A3
Miller      A3

How can I account for that?
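
A side note on the function in the question: because `x1 <- x` sits inside the loop, each pass throws away the previous replacements, which is consistent with the output shown, where only "Wells" changed. A minimal sketch of a version that keeps every replacement could use `match()` instead of a loop. The name `relabel` and the seed are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical helper (not from the original post): look up each name's
# position in `a` and return the name at the same position in `b`.
relabel <- function(x, a, b) {
  b[match(x, a)]
}

x <- c("Smith", "Jones", "Washington", "Miller", "Wells", "Smith", "Smith", "Miller")
a <- sort(unique(x))
set.seed(1)                    # arbitrary seed, only so the run is repeatable
b <- sample(a, length(a))      # random permutation of the unique names

dat <- relabel(x, a, b)
print(dat)
```

Every occurrence of a given name receives the same new name, and since `b` is a permutation of `a`, no two names collapse into one. This version still ignores the groups.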

Comments (1)

乱世争霸 2025-01-19 04:25:02

This would be easier to accomplish with the tidyverse packages:

    library(tidyverse)

    # Example data from the question, as whitespace-separated text
    txt <- 'x         groupid
    Smith       A1
    Jones       B1
    Washington  C1
    Miller      A2
    Wells       B1
    Smith       A2
    Smith       A3
    Miller      A3'

    df <- read_table(file = txt)

    set.seed(0)
    df.new <- df %>%
      group_by(groupid) %>%
      mutate(
        # draw n() names, with replacement, from the names seen in this group
        b = sample(unique(x), n(), replace = TRUE)
      ) %>%
      arrange(groupid)

      x          groupid b
      <chr>      <chr>   <chr>
    1 Smith      A1      Smith
    2 Miller     A2      Miller
    3 Smith      A2      Smith
    4 Smith      A3      Smith
    5 Miller     A3      Miller
    6 Jones      B1      Wells
    7 Wells      B1      Jones
    8 Washington C1      Washington
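
The answer above samples with replacement inside each group, so a group can end up with the same name twice. If the goal is instead to shuffle the unique names within each group, closer to what the question's function attempts, a base R sketch might look like the following. `relabel_within` is a hypothetical helper name and the seed is arbitrary; neither comes from the original thread.

```r
# Hypothetical helper: within each group, shuffle the unique names and
# apply that mapping to every row of the group.
relabel_within <- function(x, g) {
  shuffled <- lapply(split(x, g), function(v) {
    a <- unique(v)
    b <- a[sample.int(length(a))]   # random permutation of this group's unique names
    b[match(v, a)]                  # map each original name to its shuffled partner
  })
  unsplit(shuffled, g)              # put the pieces back in the original row order
}

x <- c("Smith", "Jones", "Washington", "Miller", "Wells", "Smith", "Smith", "Miller")
groupid <- c("A1", "B1", "C1", "A2", "B1", "A2", "A3", "A3")

set.seed(0)                         # arbitrary seed, only so the run is repeatable
new_x <- relabel_within(x, groupid)
print(data.frame(x, groupid, new_x))
```

Groups with a single name (A1 and C1 here) are left unchanged, and a name can only be swapped with another name from the same group.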