How to generate random correlated uniform data from a correlation matrix?

Posted on 2025-01-28 01:14:54 · 2837 characters · 2 views · 0 comments

I have a very specific problem to solve that makes researching a solution quite hard because I lack the requisite math skills.

My goal: Given a covariance/correlation matrix and variable ranges, generate some random data. This data needs to meet 3 important conditions:

  • The covariance/correlation of this data should be similar to the provided covariance/correlation matrix.

  • The ranges of the variables of this data (columns) should be bounded by the provided ranges.

  • Each variable has a uniform distribution.

Is there perhaps an R package or function that can generate data meeting these conditions from the provided arguments? Or maybe code in some other language that I could then rewrite in R?
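For reference, one common technique that fits all three conditions is a Gaussian copula: draw correlated normals, push each column through the normal CDF to get uniforms, then rescale to the requested ranges. The sketch below is in Python with NumPy and only illustrates the idea under stated assumptions. The function name `correlated_uniforms` and its parameters are hypothetical, the target matrix is assumed to be a valid correlation matrix, and the adjusted matrix must remain positive definite for the Cholesky step to succeed.

```python
import math
import numpy as np

def correlated_uniforms(target_corr, ranges, n, seed=None):
    """Draw n rows of uniform variables whose Pearson correlations
    approximate target_corr (hypothetical helper, Gaussian copula).

    target_corr: (d, d) correlation matrix wanted for the uniform columns.
    ranges: list of d (low, high) pairs, one per column.
    """
    rng = np.random.default_rng(seed)
    target_corr = np.asarray(target_corr, dtype=float)
    # Uniforms built from normals with correlation r_z have Pearson
    # correlation (6 / pi) * asin(r_z / 2); invert that relation here.
    normal_corr = 2.0 * np.sin(np.pi * target_corr / 6.0)
    # Raises LinAlgError if the adjusted matrix is not positive definite.
    chol = np.linalg.cholesky(normal_corr)
    z = rng.standard_normal((n, len(ranges))) @ chol.T
    # The standard normal CDF maps each column to Uniform(0, 1).
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    low = np.array([r[0] for r in ranges], dtype=float)
    high = np.array([r[1] for r in ranges], dtype=float)
    return low + u * (high - low)

# Example with the ranges A, B, C used in the tests of this question.
target = np.array([[1.0, 0.7, -0.3],
                   [0.7, 1.0, 0.2],
                   [-0.3, 0.2, 1.0]])
data = correlated_uniforms(target, [(1, 100), (50, 100), (1, 10)], 10000, seed=0)
print(np.corrcoef(data, rowvar=False).round(2))
```

Because rescaling a column to a range is a linear map with positive slope, it does not change the Pearson correlations, so the ranges and the correlation target can be handled independently.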


EDIT1:

In the case that uniformity (condition 3) cannot be met, is there perhaps an R package or function that can generate data that meets just conditions 1 and 2? In other words, I don't care what distribution the variables take.


EDIT2:

Here is my first very terrible attempt at this problem. All it does so far is create positively correlated and uniform data. Tests are at the bottom:

generate_correlated_variables <- function(variable_ranges, numPoints = 100, nbins = 10) {
  
  df <- matrix(0, nrow = numPoints, ncol = length(variable_ranges))
  colnames(df) <- names(variable_ranges)

  
  for (i in 1:length(variable_ranges)) {
    
    df[,i] <- runif(numPoints, min = as.numeric(variable_ranges[[i]][1]), max = as.numeric(variable_ranges[[i]][2]))  
    
  }
  
  #Sample one variable and determine how many points fall in each bin
  #These amounts will be used to sample the rest of the variables
  df[,1] <- runif(numPoints, min = as.numeric(variable_ranges[[1]][1]), max = as.numeric(variable_ranges[[1]][2]))
  bin_width <- (variable_ranges[[1]][2] - variable_ranges[[1]][1])/nbins
  breaks_vec <- seq(variable_ranges[[1]][1], variable_ranges[[1]][2], by = bin_width)
  table <- table(cut(df[,1], breaks = breaks_vec, include.lowest = TRUE))

  binned_ranges_list <- vector(mode = "list", length = length(variable_ranges))
  names(binned_ranges_list) <- names(variable_ranges)
  
  temp <- vector(mode = "list", length = nbins)
  
  
  for (i in 1:length(variable_ranges)) {

      bin_width <- (variable_ranges[[i]][2] - variable_ranges[[i]][1])/nbins
      
      breaks_vec <- seq(variable_ranges[[i]][1], variable_ranges[[i]][2], by = bin_width)
      
      for (j in 1:nbins) {
        
        temp[[j]][1] <- breaks_vec[j]
        temp[[j]][2] <- breaks_vec[j+1]
        
      }
      
      binned_ranges_list[[i]] <- temp
      
  }
  
  print(binned_ranges_list)
    
  #sample ranges
  for (i in 1:length(variable_ranges)) {
    
    sampled_values_vec <- c()
      
      for (j in 1:nbins) {
        
        sample <- runif(n = table[j], min = binned_ranges_list[[i]][[j]][1], max = binned_ranges_list[[i]][[j]][2])
        
        sampled_values_vec <- c(sampled_values_vec, sample)
        
      }
    
    df[,i] <- sampled_values_vec
    }
   return(df) 
  }
  

#Tests
variable_ranges = list(A = c(1, 100), B = c(50, 100), C = c(1, 10))

a <- generate_correlated_variables(variable_ranges = variable_ranges, numPoints = 100, nbins = 2)
cor(a)

b <- generate_correlated_variables(variable_ranges = variable_ranges, numPoints = 100, nbins = 50)
cor(b)


Comments (1)

青萝楚歌 2025-02-04 01:14:54

Here is one idea for generating correlated uniform random numbers.

Suppose you have a source of independent random bits.

  1. First, generate an array of X random bits per element (say, 2 bits).

  2. Then generate another random array and replace its upper (or middle, lower, or any fixed position) bits with the bits from step 1.

  3. Generate a second random array in the same way, again replacing the bits at the same position with the bits from step 1.

The arrays from steps 2 and 3 are each uniform, but correlated with each other.

Code for illustration (sorry, it is in Python):

import numpy as np

N=1000000

rng = np.random.default_rng()

m = np.empty(N, dtype=np.uint32); m.fill(2*1073741824-1) # mask 2^31-1

f = rng.integers(low = 0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
f = f - np.bitwise_and(f, m) # keep only the top bit; the mask clears the lower 31 bits

q = rng.integers(low = 0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
z = rng.integers(low = 0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)

print("Uncorrelated")
print(np.corrcoef([q, z]))

q = f + np.bitwise_and(m, q)
z = f + np.bitwise_and(m, z)

print("Correlated")
print(np.corrcoef([q, z]))
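
The same idea can be sketched with the number of shared bits as a parameter. This is a minimal illustration, not a tuned implementation; the helper name `shared_bit_uniforms` and its parameters are hypothetical. Under the assumption that the top k bits are shared and the remaining bits are independent, the correlation works out to roughly 1 - 4^(-k), so about 0.75 for one bit and about 0.94 for two.

```python
import numpy as np

def shared_bit_uniforms(n, k, seed=None):
    """Return two uint32 arrays that share their top k bits (1 <= k <= 31).

    Hypothetical helper: each array is uniform on [0, 2**32 - 1], and the
    shared high bits make the two arrays positively correlated.
    """
    rng = np.random.default_rng(seed)
    low_mask = np.uint32((1 << (32 - k)) - 1)      # selects the lower 32-k bits
    high_mask = np.uint32(0xFFFFFFFF) ^ low_mask   # selects the top k bits

    def draw():
        return rng.integers(low=0, high=4294967295, size=n,
                            dtype=np.uint32, endpoint=True)

    shared = draw() & high_mask          # common top bits
    q = shared | (draw() & low_mask)     # independent low bits for q
    z = shared | (draw() & low_mask)     # independent low bits for z
    return q, z

for k in (1, 2, 3):
    q, z = shared_bit_uniforms(200000, k, seed=0)
    print(k, np.corrcoef(q, z)[0, 1])
```

With this construction only a discrete set of positive correlations is reachable, so it does not by itself match an arbitrary correlation matrix. Dividing the integers by 2**32 and rescaling linearly maps them to any requested range without changing the correlation.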