Efficiently changing individual elements in a matrix/array in R

Posted on 2025-01-21 14:19:40

I am running a simulation in R, which I am trying to make more efficient.

A little bit of background: this is an abstract simulation to test the effects of mutation on a population. The population has N individuals, and each individual has a genotype of M letters. Each letter can be one of the twenty amino acids (which I denote as 0:19).

One of the most (computationally) expensive tasks involves taking a matrix "mat" with M rows and N columns, which initially starts as a matrix of all zeroes,

mat <- matrix(rep(0,M*N),nrow=M)

And then changing (mutating) at least one letter in the genotype of each individual. The reason I say at least is, I would ideally like to set a mutation rate (mutrate) that, if I set to 2 in my overall simulation function, it will cause 2 mutations in the matrix per individual.

I found two rather computationally expensive ways to do so. As you can see below, only the second method incorporates the mutation rate parameter mutrate (I could not easily think of how to incorporate it into the first).

   #method 1
   for(i in 1:N){
     position <- floor(runif(N, min=0, max=M))
     letter <- floor(runif(N, min=0, max=19))
     mat[position[i],i] = letter[i]
   }
   #method 2, somewhat faster and incorporates mutation rate
   mat <- apply(mat,2,function(x) (x+sample(c(rep(0,M-mutrate),sample(0:19,size=mutrate))%%20)))

The second method incorporates a modulus because genotype values have to be between 0 and 19 as I mentioned.

A few additional notes for clarity:

I don't strictly need every individual to get exactly the same number of mutations. That being said, the distribution should be narrow enough that, if mutrate = 2, most individuals get two mutations, some one, some maybe three. I don't, however, want one individual getting a huge number of mutations and many individuals getting none. Notably, some mutations will change a letter into the same letter, so for a large population size N the expected average number of mutations is slightly less than the assigned mutrate.
I believe the answer has something to do with using the square-bracket subsetting method to obtain one random element from every column of the matrix mat. However, I could not find any information about how to use the syntax to isolate one random element from every column of a matrix. mat[sample(1:M),sample(1:N)] obviously gives you the whole matrix... perhaps I am missing something stupidly clear here.

Any help is greatly appreciated!

Answer by 柏林苍穹下, 2025-01-28 14:19:40

To answer your last question first: you can access a single cell in a matrix with mat[row,column], or several scattered cells at once by their sequential cell id. Cells are numbered down the columns, so cell 1,1 is the first cell, followed by 2,1, 3,1, etc.:

mat <- matrix(rep(0, 5*5), nrow=5)
mat[c(1,3,5,7,9)] = c(1,2,3,4,5)

mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    4    0    0    0
[3,]    2    0    0    0    0
[4,]    0    5    0    0    0
[5,]    3    0    0    0    0
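
As a small supplementary sketch (not part of the benchmark in this answer): because R stores matrices in column-major order, the sequential id of cell (row, col) in a matrix with M rows is row + (col - 1) * M. Base R also accepts a two-column matrix of (row, col) pairs as an index, which addresses the same scattered cells without computing ids by hand. The names mat_a, mat_b, rows, cols and vals are illustrative.

```r
M <- 5; N <- 5
mat_a <- matrix(0, nrow = M, ncol = N)
mat_b <- matrix(0, nrow = M, ncol = N)

rows <- c(1, 3, 5, 2, 4)   # row of each target cell
cols <- c(1, 1, 1, 2, 2)   # matching column of each target cell
vals <- c(1, 2, 3, 4, 5)

# Column-major sequential ids: 1, 3, 5, 7, 9
cellid <- rows + (cols - 1) * M
mat_a[cellid] <- vals

# Same cells addressed with a two-column (row, col) index matrix
mat_b[cbind(rows, cols)] <- vals

identical(mat_a, mat_b)  # TRUE
```

Both forms write all five cells in a single vectorised assignment; which one reads better is largely a matter of taste.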

Accessing or overwriting individual cells is fast as well. The fastest way I could think of to perform your task is to first create vectors for the values we want: a vector of all column indices (every column repeated mutrate times), a vector of random row indices, and a vector of random new values for these column/row combinations.

cols = rep(seq_len(N), mutrate)
rows = sample(M, N*mutrate, replace = T)
values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed

for(i in seq_len(N*mutrate)) {
  mat[rows[i],cols[i]] = values[i]
}

Instead of that for-loop to update the matrix, we can also calculate the cell-IDs so we can update all matrix cells in one go:

cols = rep(seq_len(N), mutrate)
rows = sample(M, N*mutrate, replace = T)
cellid = rows + (cols-1)*M
  
values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
  
mat[cellid] = values
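
Because rows is sampled with replacement across all N*mutrate draws, an individual can occasionally hit the same row twice. If exactly mutrate distinct positions per individual are wanted, one possible variation (an assumption on my part, not one of the benchmarked methods) is to sample without replacement within each column. The sizes below are small illustrative values, and set.seed is only there to make the sketch reproducible.

```r
set.seed(1)                 # only to make the sketch reproducible
N <- 8; M <- 10             # small illustrative sizes
genotypes <- 20; mutrate <- 2

mat <- matrix(0, nrow = M, ncol = N)

# mutrate distinct rows for each column, flattened column by column
rows <- as.vector(replicate(N, sample.int(M, mutrate)))
cols <- rep(seq_len(N), each = mutrate)   # each = keeps cols aligned with rows
cellid <- rows + (cols - 1) * M

values <- sample(genotypes, N * mutrate, replace = TRUE) - 1  # letters 0..19
mat[cellid] <- values

length(unique(cellid)) == N * mutrate  # every target cell is distinct
```

The trade-off is that replicate() calls sample.int() once per column, so this is likely slower than the single vectorised sample() call; a drawn letter can still equal the letter already in the cell.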

Benchmarking the methods on a matrix with M = 10000 rows and N = 6000 columns shows how fast each one is:

N = 6000  # individuals
M = 10000 # genotype length

genotypes = 20
mutrate = 2

method1 <- function() {
  
  mat <- matrix(rep(0,M*N),nrow=M)
  
  for(i in 1:(N*mutrate)){
    position <- sample(M, 1)
    letter <- sample(genotypes, 1) - 1
    mat[position,(i-1)%%N+1] = letter  # cycles through columns 1..N; i%%N would yield column 0 when i is a multiple of N
  }
  
  return(mat)
  
}

method2 <- function() {
  
  mat <- matrix(rep(0,M*N),nrow=M)
  mat <- apply(mat,2,function(x) (x+sample(c(rep(0,M-mutrate),sample(0:19,size=mutrate))%%20)))
  
}

method3 <- function() {
  
  mat <- matrix(rep(0,M*N),nrow=M)
  
  cols = rep(seq_len(N), mutrate)
  rows = sample(M, N*mutrate, replace = T)
  values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
  
  for(i in seq_len(N*mutrate)) {
    mat[rows[i],cols[i]] = values[i]
  }
  
  return(mat)
  
}

method4 <- function() {
  
  mat <- matrix(rep(0,M*N),nrow=M)
  
  cols = rep(seq_len(N), mutrate)
  rows = sample(M, N*mutrate, replace = T)
  cellid = rows + (cols-1)*M
  
  values = sample(genotypes, N*mutrate, replace = T) - 1 # -1 offset since genotypes are 0-indexed
  
  mat[cellid] = values
  
  return(mat)
  
}

benchmark <- function(func, times=10) {
  begin <- as.numeric(Sys.time())
  for(i in seq_len(times))
    retval <- eval(parse(text=func))
  end <- as.numeric(Sys.time())
  cat(func, 'took', (end-begin)/times, 'seconds\n')
  return(retval)
}

ret1 <- benchmark('method1()')
ret2 <- benchmark('method2()')
ret3 <- benchmark('method3()')
ret4 <- benchmark('method4()')

Note that I've modified your first method to speed it up and to apply mutrate. The timings:

method1() took 0.8936087 seconds
method2() took 8.767686 seconds
method3() took 0.7008878 seconds
method4() took 0.6548331 seconds
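
The hand-rolled timer above does the job. As an alternative sketch, base R's system.time() reports the elapsed time of an expression directly. The function name mutate_vectorised and the reduced sizes are illustrative assumptions chosen so the snippet runs quickly; no timings are claimed for it.

```r
# Hypothetical wrapper around the cell-id approach, parameterised for reuse
mutate_vectorised <- function(M, N, mutrate, genotypes = 20) {
  mat <- matrix(0, nrow = M, ncol = N)
  cols <- rep(seq_len(N), mutrate)
  rows <- sample(M, N * mutrate, replace = TRUE)
  values <- sample(genotypes, N * mutrate, replace = TRUE) - 1  # letters 0..19
  mat[rows + (cols - 1) * M] <- values   # one vectorised assignment
  mat
}

# system.time() evaluates the expression and returns user/system/elapsed times
timing <- system.time(
  res <- mutate_vectorised(M = 1000, N = 600, mutrate = 2)
)
timing[["elapsed"]]  # seconds; varies by machine
dim(res)
```

For comparing several methods repeatedly, wrapping the call in a loop or replicate() inside system.time() gives an averaged figure in the same spirit as the benchmark helper above.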
