Efficiently changing individual elements in a matrix/array in R
I am running a simulation in R, which I am trying to make more efficient.
A little bit of background: this is an abstract simulation to test the effects of mutation on a population. The population has N individuals, and each individual has a genotype of M letters. Each letter can be one of the twenty amino acids (which I denote as 0:19).
One of the most (computationally) expensive tasks involves taking a matrix "mat" with M rows and N columns, which initially starts as a matrix of all zeroes,
mat <- matrix(rep(0,M*N),nrow=M)
and then changing (mutating) at least one letter in the genotype of each individual. I say "at least" because I would ideally like to set a mutation rate (mutrate) that, when set to 2 in my overall simulation function, causes 2 mutations per individual in the matrix.
I found two rather computationally expensive ways to do so. As you can see below, only the second method incorporates the mutation rate parameter mutrate (I could not easily think of how to incorporate it into the first).
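#method 1
for(i in 1:N){
  # both vectors are redrawn on every iteration, although only element i is used
  position <- floor(runif(N, min=1, max=M+1))  # row index in 1..M (R indexing is 1-based)
  letter <- floor(runif(N, min=0, max=20))     # letter in 0..19
  mat[position[i],i] <- letter[i]
}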
#method 2, somewhat faster and incorporates mutation rate
# add mutrate random letters at shuffled positions in each column, then wrap into 0..19
mat <- apply(mat, 2, function(x) (x + sample(c(rep(0, M-mutrate), sample(0:19, size=mutrate)))) %% 20)
The second method incorporates a modulus because genotype values have to be between 0 and 19 as I mentioned.
A few additional notes for clarity:
- I don't strictly need every individual to get exactly the same number of mutations. That said, the distribution should be narrow enough that, if mutrate = 2, most individuals get two mutations, some one, and some maybe three. However, I don't want one individual getting a huge number of mutations while many individuals get none. Notably, some mutations will change a letter into the same letter, so for a large population size N the expected average number of mutations is slightly less than the assigned mutrate.
- I believe the answer has something to do with the ability to use the square-bracket subsetting method to obtain one random element from every column of the matrix mat. However, I could not find any information about how to use the syntax to isolate one random element from every column of a matrix. mat[sample(1:M),sample(1:N)] obviously gives you the whole matrix... perhaps I am missing something stupidly clear here.
Any help is greatly appreciated!
Comments (1)
To answer your last question first: you can access a single cell in a matrix with mat[row,column], or multiple scattered cells by their sequential cell ID. Cell 1,1 is the first cell, followed by 2,1, 3,1, etc.
Accessing and overwriting individual cells is fast too, however. The fastest way that I could think of to perform your task is to first create vectors for the values we want: a vector of all column indices (every column repeated mutrate times), a vector of row indices (random), and a vector of new values for these column/row combinations (random).
Instead of a for-loop to update the matrix, we can also calculate the cell IDs so that we can update all matrix cells in one go.
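The vectors and cell IDs described above can be sketched as follows. This is a minimal illustration rather than the answer's original code: the small sizes and the names demo, cols, rows, vals and cell_id are assumptions made for the example. It relies on R storing matrices in column-major order, so the cell in row r and column c has the sequential ID (c - 1) * M + r.

```r
set.seed(1)                      # only so the sketch is reproducible
M <- 6; N <- 4; mutrate <- 2     # small illustrative sizes (assumed)

# Sequential cell IDs run down the first column, then the second, and so on.
demo <- matrix(1:(M * N), nrow = M)
demo[2, 1] == demo[2]            # cell 2,1 is cell ID 2
demo[1, 2] == demo[M + 1]        # cell 1,2 is cell ID M + 1

mat <- matrix(0, nrow = M, ncol = N)

cols <- rep(seq_len(N), each = mutrate)              # every column, mutrate times
rows <- sample.int(M, N * mutrate, replace = TRUE)   # one random row per mutation
vals <- sample(0:19, N * mutrate, replace = TRUE)    # one random new letter per mutation

# Turn each (row, column) pair into its cell ID and assign everything at once.
# mat[cbind(rows, cols)] <- vals would address the same cells.
cell_id <- (cols - 1) * M + rows
mat[cell_id] <- vals
```

Because the rows are drawn with replacement, an individual can occasionally be hit twice at the same position and so end up with fewer than mutrate changed letters. If distinct positions per individual are required, one slower option is rows <- as.vector(replicate(N, sample.int(M, mutrate))).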
Benchmarking the methods on a 6000x10000 matrix shows how fast each one is. I've modified your first method to speed it up and to apply mutrate.
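A rough sketch of how such a comparison could be run with base R's system.time. The function names mutate_loop and mutate_vectorised are hypothetical, the matrix is scaled down from 6000x10000 so the example runs quickly, and no timings are claimed here since they depend on the machine.

```r
M <- 600; N <- 1000; mutrate <- 2    # scaled down; the answer used 6000 x 10000
mat <- matrix(0, nrow = M, ncol = N)

# Hypothetical loop version: draw all random values once, then assign cell by cell.
mutate_loop <- function(mat, mutrate) {
  M <- nrow(mat); N <- ncol(mat)
  cols <- rep(seq_len(N), each = mutrate)
  rows <- sample.int(M, N * mutrate, replace = TRUE)
  vals <- sample(0:19, N * mutrate, replace = TRUE)
  for (k in seq_along(cols)) {
    mat[rows[k], cols[k]] <- vals[k]
  }
  mat
}

# Hypothetical vectorised version: a single assignment through sequential cell IDs.
mutate_vectorised <- function(mat, mutrate) {
  M <- nrow(mat); N <- ncol(mat)
  cols <- rep(seq_len(N), each = mutrate)
  rows <- sample.int(M, N * mutrate, replace = TRUE)
  vals <- sample(0:19, N * mutrate, replace = TRUE)
  mat[(cols - 1) * M + rows] <- vals
  mat
}

t_loop <- system.time(res_loop <- mutate_loop(mat, mutrate))
t_vec  <- system.time(res_vec  <- mutate_vectorised(mat, mutrate))

# Elapsed wall-clock seconds for each version.
print(c(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]]))
```

For more stable numbers, each call can be repeated several times and the elapsed times averaged.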