Filling a matrix with a for loop

Posted 2024-11-18 22:19:50

I'm using a function from the genetics package called LD(). To simplify what it does, it essentially takes a list of genotypes (A/A, A/C, G/A, etc.) and creates a list of values (D, D', r, etc.). It looks something like this:

a=LD(genotype1,genotype2)

with the results looking like:

Pairwise LD
-----------
                   D        D'      Corr
Estimates: 0.1419402 0.8110866 0.6029553

              X^2      P-value  N
LD Test: 10.90665 0.0009581958 15

I only need the Corr value, so I extract it with a$r.

I have 2 dataframes and I want to use that function on their cartesian product:

df1 and df2 are the two data frames; each column (col) holds a list of genotypes.
I'm thinking of using a for loop to fill out a matrix:

df1=data.frame(c("A/A","C/C","A/A"),c("G/G","T/T","T/T"))
df2=data.frame(c("A/T","C/T","C/C"),c("A/A","A/T","G/G"))
q=1 # acts as a counter
n=length(df1$col1) # All lists are the same length
k=length(df2$col2) # These are to set the dimensions of the matrix
r=n*k

m=matrix(data=NA, nrow=r, ncol=3, byrow=TRUE, dimnames=list(NULL, c("c14","c19","Link")))

for(i in (1:n))
{
  for(j in (1:k))
  {
    geno1=genotype(df2)[j] #genotype is a function that must be applied to the
    geno2=genotype(df1)[i] #lists before the LD() function can be used
    d=LD(geno1,geno2)

    m=d$r #I only need the values from this section of the output

    ld[q,]=c(names(df1),names(df2),m) #This is supposed to fill out the matrix
                                      #I'm also not sure of how to do that part
    q=q+1 #this is so that the above line fills in the next row with each iteration
  }
}

When I run this, I get an error:

Error in dim(a1) <- a1.d : 
dims [product "some number"] do not match the length of object ["another number"]

I'm expecting a matrix with 3 columns and many rows: the first column holds the name of the first genotype (a column name of df1), the second column holds the name of the second genotype (a column name of df2), and the third column holds the value obtained from the LD() function.

Any advice? Thanks!

UPDATE ANSWER:
I managed to get it:

q=1 # acts as a counter
n=length(t1$rs.)
k=length(t2$rs.)
r=n*k

ld=matrix(data=NA, nrow=r, ncol=3, byrow=TRUE, dimnames=list(NULL, c("c14","c19","Link")))

for(i in (1:n))
{
  for(j in (1:k))
  {
    deq=LD(genotype(g1[,i]),genotype(g2[,j]))
    m=deq$r
    ld[q,]=c(i,j,m)
    q=q+1
  }
}
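
The loop above stores the indices i and j, not the column names the question originally asked for. A minimal sketch of one way to keep the names is shown here. It makes several assumptions: `pair_stat()` is a hypothetical stand-in for `LD(genotype(x), genotype(y))$r` so that the example runs without the genetics package, and the column names `snp1` to `snp4` are made up for illustration.

```r
# Toy genotype tables; the column names are made up for illustration.
g1 <- data.frame(snp1 = c("A/A", "C/C", "A/A"),
                 snp2 = c("G/G", "T/T", "T/T"),
                 stringsAsFactors = FALSE)
g2 <- data.frame(snp3 = c("A/T", "C/T", "C/C"),
                 snp4 = c("A/A", "A/T", "G/G"),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for LD(genotype(x), genotype(y))$r:
# here it is just the fraction of individuals whose genotypes match.
pair_stat <- function(x, y) mean(x == y)

# One row per (column of g1, column of g2) pair; no manual counter needed.
pairs <- expand.grid(c14 = names(g1), c19 = names(g2),
                     stringsAsFactors = FALSE)

# Compute the statistic for each pair of columns.
pairs$Link <- unname(mapply(function(a, b) pair_stat(g1[[a]], g2[[b]]),
                            pairs$c14, pairs$c19))

print(pairs)
```

A data frame is used for the result instead of a matrix because a matrix holds a single type, so storing names next to the values would turn the numbers into character strings.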

拿命拼未来 2024-11-25 22:19:50

I have difficulty understanding the first part of your work. Why do you want to use two data.frames? I usually pass a single data.frame with one row per individual and one column per marker, and LD calculates all the possible pairwise comparisons.
However, let's assume you estimated LD using the LD package (yes, they say it's obsolete, but it's still the best!).
You can proceed as follows:

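# Extract the correlation matrix r from the LD results
tc <- LD.object$"r"
# Build a three-column matrix with every pairwise combination of two markers
pwm <- combn(row.names(tc), 2)
pwld <- matrix(NA, nrow = ncol(pwm), ncol = 3)
# combn() returns one pair per column, so transpose to get one pair per row
pwld[, 1:2] <- t(pwm)
# Fill the third column (stored as character, because the matrix holds names)
for (aaa in seq_len(nrow(pwld)))
{
  pwld[aaa, 3] <- tc[pwld[aaa, 1], pwld[aaa, 2]]
}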
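
The fill loop above can also be written as a single indexing step, because R accepts a two-column character matrix as an index into a matrix that has dimnames. The sketch below is only an illustration: `tc` is a made-up matrix standing in for `LD.object$"r"`, and it assumes the values sit in the upper triangle, which is how the loop above reads them.

```r
# Made-up stand-in for LD.object$"r": a marker-by-marker matrix with
# values in the upper triangle only (an assumption for this sketch).
markers <- c("m1", "m2", "m3")
tc <- matrix(NA_real_, nrow = 3, ncol = 3,
             dimnames = list(markers, markers))
tc["m1", "m2"] <- 0.60
tc["m1", "m3"] <- 0.10
tc["m2", "m3"] <- 0.35

# All unordered marker pairs, one pair per row.
pw <- t(combn(rownames(tc), 2))

# A two-column character matrix indexes tc by (row name, column name).
pwld <- data.frame(marker1 = pw[, 1],
                   marker2 = pw[, 2],
                   r = tc[pw],
                   stringsAsFactors = FALSE)
print(pwld)
```

Returning a data frame keeps the r column numeric, whereas a three-column matrix that also holds marker names stores every value as a character string.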
