Filling a matrix with a for loop
I'm using a function called LD() from the genetics package. To simplify what it does, it essentially takes two lists of genotypes (A/A, A/C, G/A, etc.) and returns a list of values (D, D', r, etc.). The call looks something like this:
a=LD(genotype1,genotype2)
with the results looking like:
Pairwise LD
-----------
                   D        D'      Corr
Estimates: 0.1419402 0.8110866 0.6029553

              X^2      P-value  N
LD Test: 10.90665 0.0009581958 15
I only need the value under Corr, so I'd access it with a$r.
I have 2 data frames, df1 and df2, and I want to apply that function to the Cartesian product of their columns. Each column (col) holds a list of genotypes.
I'm thinking of using a for loop to fill out a matrix:
df1=data.frame(c("A/A","C/C","A/A"),c("G/G","T/T","T/T"))
df2=data.frame(c("A/T","C/T","C/C"),c("A/A","A/T","G/G"))
q=1 # acts as a counter
n=length(df1$col1) # All lists are the same length
k=length(df2$col2) # These are to set the dimensions of the matrix
r=n*k
m=matrix(data=NA, nrow=r, ncol=3, byrow=TRUE, dimnames=list(NULL, c("c14","c19","Link")))
for(i in (1:n))
{
  for(j in (1:k))
  {
    geno1=genotype(df2)[j] #genotype is a function that must be applied to the
    geno2=genotype(df1)[i] #lists before the LD() function can be used
    d=LD(geno1,geno2)
    m=d$r #I only need the values from this section of the output
    ld[q,]=c(names(df1),names(df2),m) #This is supposed to fill out the matrix
                                      #I'm also not sure of how to do that part
    q=q+1 #this is so that the above line fills in the next row with each iteration
  }
}
When I run this, I get an error:
Error in dim(a1) <- a1.d :
dims [product "some number"] do not match the length of object ["another number"]
I'm expecting a matrix with 3 columns and many rows: the first column holding the name of the first genotype (column names of df1), the second column holding the name of the second genotype (column names of df2), and the third column holding the values obtained from the LD() function.
Any advice? Thanks!
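A few of the problems in the loop above can be checked in base R before LD() is even involved. The sketch below only inspects the two example data frames; it does not reproduce the exact error message. The `a1` in that message matches the name of genotype()'s first argument, so the error likely comes from passing a whole data frame to genotype() instead of a single column, but that is an inference, not something verified here.

```r
# The two example data frames from the question
df1 <- data.frame(c("A/A", "C/C", "A/A"), c("G/G", "T/T", "T/T"))
df2 <- data.frame(c("A/T", "C/T", "C/C"), c("A/A", "A/T", "G/G"))

# Neither data frame has a column called col1 or col2, so `$` returns NULL
n_bad <- length(df1$col1)   # length(NULL)
k_bad <- length(df2$col2)

# 1:n with n == 0 counts down (1, 0) instead of skipping the loop;
# seq_len(n) gives an empty sequence in that case
bad_index  <- 1:n_bad
safe_index <- seq_len(n_bad)

# The loop runs over columns, so the counts should come from ncol()
n <- ncol(df1)
k <- ncol(df2)

# c(names(df1), names(df2), value) has more entries than one 3-column row;
# 0.6 is only a placeholder for the LD value
row_len <- length(c(names(df1), names(df2), 0.6))
```

Printing `n_bad`, `bad_index`, and `row_len` shows why the matrix dimensions and the row assignment cannot line up as written. Separately, the matrix is created as `m` but filled as `ld`, and `m` is later overwritten with `d$r`.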
UPDATE ANSWER:
I managed to get it:
q=1 # acts as a counter
n=length(t1$rs.)
k=length(t2$rs.)
r=n*k
ld=matrix(data=NA, nrow=r, ncol=3, byrow=TRUE, dimnames=list(NULL, c("c14","c19","Link")))
for(i in (1:n))
{
  for(j in (1:k))
  {
    deq=LD(genotype(g1[,i]),genotype(g2[,j]))
    m=deq$r
    ld[q,]=c(i,j,m)
    q=q+1
  }
}
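The loop above stores the column indices i and j, while the original goal was to store the column names. A matrix holds a single type, so putting names next to the r values would turn the numbers into strings; a data frame avoids that. Below is a minimal sketch of that variant. The helper `pairwise_table()` and its `stat_fun` argument are hypothetical names introduced here, and the LD() call appears only in a comment because it assumes the genetics package and two-allele genotype columns. A stand-in statistic is used so the sketch runs on its own.

```r
# Hypothetical helper (not part of the genetics package): applies stat_fun to
# every pair (column of df1, column of df2) and returns one row per pair.
pairwise_table <- function(df1, df2, stat_fun) {
  # Cartesian product of column indices; j varies fastest, like the inner loop
  pairs <- expand.grid(j = seq_len(ncol(df2)), i = seq_len(ncol(df1)))
  # One statistic per pair of columns
  link <- mapply(function(i, j) stat_fun(df1[[i]], df2[[j]]), pairs$i, pairs$j)
  data.frame(
    c14 = names(df1)[pairs$i],
    c19 = names(df2)[pairs$j],
    Link = link,
    stringsAsFactors = FALSE
  )
}

# With the genetics package loaded, stat_fun would be something like:
#   ld_r <- function(x, y) LD(genotype(x), genotype(y))$r
#   ld <- pairwise_table(g1, g2, ld_r)

# Stand-in statistic so the sketch runs without genetics:
# the share of positions where the two columns hold the same string
same_share <- function(x, y) mean(x == y)

toy1 <- data.frame(snpA = c("A/A", "C/C", "A/A"), snpB = c("G/G", "T/T", "T/T"),
                   stringsAsFactors = FALSE)
toy2 <- data.frame(snpC = c("A/A", "C/T", "C/C"), snpD = c("G/G", "T/T", "G/G"),
                   stringsAsFactors = FALSE)
res <- pairwise_table(toy1, toy2, same_share)
```

`res` has one row per column pair, with the two name columns as character and `Link` as numeric, so the values can still be sorted or filtered as numbers.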
Comments (1)
I'm having difficulty understanding the first part of your work. Why do you want to use two data frames? I usually feed LD() a single data frame with one row per individual and one column per marker, and it calculates all the possible pairwise comparisons.
However, let's assume you estimated LD using the genetics package (yes, they say it's obsolete, but it's still the best!).
You can proceed as follows:
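The code that followed this comment is not reproduced here. As a rough sketch of the single-data-frame route it describes: assuming LD() on a data frame of genotype columns returns a list whose r component is a square marker-by-marker matrix (an assumption worth checking with str() on your own result), that matrix can be unfolded into the three-column layout the question asks for. The helper name `matrix_to_pairs()`, the object `my_data`, and the numbers in the toy matrix are made up for illustration.

```r
# Hypothetical helper: turn a square pairwise matrix into one row per marker pair
matrix_to_pairs <- function(mat) {
  # Positions above the diagonal, so each unordered pair appears once
  idx <- which(upper.tri(mat), arr.ind = TRUE)
  data.frame(
    marker1 = rownames(mat)[idx[, "row"]],
    marker2 = colnames(mat)[idx[, "col"]],
    r = mat[idx],
    stringsAsFactors = FALSE
  )
}

# With genetics loaded, the matrix would come from something like:
#   geno  <- makeGenotypes(my_data)  # one row per individual, one column per marker
#   r_mat <- LD(geno)$r

# Toy matrix with made-up placeholder values, only to show the reshaping
markers <- c("m1", "m2", "m3")
r_mat <- matrix(NA_real_, 3, 3, dimnames = list(markers, markers))
r_mat["m1", "m2"] <- 0.10
r_mat["m1", "m3"] <- 0.20
r_mat["m2", "m3"] <- 0.30

pairs_long <- matrix_to_pairs(r_mat)
```

This keeps the marker names next to each value without a counter variable. If the two sets of markers really need to stay in separate data frames, the pairs of interest can be selected from `pairs_long` afterwards by name.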