How to get a contingency table?
I am trying to create a contingency table from a particular type of data. This would be doable with loops, but because my final table would contain more than 10E5 cells, I am looking for a pre-existing function.
My initial data are as follows:
PLANT                   ANIMAL                           INTERACTIONS
----------------------  -------------------------------  ------------
Tragopogon_pratensis    Propylea_quatuordecimpunctata    1
Anthriscus_sylvestris   Rhagonycha_nigriventris          3
Anthriscus_sylvestris   Sarcophaga_carnaria              2
Heracleum_sphondylium   Sarcophaga_carnaria              1
Anthriscus_sylvestris   Sarcophaga_variegata             4
Anthriscus_sylvestris   Sphaerophoria_interrupta_Gruppe  3
Cerastium_holosteoides  Sphaerophoria_interrupta_Gruppe  1
I would like to create a table like this:
                        Propylea_quatuordecimpunctata  Rhagonycha_nigriventris  Sarcophaga_carnaria  Sarcophaga_variegata  Sphaerophoria_interrupta_Gruppe
----------------------  -----------------------------  -----------------------  -------------------  --------------------  -------------------------------
Tragopogon_pratensis    1                              0                        0                    0                     0
Anthriscus_sylvestris   0                              3                        2                    4                     3
Heracleum_sphondylium   0                              0                        1                    0                     0
Cerastium_holosteoides  0                              0                        0                    0                     1
That is, all plant species in rows, all animal species in columns, with 0 wherever there is no interaction (my initial data only list the interactions that occur).
In base R, use table or xtabs.
The gmodels package has a function CrossTable that gives output similar to what users of SPSS or SAS expect.
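A minimal sketch of the xtabs approach on the question's sample data. The data frame name d is an assumption made for illustration; the column names are taken from the question. With a variable on the left-hand side of the formula, xtabs sums that variable within each PLANT x ANIMAL cell and fills combinations that never occur with 0.

```r
# Sample data from the question; the name `d` is hypothetical.
d <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris",
            "Anthriscus_sylvestris", "Heracleum_sphondylium",
            "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria",
             "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
             "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1)
)

# INTERACTIONS is summed per plant/animal pair; absent pairs become 0.
tab <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = d)
print(tab)
```

The result is a 4 x 5 table with plants in rows and animals in columns, sorted alphabetically by default.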
The reshape package should do the trick.
I'm still figuring out how to fix the order issue, any suggestions?
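If the order issue refers to rows and columns coming out alphabetically rather than in the order they appear in the data (an assumption; the answer does not say), one possible fix is to set the factor levels before tabulating. The sketch below uses base R xtabs rather than the reshape package, and the data frame name d is hypothetical.

```r
# Sample data from the question; the name `d` is hypothetical.
d <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris",
            "Anthriscus_sylvestris", "Heracleum_sphondylium",
            "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria",
             "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
             "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1)
)

# Factor levels control row/column order; unique() keeps first-appearance order.
d$PLANT  <- factor(d$PLANT,  levels = unique(d$PLANT))
d$ANIMAL <- factor(d$ANIMAL, levels = unique(d$ANIMAL))

tab <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = d)
print(rownames(tab))
```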
I'd like to point out that we can get the same results Andrie posted without using the function with, both with the base R package and with the gmodels package.
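As an illustration of the point about with, here is a sketch showing three equivalent base R calls on the question's sample data (the data frame name d is hypothetical). Note that table counts rows rather than summing INTERACTIONS, so each cell here is 0 or 1.

```r
# Sample data from the question; the name `d` is hypothetical.
d <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris",
            "Anthriscus_sylvestris", "Heracleum_sphondylium",
            "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria",
             "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
             "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1)
)

a <- with(d, table(PLANT, ANIMAL))       # using with()
b <- table(d$PLANT, d$ANIMAL)            # columns referenced with $
c3 <- table(d[, c("PLANT", "ANIMAL")])   # two-column data frame passed directly

# All three hold the same counts (only the dimension labels differ for b).
print(all(a == b) && all(a == c3))
```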
xtabs in base R should work.
I think that should do what you're looking for fairly easily. I'm not sure how efficiently it scales up to a contingency table with 10E5 cells, but that might be a separate issue statistically.
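On the efficiency concern, one option is xtabs with sparse = TRUE, which returns a sparse matrix and stores only the non-zero cells. This is a sketch on the question's sample data, not a benchmark; the data frame name d is hypothetical and the Matrix package is assumed to be installed.

```r
library(Matrix)  # provides the sparse matrix classes returned below

# Sample data from the question; the name `d` is hypothetical.
d <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris",
            "Anthriscus_sylvestris", "Heracleum_sphondylium",
            "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria",
             "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
             "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1)
)

# sparse = TRUE requires exactly two factors on the right-hand side.
m <- xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = d, sparse = TRUE)
print(m)
```

For scale, a dense table of 10E5 (one million) numeric cells takes roughly 8 MB, so the ordinary dense result is usually manageable as well.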
This can also be done with dplyr / tidyr.
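One way this might look with tidyr alone is pivot_wider, sketched below on the question's sample data. The data frame name d is hypothetical, and the scalar values_fill = 0 assumes a reasonably recent tidyr (1.1.0 or later).

```r
library(tidyr)

# Sample data from the question; the name `d` is hypothetical.
d <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris",
            "Anthriscus_sylvestris", "Heracleum_sphondylium",
            "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria",
             "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
             "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1)
)

# One row per plant, one column per animal; missing pairs are filled with 0.
wide <- pivot_wider(d,
                    names_from  = ANIMAL,
                    values_from = INTERACTIONS,
                    values_fill = 0)
print(wide)
```

Unlike xtabs, the result is a data frame with PLANT as an ordinary first column rather than a table with row names.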
Simply use the dcast() function of the reshape2 package.
Here PLANT will be in the left column and ANIMAL in the top row; the table is filled using INTERACTIONS, and missing ("NULL") combinations are filled with 0.
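A sketch of what such a dcast() call could look like on the question's sample data. The data frame name d is hypothetical, and the reshape2 package is assumed to be installed.

```r
library(reshape2)

# Sample data from the question; the name `d` is hypothetical.
d <- data.frame(
  PLANT = c("Tragopogon_pratensis", "Anthriscus_sylvestris",
            "Anthriscus_sylvestris", "Heracleum_sphondylium",
            "Anthriscus_sylvestris", "Anthriscus_sylvestris",
            "Cerastium_holosteoides"),
  ANIMAL = c("Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris",
             "Sarcophaga_carnaria", "Sarcophaga_carnaria",
             "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
             "Sphaerophoria_interrupta_Gruppe"),
  INTERACTIONS = c(1, 3, 2, 1, 4, 3, 1)
)

# Formula: rows ~ columns. value.var names the cell values; fill = 0
# replaces plant/animal combinations that are missing from the data.
wide <- dcast(d, PLANT ~ ANIMAL, value.var = "INTERACTIONS", fill = 0)
print(wide)
```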