Frequency of non-zero or specific numbers in a column
My input file:
x <- read.table(textConnection('
t0 t1 t2 t3 t4
aa 0 1 0 1 0
bb 1 0 1 0 1
cc 0 0 0 0 0
dd 1 1 1 0 1
ee 1 1 1 0 0
ff 0 0 1 0 1
gg -1 -1 -1 -1 0
hh -1 1 -1 1 -1
'), header=TRUE)
First I want to calculate the frequency of non-zero values in each column, i.e.
          t0  t1  t2  t3  t4
frequency 5/8 5/8 6/8 3/8 4/8
Then I want to multiply each column of x by its frequency to obtain a new matrix, as follows:
t0 t1 t2 t3 t4
aa 0 5/8 0 3/8 0
bb 5/8 0 6/8 0 4/8
cc 0 0 0 0 0
dd 5/8 5/8 6/8 0 4/8
ee 5/8 5/8 6/8 0 0
ff 0 0 6/8 0 4/8
gg -5/8 -5/8 -6/8 -3/8 0
hh -5/8 5/8 -6/8 3/8 -4/8
How can I do this in R? I learnt from the manuals that prop.table(x) can be used to get the overall proportions for the whole table. How can I do it for each column individually?
In the same spirit as the answer from @Joris, this is where the wonderful sweep() function comes into its own: sweep(x, 2, colMeans(abs(x)), "*").

What is happening here is that colMeans(abs(x)) is a vector of length 5. We sweep() these values column-wise (indicated by MARGIN = 2 in the call) over the data x, applying the function * as we go. So the values in column t0 all get multiplied by colMeans(abs(x))[1], the values in column t1 all get multiplied by colMeans(abs(x))[2], and so on.

The advantage of sweep() is that it is very fast when given a matrix; it is much slower when given a data frame. But that is just the way things are in R.
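As a minimal sketch of the sweep() approach described above, the following rebuilds the question's data and applies the call to both the data frame and a matrix copy. The names freq, m, res_df and res_mat are chosen here for illustration only, and colMeans(abs(x)) equals the non-zero frequency only because the data contain nothing but -1, 0 and 1.

```r
# Rebuild the example data from the question (row names come from the first column)
x <- read.table(text = "
   t0 t1 t2 t3 t4
aa  0  1  0  1  0
bb  1  0  1  0  1
cc  0  0  0  0  0
dd  1  1  1  0  1
ee  1  1  1  0  0
ff  0  0  1  0  1
gg -1 -1 -1 -1  0
hh -1  1 -1  1 -1
", header = TRUE)

# Per-column frequency of non-zero entries; valid here because all values are -1, 0 or 1
freq <- colMeans(abs(x))

# Multiply every column by its own frequency (MARGIN = 2 means column-wise)
res_df <- sweep(x, 2, freq, "*")

# The same call on a matrix copy, the case the answer describes as the fast one
m <- as.matrix(x)
res_mat <- sweep(m, 2, freq, "*")
```

If the data could hold values other than -1, 0 and 1, colMeans(x != 0) would probably be the safer way to count non-zero entries.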
Try this: compute the frequencies for each column first, then use mapply to get the data frame. mapply applies the function * to every column, pairing each column with its corresponding frequency. See also ?mapply.
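The answer's original code is not preserved here, so the following is only a sketch of how the mapply approach might look, not necessarily what the author wrote. The names freq and res are illustrative, and computing the frequencies with colMeans(x != 0) is an assumption.

```r
# Rebuild the example data from the question
x <- read.table(text = "
   t0 t1 t2 t3 t4
aa  0  1  0  1  0
bb  1  0  1  0  1
cc  0  0  0  0  0
dd  1  1  1  0  1
ee  1  1  1  0  0
ff  0  0  1  0  1
gg -1 -1 -1 -1  0
hh -1  1 -1  1 -1
", header = TRUE)

# Share of non-zero entries in each column (assumed way of getting the frequencies)
freq <- colMeans(x != 0)

# mapply pairs each column of x with the matching element of freq and multiplies them;
# the result is a matrix, so convert it back to a data frame
res <- as.data.frame(mapply("*", x, freq))

# mapply drops the row names, so restore them from the original data
rownames(res) <- rownames(x)
```

To count a specific number instead of any non-zero value, the comparison can be swapped, for example colMeans(x == 1) for the share of ones in each column.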