Frequency of non-zero or specific numbers in a column

Posted 2024-11-04 10:22:14, 890 characters, 3 views, 0 comments

My input file:

 x <- read.table(textConnection('
      t0  t1  t2  t3  t4
  aa  0   1   0   1   0
  bb  1   0   1   0   1
  cc  0   0   0   0   0
  dd  1   1   1   0   1
  ee  1   1   1   0   0
  ff  0   0   1   0   1
  gg  -1  -1  -1  -1  0
  hh  -1  1   -1  1   -1
 '), header=TRUE)

I first want to calculate, for each column, the frequency of non-zero entries, i.e.

          t0   t1   t2   t3   t4
frequency 5/8  5/8  6/8  3/8  4/8

Then I want to multiply each column of x by its frequency to obtain a new matrix, as follows:

       t0    t1     t2     t3     t4
  aa   0     5/8    0      3/8    0
  bb   5/8   0      6/8    0      4/8
  cc   0     0      0      0      0
  dd   5/8   5/8    6/8    0      4/8
  ee   5/8   5/8    6/8    0      0
  ff   0     0      6/8    0      4/8
  gg  -5/8  -5/8   -6/8   -3/8    0
  hh  -5/8   5/8   -6/8    3/8   -4/8

How can I do this in R? I learnt from the manuals that prop.table(x) can be used to get the overall proportions for the whole table; how can I do this for each column individually? Any help would be appreciated.


Comments (2)

随风而去 2024-11-11 10:22:14

In the same spirit as the answer from @Joris, this is where the wonderful sweep() function comes into its own:

> sweep(x, MARGIN = 2, colMeans(abs(x)), "*")
       t0     t1    t2     t3   t4
aa  0.000  0.625  0.00  0.375  0.0
bb  0.625  0.000  0.75  0.000  0.5
cc  0.000  0.000  0.00  0.000  0.0
dd  0.625  0.625  0.75  0.000  0.5
ee  0.625  0.625  0.75  0.000  0.0
ff  0.000  0.000  0.75  0.000  0.5
gg -0.625 -0.625 -0.75 -0.375  0.0
hh -0.625  0.625 -0.75  0.375 -0.5

What is happening here is this: because every entry of x is -1, 0 or 1, abs(x) is 1 for each non-zero entry, so colMeans(abs(x)) is the per-column frequency of non-zero values, a vector of length 5. We sweep() these values column-wise (indicated by MARGIN = 2 in the call) over the data x, applying the function * as we go. So the values in column t0 all get multiplied by colMeans(abs(x))[1], the values in column t1 all get multiplied by colMeans(abs(x))[2], and so on.
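As a minimal sketch of the same idea for data that is not limited to -1, 0 and 1, the frequencies can be counted with a logical comparison instead of abs(). The names freq, freq_one and scaled are illustrative only and do not come from the answer above.

```r
# Rebuild the question's data; the first column becomes the row names.
x <- read.table(text = "
      t0  t1  t2  t3  t4
  aa  0   1   0   1   0
  bb  1   0   1   0   1
  cc  0   0   0   0   0
  dd  1   1   1   0   1
  ee  1   1   1   0   0
  ff  0   0   1   0   1
  gg  -1  -1  -1  -1  0
  hh  -1  1   -1  1   -1
", header = TRUE)

# Fraction of non-zero entries per column. Unlike colMeans(abs(x)),
# this does not rely on the values being only -1, 0 or 1.
freq <- colMeans(x != 0)

# Fraction of entries equal to one specific value, here 1.
freq_one <- colMeans(x == 1)

# Multiply every column by its own frequency.
scaled <- sweep(x, MARGIN = 2, STATS = freq, FUN = "*")
print(freq)
print(scaled)
```

For this particular data set freq matches colMeans(abs(x)), so scaled is the same table as the sweep() output shown above.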

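The advantage of sweep() is that it is very fast when given a matrix:

> means <- colMeans(abs(x))
> X <- data.matrix(x)
> system.time(replicate(1000, sweep(X, 2, means, "*")))
   user  system elapsed 
  0.115   0.000   0.118 
> system.time(replicate(1000, mapply(`*`, x, means)))
   user  system elapsed 
  0.308   0.001   0.309 
> system.time(replicate(1000, mapply(`*`, X, means)))
   user  system elapsed 
  0.204   0.000   0.205

(Note that mapply() iterates over the individual elements of a matrix rather than over its columns, so the last call does not compute the same column-wise product; it is shown for timing only.)

It is much slower when given a data frame: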

> system.time(replicate(1000, sweep(x, 2, means, "*")))
   user  system elapsed 
  2.072   0.000   2.074

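But that is just the way things are in R: arithmetic on a data frame goes through the slower data frame methods, whereas a matrix is handled in a single vectorised operation.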
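If the result is needed as a data frame, one possible way to keep the matrix speed is to convert, sweep, and convert back. This is only a sketch and assumes every column is numeric; the name scaled is illustrative.

```r
# Rebuild the question's data; the first column becomes the row names.
x <- read.table(text = "
      t0  t1  t2  t3  t4
  aa  0   1   0   1   0
  bb  1   0   1   0   1
  cc  0   0   0   0   0
  dd  1   1   1   0   1
  ee  1   1   1   0   0
  ff  0   0   1   0   1
  gg  -1  -1  -1  -1  0
  hh  -1  1   -1  1   -1
", header = TRUE)

# Convert once to a numeric matrix (assumes all columns are numeric).
X <- data.matrix(x)
means <- colMeans(abs(X))

# Sweep over the matrix, then turn the result back into a data frame.
# Row and column names carry over from the matrix dimnames.
scaled <- as.data.frame(sweep(X, 2, means, "*"))
print(scaled)
```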

橘亓 2024-11-11 10:22:14

Try this:

> colMeans(abs(x))
   t0    t1    t2    t3    t4 
0.625 0.625 0.750 0.375 0.500 

for the frequencies and

> mapply(`*`,x,colMeans(abs(x)))
         t0     t1    t2     t3   t4
[1,]  0.000  0.625  0.00  0.375  0.0
[2,]  0.625  0.000  0.75  0.000  0.5
[3,]  0.000  0.000  0.00  0.000  0.0
[4,]  0.625  0.625  0.75  0.000  0.5
[5,]  0.625  0.625  0.75  0.000  0.0
[6,]  0.000  0.000  0.75  0.000  0.5
[7,] -0.625 -0.625 -0.75 -0.375  0.0
[8,] -0.625  0.625 -0.75  0.375 -0.5

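to get the scaled values. mapply applies the function * to every column of x, pairing each column with the corresponding element of colMeans(abs(x)). Note that the result is a matrix rather than a data frame, so the row names aa to hh are lost. See also ?mapply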
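Should the row names and the data frame class need to survive, one option (a sketch, not part of the answer above) is to assign the per-column results back into a copy of x. The name scaled is illustrative.

```r
# Rebuild the question's data; the first column becomes the row names.
x <- read.table(text = "
      t0  t1  t2  t3  t4
  aa  0   1   0   1   0
  bb  1   0   1   0   1
  cc  0   0   0   0   0
  dd  1   1   1   0   1
  ee  1   1   1   0   0
  ff  0   0   1   0   1
  gg  -1  -1  -1  -1  0
  hh  -1  1   -1  1   -1
", header = TRUE)

means <- colMeans(abs(x))

# Map() is mapply() without simplification, so it returns a list of columns.
# Assigning into scaled[] replaces the columns but keeps the data frame
# structure, including the row names.
scaled <- x
scaled[] <- Map(`*`, x, means)
print(scaled)
```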
