Counting entries in a data frame in R

Posted on 2024-08-12 07:37:53

I'm looking to get a count, for the following data frame, of the number of children who believe:

> Santa
   Believe Age Gender Presents Behaviour
1    FALSE   9   male       25   naughty
2     TRUE   5   male       20      nice
3     TRUE   4 female       30      nice
4     TRUE   4   male       34   naughty

What command would I use to get this?

(The actual data frame is much bigger. I've just given you the first four rows...)

Thanks!

Comments (7)

已下线请稍等 2024-08-19 07:37:53

You could use table:

R> x <- read.table(textConnection('
   Believe Age Gender Presents Behaviour
1    FALSE   9   male       25   naughty
2     TRUE   5   male       20      nice
3     TRUE   4 female       30      nice
4     TRUE   4   male       34   naughty'
), header=TRUE)

R> table(x$Believe)

FALSE  TRUE 
    1     3 
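
If only the count of believers is wanted, a single entry can be pulled out of the table by name. A minimal sketch, assuming the four sample rows from the question; the names counts and n_believe are just for illustration:

```r
# Rebuild the four sample rows shown in the question
x <- data.frame(
  Believe   = c(FALSE, TRUE, TRUE, TRUE),
  Age       = c(9, 5, 4, 4),
  Gender    = c("male", "male", "female", "male"),
  Presents  = c(25, 20, 30, 34),
  Behaviour = c("naughty", "nice", "nice", "naughty")
)

counts <- table(x$Believe)      # counts for every distinct value of Believe
n_believe <- counts[["TRUE"]]   # index by name to keep just the TRUE count
n_believe
```

Note that indexing by name fails if the column contains no TRUE values at all, since table only reports values that actually occur in a logical vector.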
送君千里 2024-08-19 07:37:53

I think of this as a two-step process:

  1. subset the original data frame according to the filter supplied
    (Believe==FALSE); then

  2. get the row count of this subset

For the first step, the subset function is a good way to do this (it is just an alternative to ordinary index or bracket notation).

For the second step, I would use dim or nrow.

One advantage of using subset: you don't have to parse the result it returns to get the number you need; just call nrow on it directly.

So in your case:

v = nrow(subset(Santa, Believe==FALSE))     # 'subset' returns a data.frame; use Believe==TRUE to count believers

Or wrapped in a small helper function that takes the column name as a string:

> fnx = function(col, lev){nrow(subset(Santa, Santa[[col]]==lev))}

> fnx("Believe", TRUE)
[1] 3

Aside from nrow, dim will also do the job. This function returns the dimensions of a data frame (rows, cols), so you just need to supply the appropriate index to access the number of rows:

v = dim(subset(Santa, Believe==FALSE))[1]

An answer posted before this one shows the use of a contingency table. I don't like that approach for the problem as stated in the question, and here's why. Granted, the general question "how many rows in this data frame have value x in column C?" can be answered with a contingency table as well as with a "filtering" scheme (as in my answer here). If you want row counts for all values of a given factor variable (column), then a contingency table (via calling table and passing in the column(s) of interest) is the most sensible solution; however, the OP asks for the count of one particular value in a factor variable, not counts across all values. Building the full table carries a performance cost (it might be big or trivial, depending on the size of the data frame and the processing pipeline in which the call sits), and once the result from the call to table is returned, you still have to extract from it just the count that you want.

So that's why, to me, this is a filtering rather than a cross-tab problem.
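
For comparison, the same filter-then-count idea can be written with the bracket notation mentioned above. This is only a sketch, assuming the four sample rows from the question; the name believers is just for illustration:

```r
# Rebuild the four sample rows shown in the question
Santa <- data.frame(
  Believe   = c(FALSE, TRUE, TRUE, TRUE),
  Age       = c(9, 5, 4, 4),
  Gender    = c("male", "male", "female", "male"),
  Presents  = c(25, 20, 30, 34),
  Behaviour = c("naughty", "nice", "nice", "naughty")
)

# Step 1: keep only the rows where Believe is TRUE.
# which() drops NA values, much as subset() does.
believers <- Santa[which(Santa$Believe), , drop = FALSE]

# Step 2: count the rows of the filtered data frame
nrow(believers)
dim(believers)[1]   # same number, taken from the (rows, cols) pair
```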

岛徒 2024-08-19 07:37:53
sum(Santa$Believe)
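
This works because R treats TRUE as 1 and FALSE as 0 when summing a logical vector. A small sketch using only the Believe column from the question's sample rows; the with_na vector is a hypothetical addition to show what happens with a missing answer:

```r
# Believe column from the four sample rows in the question
Santa <- data.frame(Believe = c(FALSE, TRUE, TRUE, TRUE))

n_true  <- sum(Santa$Believe)     # TRUE counts as 1, FALSE as 0
n_false <- sum(!Santa$Believe)    # negate to count the non-believers

# Hypothetical: one child with a missing answer
with_na <- c(Santa$Believe, NA)
sum(with_na)                             # NA, because the missing value propagates
n_true_na <- sum(with_na, na.rm = TRUE)  # drop NAs before summing
```

If the real data frame may contain missing values in Believe, passing na.rm = TRUE is probably the safer default.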
白龙吟 2024-08-19 07:37:53

You can do summary(Santa$Believe) and you will get the counts for TRUE and FALSE

穿越时光隧道 2024-08-19 07:37:53

dplyr makes this really easy.

library(dplyr)

x <- Santa %>%
   count(Believe)

If you wanted to count by group, for instance how many males vs. females believe, just add a group_by:

x <- Santa %>%
   group_by(Gender) %>%
   count(Believe)
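
If dplyr is not available, a similar grouped count can be had from base R with a two-way table. This is a sketch of an alternative, not the dplyr approach itself, and it assumes the four sample rows from the question; the name tab is just for illustration:

```r
# Rebuild the relevant columns of the four sample rows in the question
Santa <- data.frame(
  Believe = c(FALSE, TRUE, TRUE, TRUE),
  Gender  = c("male", "male", "female", "male")
)

# Cross-tabulate Gender against Believe
tab <- table(Gender = Santa$Gender, Believe = Santa$Believe)
tab

# Pull out a single cell by row and column name
tab["male", "TRUE"]     # believers among males
tab["female", "TRUE"]   # believers among females
```

Unlike count(), the table also shows combinations with zero rows (here, females who do not believe).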
萌逼全场 2024-08-19 07:37:53

A one-line solution with data.table could be

library(data.table)
setDT(x)[, .N, by = Believe]
   Believe N
1:   FALSE 1
2:    TRUE 3
〆一缕阳光ご 2024-08-19 07:37:53

Using sqldf fits here:

library(sqldf)
sqldf("SELECT Believe, COUNT(1) AS N FROM Santa
       GROUP BY Believe")