From counts to cases in R

Posted on 2025-01-16 19:42:42

I have a dataset with a column that indicates the number of occurrences of each group defined by several variables, here SEX and COLOR.

CASES <- base::data.frame(SEX   = c("M", "M", "F", "F", "F"), 
                          COLOR = c("brown", "blue", "brown", "brown", "brown"))
COUNT <- base::as.data.frame(base::table(CASES))
COUNT

I need to restructure the dataset so that there is one row for each occurrence of the group. Someone helped me write a function that works perfectly.

countsToCases <- function(x, countcol = "Freq") {
    # Get the row indices to pull from x
    idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
    # Drop count column
    x[[countcol]] <- NULL
    # Get the rows from x
    x[idx, ]
}

CASES <- countsToCases(base::as.data.frame(COUNT))
CASES
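To make the indexing step in countsToCases concrete, here is a minimal sketch on the same toy data. It only illustrates what rep.int() produces; the intermediate name idx is taken from the function above, and nothing else is assumed.

```r
# Rebuild the small example from the question.
CASES <- data.frame(SEX   = c("M", "M", "F", "F", "F"),
                    COLOR = c("brown", "blue", "brown", "brown", "brown"))
COUNT <- as.data.frame(table(CASES))

# One index per output row: row i of COUNT is repeated Freq[i] times.
idx <- rep.int(seq_len(nrow(COUNT)), COUNT[["Freq"]])
idx

# Combinations with Freq == 0 receive no index, so they do not appear.
expanded <- COUNT[idx, c("SEX", "COLOR")]
expanded
```

The expanded result has as many rows as the sum of the Freq column, which is why the cost grows with the total count rather than with the number of groups.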

The problem is that I now have a HUGE dataset (the babyname dataset from tidytuesday), and this approach is too slow to be usable.

db_babynames <- data.table::as.data.table(tuesdata$babyname)

db_babynames <- db_babynames[
  j = characters_n := stringr::str_count(string  = name,
                                         pattern = ".")
][
  j = c("year", "characters_n", "n")
]

I'm looking for a faster solution, working with the data.table package if possible.

Comments (1)

蓝咒 2025-01-23 19:42:42

If an uncounted version is needed, I would use tidyr::uncount(), but consider the recommendation in this post to work with your original data instead.

library(dplyr)
library(tidyr)

CASES <- base::data.frame(
  SEX   = c("M", "M", "F", "F", "F"),
  COLOR = c("brown", "blue", "brown", "brown", "brown")
  )

COUNT <- count(CASES, SEX, COLOR, name = 'Freq')

tidyr::uncount(base::as.data.frame(COUNT), Freq)
#>   SEX COLOR
#> 1   F brown
#> 2   F brown
#> 3   F brown
#> 4   M  blue
#> 5   M brown

Created on 2022-03-25 by the reprex package (v2.0.1)
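Since the question asks for a data.table route, the same row-index expansion can be sketched there as well. This is only a sketch under assumptions: the table dt below is a made-up stand-in with the three columns kept in the question (year, characters_n, n), not the real babynames data, and no benchmark was run. data.table does not maintain row names, which is one plausible reason this may be faster than the data.frame version.

```r
library(data.table)

# Hypothetical stand-in for the question's columns; the values are invented.
dt <- data.table(
  year         = c(1880L, 1880L, 1881L),
  characters_n = c(4L, 5L, 4L),
  n            = c(3L, 0L, 2L)
)

# Repeat each row index n times; subsetting by i returns a new data.table.
cases <- dt[rep.int(seq_len(nrow(dt)), n)]

# Drop the count column by reference on the new table only.
cases[, n := NULL]
cases

# Working from the counts directly, without expanding:
# average name length per year, weighted by n.
by_year <- dt[, .(mean_characters = weighted.mean(characters_n, n)), by = year]
by_year
```

If the expanded rows are only needed to compute summaries, the weighted form in the last step avoids building the large table at all, which is in the spirit of the recommendation to work with the original data.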
