Calculate the average rle $lengths for grouped data

Posted on 2025-01-23 20:16:49

I would like to calculate the duration of a state using rle() on grouped data. Here is a test data frame:

DF <- read.table(text="Time,x,y,sugar,state,ID
0,31,21,0.2,0,L0
1,31,21,0.65,0,L0
2,31,21,1.0,0,L0
3,31,21,1.5,1,L0
4,31,21,1.91,1,L0
5,31,21,2.3,1,L0
6,31,21,2.75,0,L0
7,31,21,3.14,0,L0
8,31,22,3.0,2,L0
9,31,22,3.47,1,L0
10,31,22,3.930,0,L0
0,37,1,0.2,0,L1
1,37,1,0.65,0,L1
2,37,1,1.089,0,L1
3,37,1,1.5198,0,L1
4,36,1,1.4197,2,L1
5,36,1,1.869,0,L1
6,36,1,2.3096,0,L1
7,36,1,2.738,0,L1
8,36,1,3.16,0,L1
9,36,1,3.5703,0,L1
10,36,1,3.970,0,L1
", header = TRUE, sep =",")

I want to know the average run length for state == 1, grouped by ID. I have created a function, inspired by https://www.reddit.com/r/rstats/comments/brpzo9/tidyverse_groupby_and_rle/, to calculate the rle average portion:

rle_mean_lengths = function(x, value) {
  r = rle(x)
  cond = r$values == value 
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}
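
To make the function's behaviour concrete, here is a minimal base R sketch that runs it on a single vector. The vector state_L0 is an illustrative name, not part of the original code; its values are the state column for ID L0 copied from the test data above.

```r
# State values for ID L0, copied from the test data
state_L0 <- c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0)

# rle() collapses consecutive equal values into runs
r <- rle(state_L0)
r$values   # the value of each run
r$lengths  # how long each run is

# Same function as in the question
rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value          # which runs have the requested value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# State 1 occurs in two runs (lengths 3 and 1), so the mean length is 2
res_L0 <- rle_mean_lengths(state_L0, 1)
res_L0
```

So on one group's vector the function already gives the expected answer; any wrong result has to come from what is passed in as x.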

And then I add in the grouping aspect:

DF %>% group_by(ID) %>% do(rle_mean_lengths(DF$state,1))

However, the values that are generated are incorrect:

ID count avg_length
1 L0 2 2
2 L1 2 2

The result for L0 is correct, but L1 has no instances of state == 1, so its average should be zero or NA.
I isolated the problem by breaking it down to just a summarize call:

DF %>% group_by(ID) %>% summarize_at(vars(state),list(name=mean)) # This works but if I use summarize it gives me weird values again.

How do I do the equivalent of summarize_at() with do()? Or is there another fix? Thanks


Comments (1)

小伙你站住 2025-01-30 20:16:49

As the function returns a data.frame, wrapping it in list() inside summarise creates a list column, which we then need to unnest:

library(dplyr)
library(tidyr)
DF %>% 
  group_by(ID) %>%
  summarise(new = list(rle_mean_lengths(state, 1)), .groups = "drop") %>%
  unnest(new)

Or remove the list and unpack

DF %>% 
  group_by(ID) %>%
  summarise(new = rle_mean_lengths(state, 1), .groups = "drop") %>% 
  unpack(new)
# A tibble: 2 × 3
  ID    count avg_length
  <chr> <int>      <dbl>
1 L0        2          2
2 L1        0        NaN
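
The question asks for zero or NA rather than NaN when a group has no matching run. A minimal sketch of one way to get that follows; the function name rle_mean_lengths_na and the vector state_L1 are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical variant of rle_mean_lengths: returns NA instead of NaN
# when no run has the requested value
rle_mean_lengths_na <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  n <- sum(cond)
  data.frame(
    count = n,
    # mean() of an empty vector is NaN, so handle the empty case explicitly
    avg_length = if (n > 0) mean(r$lengths[cond]) else NA_real_
  )
}

# State values for ID L1, copied from the test data (no state == 1)
state_L1 <- c(0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
res_L1 <- rle_mean_lengths_na(state_L1, 1)
res_L1
```

Because it returns a one-row data.frame with the same columns, it should drop into the summarise calls above in place of rle_mean_lengths.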

In the OP's do() code, the column should be extracted not from the whole data (DF$state) but from the data coming from the lhs, i.e. the current group passed in as . (so .$state). Note that do is kind of deprecated, so it may be better to use summarise with unnest/unpack.

DF %>% 
  group_by(ID) %>%
  do(rle_mean_lengths(.$state,1))
# A tibble: 2 × 3
# Groups:   ID [2]
  ID    count avg_length
  <chr> <int>      <dbl>
1 L0        2          2
2 L1        0        NaN
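
To illustrate why DF$state gave the same values for both groups, here is a small base R sketch. It rebuilds only the ID and state columns of the test data, and the split/lapply approach is shown as a dplyr-free alternative, not as the method used above; the names whole and per_group are illustrative.

```r
# Test data reduced to the two columns that matter here
DF <- data.frame(
  ID = rep(c("L0", "L1"), each = 11),
  state = c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0,
            0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
)

# Same function as in the question
rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# DF$state is the full column regardless of grouping, so every group
# in the question received this same result
whole <- rle_mean_lengths(DF$state, 1)
whole

# Per-group result in base R: split the column by ID, apply the function
# to each piece, then bind the one-row results together
per_group <- do.call(
  rbind,
  lapply(split(DF$state, DF$ID), rle_mean_lengths, value = 1)
)
per_group
```

The whole-column call returns count 2 and avg_length 2, which is exactly what the original do() call reported for both L0 and L1.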