I would like to calculate the duration of states using `rle()` on grouped data. Here is a test data frame:
```r
DF <- read.table(text = "Time,x,y,sugar,state,ID
0,31,21,0.2,0,L0
1,31,21,0.65,0,L0
2,31,21,1.0,0,L0
3,31,21,1.5,1,L0
4,31,21,1.91,1,L0
5,31,21,2.3,1,L0
6,31,21,2.75,0,L0
7,31,21,3.14,0,L0
8,31,22,3.0,2,L0
9,31,22,3.47,1,L0
10,31,22,3.930,0,L0
0,37,1,0.2,0,L1
1,37,1,0.65,0,L1
2,37,1,1.089,0,L1
3,37,1,1.5198,0,L1
4,36,1,1.4197,2,L1
5,36,1,1.869,0,L1
6,36,1,2.3096,0,L1
7,36,1,2.738,0,L1
8,36,1,3.16,0,L1
9,36,1,3.5703,0,L1
10,36,1,3.970,0,L1
", header = TRUE, sep = ",")
```
I want to know the average length of the `state == 1` runs, grouped by `ID`. To calculate the rle average portion, I have created the function below, inspired by this Reddit thread: https://www.reddit.com/r/rstats/comments/brpzo9/tidyverse_groupby_and_rle/
```r
rle_mean_lengths = function(x, value) {
  r = rle(x)
  cond = r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}
```
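To check the function in isolation, here is a small sketch that runs `rle()` and `rle_mean_lengths()` on the `state` values of `L0` only. The vector `state_L0` is a hypothetical name, copied by hand from the test data for illustration.

```r
# Same function as above, repeated so this snippet runs on its own
rle_mean_lengths <- function(x, value) {
  r <- rle(x)                   # run-length encoding: run values and run lengths
  cond <- r$values == value     # TRUE for runs whose value matches `value`
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# state column of ID L0, copied from the test data
state_L0 <- c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0)

r <- rle(state_L0)
r$lengths  # 3 3 2 1 1 1
r$values   # 0 1 0 2 1 0

# Two runs of 1, with lengths 3 and 1, so count is 2 and avg_length is 2
res_L0 <- rle_mean_lengths(state_L0, 1)
res_L0
```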
Then I add the grouping:

```r
DF %>% group_by(ID) %>% do(rle_mean_lengths(DF$state, 1))
```
However, the values that are generated are incorrect:
|   | ID | count | avg_length |
|---|----|-------|------------|
| 1 | L0 | 2     | 2          |
| 2 | L1 | 2     | 2          |
L0 is correct, but L1 has no instances of `state == 1`, so its average should be zero or NA.
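For comparison, a base R sketch of the expected per-ID result. It assumes only the `ID` and `state` columns matter, so it rebuilds just those two columns of the test data with `data.frame()`, and applies the function per group via `split()`. Calling the function on the whole column reproduces the count of 2 and average of 2 shown for both IDs. For `L1` the per-group average comes out as `NaN`, because `mean()` of an empty vector is `NaN` (which `is.na()` also treats as missing). The names `whole`, `per_id` and `expected` are illustrative.

```r
rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# Only the ID and state columns of the test data, rebuilt for this sketch
DF <- data.frame(
  ID = rep(c("L0", "L1"), each = 11),
  state = c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0,
            0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
)

# Whole column: gives count = 2 and avg_length = 2, the values seen for both IDs
whole <- rle_mean_lengths(DF$state, 1)

# Per group: split the state column by ID first, then apply the function
per_id <- lapply(split(DF$state, DF$ID), rle_mean_lengths, value = 1)
expected <- do.call(rbind, per_id)  # one row per ID, row names L0 and L1
print(expected)
```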
I isolated the problem by breaking it down into a plain summarize:

```r
DF %>% group_by(ID) %>% summarize_at(vars(state), list(name = mean)) # This works, but if I use summarize it gives me weird values again.
```

How do I do the equivalent of `summarize_at()` with `do()`? Or is there another fix? Thanks
Comment:
As `rle_mean_lengths()` returns a data.frame, the summarised result is a data.frame column, so we may need to wrap it in `list()` and `unnest` afterwards. Or remove the `list` and `unpack`.
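A minimal sketch of both variants described above, assuming dplyr >= 1.0.0 (where a named data frame result in `summarise()` becomes a data.frame column) and tidyr >= 1.0.0 (for `unnest()` with a `cols` argument and `unpack()`). Only the `ID` and `state` columns of the test data are rebuilt, and the names `out`, `res_unnest` and `res_unpack` are illustrative. With either variant, `L1` gets `count` 0 and `avg_length` `NaN`.

```r
library(dplyr)
library(tidyr)

rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# Only the ID and state columns of the test data, rebuilt for this sketch
DF <- data.frame(
  ID = rep(c("L0", "L1"), each = 11),
  state = c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0,
            0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
)

# Variant 1: wrap each group's data frame in list(), then unnest the list-column
res_unnest <- DF %>%
  group_by(ID) %>%
  summarise(out = list(rle_mean_lengths(state, 1))) %>%
  unnest(out)

# Variant 2: keep the data frame as a packed df-column, then unpack it
res_unpack <- DF %>%
  group_by(ID) %>%
  summarise(out = rle_mean_lengths(state, 1)) %>%
  unpack(out)

res_unnest
res_unpack
```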
In the OP's `do` code, the column should be extracted not from the whole data, but from the data coming from the lhs, i.e. `.` (so `.$state` instead of `DF$state`).
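A sketch of the corrected `do()` call, again rebuilding only the `ID` and `state` columns of the test data. Inside `do()`, `.` stands for the rows of the current group, so `.$state` is the per-ID column, whereas `DF$state` always refers to the full 22-row column. The name `res_do` is illustrative.

```r
library(dplyr)

rle_mean_lengths <- function(x, value) {
  r <- rle(x)
  cond <- r$values == value
  data.frame(count = sum(cond), avg_length = mean(r$lengths[cond]))
}

# Only the ID and state columns of the test data, rebuilt for this sketch
DF <- data.frame(
  ID = rep(c("L0", "L1"), each = 11),
  state = c(0, 0, 0, 1, 1, 1, 0, 0, 2, 1, 0,
            0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0)
)

# `.` is the current group's data, so each ID gets its own rle()
res_do <- DF %>%
  group_by(ID) %>%
  do(rle_mean_lengths(.$state, 1))

res_do
```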
(Note that `do` is kind of deprecated, so it may be better to use `summarise` with `unnest`/`unpack`.)