Summarizing within conditions that repeat over time
I am trying to summarize data within time intervals using a data set with conditions repeated over time at varying intervals. I would like to get means and standard deviations within time intervals for each of the conditions.
However, in my real data I don't know how many intervals of each condition there will be. I thought perhaps I could indicate the end of an interval by a change in Condition from one row to the next row. But I don't know how to code that.
library(tidyverse)

df <- data.frame(Condition = c(rep("A", 50),
                               rep("B", 60),
                               rep("C", 50),
                               rep("A", 60),
                               rep("B", 50),
                               rep("C", 50)),
                 Time = c(seq(160, 190, length.out = 50),
                          seq(190.05, 230, length.out = 60),
                          seq(230.05, 260, length.out = 50),
                          seq(260.05, 293, length.out = 60),
                          seq(293.05, 321, length.out = 50),
                          seq(321.05, 352, length.out = 50))
) %>%
  rowwise() %>%
  # As written, rnorm(1.4, 0.3) draws one value per row with mean 0.3 and sd 1;
  # rnorm(1, 1.4, 0.3) would match the made-up numbers below.
  mutate(X = rnorm(1.4, 0.3))
I'm trying to calculate mean(X) and sd(X) for each interval of Condition (made up numbers):
Condition  interval      mean(X)  sd(X)
A          [160,190]     1.4      0.32
B          [190.05,230]  1.46     0.36
C          [230.05,260]  1.32     0.26
A          [260.05,293]  1.5      0.40
B          [293.05,321]  1.25     0.34
C          [321.05,352]  1.43     0.41
I've tried this, but it doesn't do what I need:
df %>%
  group_by(Condition) %>%
  mutate(interval = cut(Time,
                        breaks = c(floor(min(Time)), ceiling(max(Time))),
                        include.lowest = F,
                        right = F)) %>%
  group_by(Condition, interval) %>%
  summarise(mean.X = mean(X),
            sd.X = sd(X))
This doesn't give me the second intervals for each Condition:
  Condition interval   mean.X   sd.X
  <chr>     <fct>       <dbl>  <dbl>
1 A         [160,293)   0.231  0.991
2 A         NA          1.61  NA
3 B         [190,321)   0.421  0.893
4 B         NA          0.249 NA
5 C         [230,352)   0.193  0.898
6 C         NA          0.427 NA
Any suggestions?
Comments (3)
We can use rle to define "groups" of your Condition.
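A minimal sketch of the rle idea, not the answerer's original code. rle() returns the length of each run of consecutive identical values, so repeating a run counter by those lengths labels every row with its interval, however many intervals there are. The names run_id, by_run and result are illustrative assumptions, and X is simulated here with mean 1.4 and sd 0.3 to resemble the made-up target numbers.

```r
# Rebuild data with the same layout as the question.
# Assumption: X is drawn with mean 1.4 and sd 0.3.
set.seed(1)
n_per_run <- c(50, 60, 50, 60, 50, 50)
df <- data.frame(
  Condition = rep(c("A", "B", "C", "A", "B", "C"), times = n_per_run),
  Time = c(seq(160, 190, length.out = 50),
           seq(190.05, 230, length.out = 60),
           seq(230.05, 260, length.out = 50),
           seq(260.05, 293, length.out = 60),
           seq(293.05, 321, length.out = 50),
           seq(321.05, 352, length.out = 50))
)
df$X <- rnorm(nrow(df), mean = 1.4, sd = 0.3)

# rle() gives the length of each run of identical consecutive values;
# repeating the run index by its length tags every row with its run.
runs <- rle(df$Condition)
df$run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# One row per run: condition, observed time range, mean and sd of X.
by_run <- split(df, df$run_id)
result <- do.call(rbind, lapply(by_run, function(d) {
  data.frame(
    Condition = d$Condition[1],
    interval  = sprintf("[%s,%s]", min(d$Time), max(d$Time)),
    mean.X    = mean(d$X),
    sd.X      = sd(d$X)
  )
}))
result
```

Because the grouping key is the run rather than the Condition itself, the second A, B and C intervals stay separate from the first ones.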
I definitely think there should be a less messy way to do it, but kmeans() gives the following possible solution:

It's up to you how the NAs are dealt with.

Edit 1: Added @Sinh Nguyen's cut improvements.

Edit 2: In response to the updated question: we can use the rleid() function from data.table.
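A hedged sketch of how rleid() could be applied here (the answer's own code did not survive, so this is an assumption about its shape, written in plain data.table syntax). rleid() assigns the same id to consecutive identical values and increments it at every change, which is exactly the "change in Condition from one row to the next" the question describes. The column names run_id, start and end are illustrative, and X is simulated with mean 1.4 and sd 0.3.

```r
library(data.table)

# Same layout as the question's data. Assumption: X ~ N(1.4, 0.3).
set.seed(1)
dt <- data.table(
  Condition = rep(c("A", "B", "C", "A", "B", "C"),
                  times = c(50, 60, 50, 60, 50, 50)),
  Time = c(seq(160, 190, length.out = 50),
           seq(190.05, 230, length.out = 60),
           seq(230.05, 260, length.out = 50),
           seq(260.05, 293, length.out = 60),
           seq(293.05, 321, length.out = 50),
           seq(321.05, 352, length.out = 50))
)
dt[, X := rnorm(.N, mean = 1.4, sd = 0.3)]

# rleid() numbers the runs: the id increases each time Condition changes.
dt[, run_id := rleid(Condition)]

# Summarise per run; Condition is kept in `by` only so it appears in the output.
result <- dt[, .(
  start  = min(Time),
  end    = max(Time),
  mean.X = mean(X),
  sd.X   = sd(X)
), by = .(run_id, Condition)]
result
```

The same run_id column also works as a grouping variable in a dplyr pipeline, for example group_by(run_id, Condition) followed by summarise().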
The reason the second interval group has NA values is your input to the cut function: with right = F, records with Time == max(Time) are excluded from the interval output. As you can see above, there is one record with an NA interval in each group.

If you change the cut parameters to right = T and include.lowest = T, all of them are included.

If this is not what you expected, please clarify how you would like the intervals to be defined.
Created on 2022-05-16 by the reprex package (v2.0.1)
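The effect of the two cut() arguments can be seen on a tiny vector. This is an illustrative sketch with made-up values (the names time, open_right and closed are assumptions), using the same two-point breaks as the question's attempt.

```r
time <- c(160, 175, 190)
breaks <- c(floor(min(time)), ceiling(max(time)))  # 160 and 190

# right = FALSE builds the half-open interval [160,190),
# so the maximum falls outside it and becomes NA.
open_right <- cut(time, breaks = breaks, right = FALSE, include.lowest = FALSE)

# right = TRUE with include.lowest = TRUE closes the single interval
# on both ends, so no value is dropped.
closed <- cut(time, breaks = breaks, right = TRUE, include.lowest = TRUE)

data.frame(time, open_right, closed)
```

Note that with only two break points there is still a single interval per group, so this removes the NA rows but does not by itself separate the first and second occurrence of each Condition.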