Calculating the duration and key figures (mean, standard deviation, min, max) of certain actions/variables in R?
I have a data frame with 100,000+ rows and multiple variables/columns, from which I would like to:
- Calculate the duration of certain actions based on the values in column "Y". Column Y contains multiple sequences of 0s and 1s, and whenever an action takes place, the value is 1. The idea is to compute the time difference from the first 1 in a sequence of ones (right after the last 0) to the final 1 in that sequence (right before the next 0). Every row, whether it holds a 1 or a 0, has a timestamp in column "X" for the current runtime, so the time difference can be calculated with a simple subtraction:
TIME_OF_FINAL_1_IN_SEQUENCE minus TIME_OF_FIRST_1_IN_SEQUENCE
The same calculation would be repeated for every sequence of ones, and a new data frame listing all of the durations of the action would be created.
- In a similar manner, for every sequence of ones, calculate the average, standard deviation, minimum and maximum of column "Z" over the rows from the first 1 to the final 1 of that sequence. Then combine all the results into one data frame and export it as a CSV file containing the columns "action durations", "Z avg", "Z std", "Z min", "Z max" and the "id" column from the original data frame. How could I write a script like this in R?
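One way to find where each sequence of ones starts and ends is base R's `rle()` (run-length encoding). This is only an illustrative sketch under assumptions: the toy data frame `df` is made up, X is assumed to be a numeric timestamp in seconds, and only the durations are computed.

```r
# Hypothetical toy data: X = timestamp in seconds, Y = 0/1 action flag
df <- data.frame(
  X = c(0, 1, 2, 3, 4, 5, 6, 7),
  Y = c(0, 1, 1, 1, 0, 0, 1, 1)
)

# rle() collapses Y into runs: r$values holds each run's value, r$lengths its length
r <- rle(df$Y)
ends   <- cumsum(r$lengths)      # row of the last element of each run
starts <- ends - r$lengths + 1   # row of the first element of each run

# Keep only the runs of ones
is_one <- r$values == 1
starts <- starts[is_one]
ends   <- ends[is_one]

# Duration of each run: time of the final 1 minus time of the first 1
durations <- df$X[ends] - df$X[starts]
print(durations)
# [1] 2 1
```

The Z statistics for the i-th sequence can be taken over the same rows, e.g. `mean(df$Z[starts[i]:ends[i]])`. If X is a POSIXct timestamp, `difftime(df$X[ends], df$X[starts], units = "secs")` gives the durations in seconds. This sketch ignores the id column; if a run of ones can continue across two ids, the runs would also need to be split there.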
The pseudocode could look something like this:
for all the rows in df {
    if (value in column Y is 1) {
        from the first 1 until the last 1 in a sequence: calculate TIME_OF_FINAL_1_IN_SEQUENCE minus TIME_OF_FIRST_1_IN_SEQUENCE from column X
        ALSO over the range from the first 1 to the last 1 in this sequence: calculate avg, std, min, and max for the variable Z
    }
    if (value in column Y is 0) {
        add a new element/row to the list (including the variables "action duration", "Z avg", "Z std", "Z min", "Z max" and "id") and move on to the next 1
    }
}
(I'm not sure the algorithm in the pseudocode is exactly what I described in the text, but at least I tried my best to include some kind of "code example" here as well :-))
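For reference, the pseudocode translates fairly directly into an R loop. This is a sketch with several assumptions: the toy data and the output file name `action_summary.csv` are hypothetical, X is taken to be numeric seconds, the "id" of a sequence is taken from its first row, and a sequence is also treated as ending when the id changes. A row-by-row loop can be slow on 100,000+ rows, so a vectorised approach is usually preferable at that size.

```r
# Hypothetical toy data; X = timestamp in seconds, Y = 0/1 action flag
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2),
  X  = c(0, 1, 2, 3, 4, 0, 1, 2),
  Y  = c(0, 1, 1, 0, 1, 1, 1, 0),
  Z  = c(5, 2, 4, 9, 7, 1, 3, 8)
)

results <- list()
start <- NA          # row where the current sequence of ones began (NA = none open)
n <- nrow(df)

for (i in seq_len(n)) {
  # First 1 of a sequence: open it
  if (df$Y[i] == 1 && is.na(start)) start <- i

  # The open sequence ends at row i if this is the last row, the next row is a 0,
  # or the next row belongs to another id
  closes <- !is.na(start) &&
    (i == n || df$Y[i + 1] == 0 || df$id[i + 1] != df$id[i])

  if (closes) {
    z <- df$Z[start:i]
    results[[length(results) + 1]] <- data.frame(
      id              = df$id[start],
      action_duration = df$X[i] - df$X[start],
      z_avg = mean(z), z_std = sd(z), z_min = min(z), z_max = max(z)
    )
    start <- NA
  }
}

out <- do.call(rbind, results)
write.csv(out, "action_summary.csv", row.names = FALSE)  # hypothetical file name
print(out)
```

With this toy data the loop finds three sequences; the single-row sequence gets `NA` for `z_std`, because `sd()` of one value is `NA`.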
I believe you have multiple sequences of consecutive rows of ones and zeros. I think the approach is to generate a unique identifier for each sequence and then estimate the statistics you want over each of these identifiers. This is easily done using data.table and data.table::rleid.
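A minimal sketch of that idea, assuming the data.table package is installed and using made-up toy data with the column names from the question. Passing both `id` and `Y` to `rleid()` is an assumption about how the data is organised: it starts a new run whenever either column changes, so a run of ones never spans two ids. The file name is hypothetical.

```r
library(data.table)

# Hypothetical toy data with the column names from the question
DT <- data.table(
  id = c(1, 1, 1, 1, 1, 2, 2, 2),
  X  = c(0, 1, 2, 3, 4, 0, 1, 2),   # assumed: timestamp in seconds
  Y  = c(0, 1, 1, 0, 1, 1, 1, 0),
  Z  = c(5, 2, 4, 9, 7, 1, 3, 8)
)

# rleid() numbers consecutive runs: the number increases whenever id or Y changes
DT[, run := rleid(id, Y)]

# Summarise only the runs of ones; X[1] and X[.N] are the first and last row of each run
res <- DT[Y == 1,
          .(action_duration = X[.N] - X[1],
            z_avg = mean(Z), z_std = sd(Z),
            z_min = min(Z), z_max = max(Z)),
          by = .(run, id)]
res[, run := NULL]   # the helper run number is not needed in the output

fwrite(res, "action_summary.csv")   # hypothetical file name
print(res)
```

Single-row runs get `NA` for `z_std`. If X is a POSIXct timestamp rather than numeric seconds, `as.numeric(difftime(X[.N], X[1], units = "secs"))` inside `j` gives the duration in seconds.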