Summarising grouped records in an R data frame (...again)
(I asked this question earlier today but over-simplified it; the answers I received were correct, but I couldn't use them for my actual problem. Here's my second attempt...)
I have a data frame in R that looks like:
"Timestamp", "Source", "Target", "Length", "Content"
0.1 , P1 , P2 , 5 , "ABCDE"
0.2 , P1 , P2 , 3 , "HIJ"
0.4 , P1 , P2 , 4 , "PQRS"
0.5 , P2 , P1 , 2 , "ZY"
0.9 , P2 , P1 , 4 , "SRQP"
1.1 , P1 , P2 , 1 , "B"
1.6 , P1 , P2 , 3 , "DEF"
2.0 , P2 , P1 , 3 , "IJK"
...
and I want to convert this to:
"StartTime", "EndTime", "Duration", "Source", "Target", "Length", "Content"
0.1 , 0.4 , 0.3 , P1 , P2 , 12 , "ABCDEHIJPQRS"
0.5 , 0.9 , 0.4 , P2 , P1 , 6 , "ZYSRQP"
1.1 , 1.6 , 0.5 , P1 , P2 , 4 , "BDEF"
...
Trying to put this into English: I want to group consecutive records that have the same Source and Target, then output a single record per group showing the group's StartTime, EndTime and Duration (= EndTime - StartTime), the sum of its Lengths, and the concatenation of its Content values (which are all strings).
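A minimal base-R sketch of this grouping, assuming the data frame is called `df` and that Source and Target are character columns rather than factors (both are assumptions, not from the question). `rle()` on the pasted Source/Target pair finds runs of consecutive identical pairs, and because the rows are in time order, the first and last row of each run supply StartTime and EndTime.

```r
# Sample data from the question (Source/Target assumed to be character, not factor)
df <- data.frame(
  Timestamp = c(0.1, 0.2, 0.4, 0.5, 0.9, 1.1, 1.6, 2.0),
  Source    = c("P1", "P1", "P1", "P2", "P2", "P1", "P1", "P2"),
  Target    = c("P2", "P2", "P2", "P1", "P1", "P2", "P2", "P1"),
  Length    = c(5, 3, 4, 2, 4, 1, 3, 3),
  Content   = c("ABCDE", "HIJ", "PQRS", "ZY", "SRQP", "B", "DEF", "IJK"),
  stringsAsFactors = FALSE
)

# rle() collapses consecutive identical Source/Target pairs into runs
runs <- rle(paste(df$Source, df$Target, sep = "->"))
run  <- rep(seq_along(runs$lengths), runs$lengths)  # run number for every row

# First and last row index of each run (rows are assumed to be in time order)
last  <- cumsum(runs$lengths)
first <- last - runs$lengths + 1

result <- data.frame(
  StartTime = df$Timestamp[first],
  EndTime   = df$Timestamp[last],
  Duration  = df$Timestamp[last] - df$Timestamp[first],
  Source    = df$Source[first],
  Target    = df$Target[first],
  Length    = as.vector(tapply(df$Length, run, sum)),
  Content   = as.vector(tapply(df$Content, run, paste, collapse = "")),
  stringsAsFactors = FALSE
)
print(result)
```

With only the sample rows, the final record (2.0, P2, P1) forms a one-row group with a Duration of 0; in the full data it would merge with any following P2 to P1 rows.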
The Timestamp values will always increase throughout the data frame.
I had a look at melt/recast and have a feeling that it could be used to solve the problem, but couldn't get my head around the documentation. I suspect it's possible to do this within R, but I really don't know where to start. In a pinch I could export the data frame and do it in, e.g., Python, but I'd prefer to stay within R if possible.
Thanks in advance for any assistance you can provide
Answers (3)
Here's another solution using plyr:
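A minimal sketch of what a plyr version might look like, assuming plyr is installed, the data frame is named `df`, and a helper column `Run` (a hypothetical name, not from the question) that numbers each consecutive Source/Target run via `cumsum()`. `ddply()` then splits on `Run` and `summarise` builds one row per run.

```r
library(plyr)

# Sample data from the question (Source/Target assumed to be character)
df <- data.frame(
  Timestamp = c(0.1, 0.2, 0.4, 0.5, 0.9, 1.1, 1.6, 2.0),
  Source    = c("P1", "P1", "P1", "P2", "P2", "P1", "P1", "P2"),
  Target    = c("P2", "P2", "P2", "P1", "P1", "P2", "P2", "P1"),
  Length    = c(5, 3, 4, 2, 4, 1, 3, 3),
  Content   = c("ABCDE", "HIJ", "PQRS", "ZY", "SRQP", "B", "DEF", "IJK"),
  stringsAsFactors = FALSE
)

# Hypothetical run id: increments whenever Source or Target differs from the previous row
n <- nrow(df)
df$Run <- cumsum(c(TRUE, df$Source[-1] != df$Source[-n] | df$Target[-1] != df$Target[-n]))

# One summary row per run; min/max equal first/last because timestamps increase
result <- ddply(df, "Run", summarise,
  StartTime = min(Timestamp),
  EndTime   = max(Timestamp),
  Duration  = max(Timestamp) - min(Timestamp),
  Source    = Source[1],
  Target    = Target[1],
  Length    = sum(Length),
  Content   = paste(Content, collapse = ""))
result$Run <- NULL  # drop the helper column
print(result)
```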
Try this:
P.S. You would need to convert the result to the correct column types, because the end result is a matrix of strings.
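A sketch of how such a matrix of strings typically arises and how to convert it back, assuming a data frame `df` and a run id vector `run` (both hypothetical names). Because `c()` inside `sapply()` mixes numbers and strings, every value is coerced to character; `type.convert(..., as.is = TRUE)` then restores numeric columns while leaving text as character.

```r
# Sample data from the question (Source/Target assumed to be character)
df <- data.frame(
  Timestamp = c(0.1, 0.2, 0.4, 0.5, 0.9, 1.1, 1.6, 2.0),
  Source    = c("P1", "P1", "P1", "P2", "P2", "P1", "P1", "P2"),
  Target    = c("P2", "P2", "P2", "P1", "P1", "P2", "P2", "P1"),
  Length    = c(5, 3, 4, 2, 4, 1, 3, 3),
  Content   = c("ABCDE", "HIJ", "PQRS", "ZY", "SRQP", "B", "DEF", "IJK"),
  stringsAsFactors = FALSE
)

# Hypothetical run id: starts a new group whenever Source or Target changes
n   <- nrow(df)
run <- cumsum(c(TRUE, df$Source[-1] != df$Source[-n] | df$Target[-1] != df$Target[-n]))

# c() mixes numbers and strings, so all values become character;
# sapply() + t() therefore yields a character matrix with one row per group
m <- t(sapply(split(df, run), function(g) {
  k <- nrow(g)
  c(StartTime = g$Timestamp[1],
    EndTime   = g$Timestamp[k],
    Duration  = g$Timestamp[k] - g$Timestamp[1],
    Source    = g$Source[1],
    Target    = g$Target[1],
    Length    = sum(g$Length),
    Content   = paste(g$Content, collapse = ""))
}))

# Convert back: a data frame of strings, then type.convert() picks numeric
# types for number-like columns (as.is = TRUE keeps text as character)
out <- as.data.frame(m, stringsAsFactors = FALSE)
out[] <- lapply(out, type.convert, as.is = TRUE)
rownames(out) <- NULL
print(out)
```

Note that `type.convert()` turns whole-number columns such as Length into integers.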
Sticking with my (not elegant) way:
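A minimal loop-based sketch of this kind of approach, assuming a data frame named `df` (a hypothetical name) whose rows are in time order. It walks the rows and closes a group whenever the next row's Source/Target pair differs or the data ends. Collecting groups in a list and binding once with `do.call(rbind, ...)` avoids repeatedly copying a growing data frame.

```r
# Sample data from the question (Source/Target assumed to be character)
df <- data.frame(
  Timestamp = c(0.1, 0.2, 0.4, 0.5, 0.9, 1.1, 1.6, 2.0),
  Source    = c("P1", "P1", "P1", "P2", "P2", "P1", "P1", "P2"),
  Target    = c("P2", "P2", "P2", "P1", "P1", "P2", "P2", "P1"),
  Length    = c(5, 3, 4, 2, 4, 1, 3, 3),
  Content   = c("ABCDE", "HIJ", "PQRS", "ZY", "SRQP", "B", "DEF", "IJK"),
  stringsAsFactors = FALSE
)

rows  <- list()
start <- 1
n     <- nrow(df)
for (i in seq_len(n)) {
  # Close the current group at the last row or when the next row's pair differs
  # (|| short-circuits, so row i + 1 is never read when i == n)
  at_end <- i == n || df$Source[i + 1] != df$Source[i] || df$Target[i + 1] != df$Target[i]
  if (at_end) {
    idx <- start:i
    rows[[length(rows) + 1]] <- data.frame(
      StartTime = df$Timestamp[start],
      EndTime   = df$Timestamp[i],
      Duration  = df$Timestamp[i] - df$Timestamp[start],
      Source    = df$Source[start],
      Target    = df$Target[start],
      Length    = sum(df$Length[idx]),
      Content   = paste(df$Content[idx], collapse = ""),
      stringsAsFactors = FALSE)
    start <- i + 1
  }
}
result <- do.call(rbind, rows)
print(result)
```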