根据子组对 R 变量进行分组
我有一个格式为的数据,
PERSON_A PERSON_B MEET LEAVE
它基本上描述了 PERSON_A 在 MEET 时间遇到 PERSON_B 并且他们在离开时互相说“再见”。时间以秒为单位表示,http://pastie.org/2825794(简单.dat)。
我需要的是计算按天分组的会议数量。目前,我有一个可以运行的代码,但外观并不美观。无论如何,我需要帮助才能将其转换为反映我尝试进行的分组的代码,例如使用 ddply 等。因此,我的主要目的是从这个案例中学习。这段代码可能存在许多关于 R 中良好实践的错误。
library(plyr)
data = read.table("simple.dat", stringsAsFactors=FALSE)
names(data)=c('PERSON_A','PERSON_B','MEET','LEAVE')
attach(data)
min_interval = min(MEET)
max_interval = max(LEAVE)
interval = max_interval - min_interval
day = 86400
number_of_days = floor(interval/day)
g = data.frame(MEETINGS=c(0:number_of_days)) # just to store the result
g[,1] = 0
start_offset = min_interval # start of the first day
for (interval in c(0:number_of_days)) {
end_offset = start_offset + day
meetings = (length(data[data$MEET >= start_offset & data$LEAVE <= end_offset, ]$PERSON_A) + length(data[data$MEET >= start_offset & data$LEAVE <= end_offset, ]$PERSON_B))
g[interval+1, ] = meetings
start_offset = end_offset # start next day
}
g
这段代码在几天内迭代(间隔 86400 秒)并将会议数量存储在数据帧 g 上。在链接数据集上执行时,此代码的正确输出(如下所示)给出了每行(天)的会议数量。
MEETINGS
1 38
2 10
3 16
4 18
5 24
6 6
7 4
8 10
9 28
10 14
11 22
12 2
13 .. 44 0 # I simplified the output here
45 2
不管怎样,我知道我可以使用 ddply 来获取每对 o 节点的会议数量:
contacts <- ddply(data, .(PERSON_A, PERSON_B), summarise
, CONTACTS = length(c(PERSON_A, PERSON_B)) /2
)
但是对于我来说,这和我需要的结果之间有一个巨大的山丘。
作为尾注,我阅读了 如何制作一个出色的 R 可重现示例? 并尽力了:)
谢谢,
I have a data formatted as
PERSON_A PERSON_B MEET LEAVE
That describes basically when a PERSON_A met a PERSON_B at time MEET and they said "bye" to each other at moment LEAVE. The time is expressed in seconds, and there is a small part of the data on http://pastie.org/2825794 (simple.dat).
What I need is to count the number of meetings grouping it by day. At the moment, I have a code that works, the appearance is not beautiful. Anyway, I'd like a help in order to transform it in a code that reflects the grouping Im trying to do, e.g, using ddply, etc. Therefore, my main aim is to learn from this case. Probably there are many mistakes in this code regarding good practices in R.
library(plyr)
data = read.table("simple.dat", stringsAsFactors=FALSE)
names(data)=c('PERSON_A','PERSON_B','MEET','LEAVE')
attach(data)
min_interval = min(MEET)
max_interval = max(LEAVE)
interval = max_interval - min_interval
day = 86400
number_of_days = floor(interval/day)
g = data.frame(MEETINGS=c(0:number_of_days)) # just to store the result
g[,1] = 0
start_offset = min_interval # start of the first day
for (interval in c(0:number_of_days)) {
end_offset = start_offset + day
meetings = (length(data[data$MEET >= start_offset & data$LEAVE <= end_offset, ]$PERSON_A) + length(data[data$MEET >= start_offset & data$LEAVE <= end_offset, ]$PERSON_B))
g[interval+1, ] = meetings
start_offset = end_offset # start next day
}
g
This code iterates over the days (intervals of 86400 seconds) and stores the number of meetings on the dataframe g. The correct output (shown bellow) of this code when executed on the linked dataset gives for each line (day) the number o meetings.
MEETINGS
1 38
2 10
3 16
4 18
5 24
6 6
7 4
8 10
9 28
10 14
11 22
12 2
13 .. 44 0 # I simplified the output here
45 2
Anyway, I know that I could use ddply to get the number of meetings for each pair o nodes:
contacts <- ddply(data, .(PERSON_A, PERSON_B), summarise
, CONTACTS = length(c(PERSON_A, PERSON_B)) /2
)
but there is a huge hill for me between this and the result I need.
As a end note, I read How to make a great R reproducible example? and tried my best :)
Thanks,
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
试试这个:
floor(x/(60*60*24))
是将秒转换为日的快速方法。try this:
floor(x/(60*60*24))
is a quick way to convert second into day.