Is there a "yield"-like construct in R?

Posted on 2024-12-11 13:58:45

I'm kinda new to R and just started using it to plot some graphs.

I have this code:

times=integer(nrow(df));
for(i in 1:nrow(df)) {
  # on the last iteration row i+1 doesn't exist, so times[nrow(df)] ends up NA
  time=df[i+1,4]-df[i,4];
  times[i]<-time
}

There must be a cleverer way to do this without first initializing times, mustn't there?
I'm not sure, but what I'm looking for is something like:

times <- for(i in 1:nrow(df)) yield df[i+1,4]-df[i,4]

(I know this is not valid code :))
I hope this question hasn't been asked already. I searched and didn't find anything concrete on "yield" or on initializing arrays.
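R has no yield. The usual way to build a result vector without preallocating it is a vectorized function such as diff(), or an apply-family function that collects one value per index. Below is a minimal sketch of both. It uses the first four timestamps from measure.csv (shown further down) as a plain vector; the name stamps is only illustrative and stands in for df[,4].

```r
# Timestamps in ms, from the first four data lines of measure.csv
stamps <- c(1319238175202, 1319238175690, 1319238176195, 1319238176665)

# Vectorized: diff() returns the length(stamps) - 1 successive differences
times_diff <- diff(stamps)

# "Comprehension"-style: vapply() evaluates the expression once per index
# and collects the results into a numeric vector, so nothing is preallocated
times_vapply <- vapply(seq_len(length(stamps) - 1),
                       function(i) stamps[i + 1] - stamps[i],
                       numeric(1))

print(times_diff)  # [1] 488 505 470
```

Unlike the loop, both versions return n - 1 values, so there is no trailing NA.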

As requested:

Sample data in df:

7926 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7927 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12
7928 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7929 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12
7930 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7931 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12
7932 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7933 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12
7934 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7935 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12
7936 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7937 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12
7938 08:00:27:ed:f3:e5 MESSAGEHANDLER START 1.319242e+12
7939 08:00:27:ed:f3:e5 MESSAGEHANDLER   END 1.319242e+12

After my loop, times is:

[7921] 508 500 497 501 466 502 505 500 488 501 500 501 490 501 478 501 501 501
[7939]  NA

OK, to be more concrete, this is what I really want to do:

times1=integer(nrow(df));for(i in 1:nrow(df)) { if (df[i,3] == "START") times1[i]<-df[i+1,4]-df[i,4]}
times2=integer(nrow(df));for(i in 1:nrow(df)) { if (df[i,3] == "END") times2[i]<-df[i+1,4]-df[i,4]}

Then the output for times1 is something like:

[7921]   0 500   0 501   0 502   0 500   0 501   0 501   0 501   0 501   0 501
[7939]   0

But I need:

[3960]   500   501   502   500   501   501   501   501   501

In words:

I'm parsing measurement data from a CSV file, which ends up in df as shown above.
The first case is "START" followed by "END":

A "START" in df[,3] means a packet was received at the Unix time (in milliseconds) given in df[,4].
I need to calculate the time between receiving and sending (the time my machine needs to analyze the received packet and compute a result to send).
An "END" in df[,3] means the packet was sent successfully at Unix time df[,4].

The other case is "END" followed by "START":

This is the time between "my packet was sent" and "a new packet was received".
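Both cases can be computed without a loop. Take all successive differences once, then keep each one according to the state of the row it starts from. Unlike the loop with if, this returns only the wanted values, with no zeros and no trailing NA.

This is a sketch that assumes the state and timestamp columns are available as plain vectors. The names state and stamps are hypothetical stand-ins for df[,3] and df[,4]; the values are the first five lines of measure.csv.

```r
# Hypothetical vectors standing in for df[,3] and df[,4]
state  <- c("END", "START", "END", "START", "END")
stamps <- c(1319238175202, 1319238175690, 1319238176195,
            1319238176665, 1319238177167)

gaps <- diff(stamps)       # time from each row to the next row
from <- head(state, -1)    # state of the earlier row of each pair

processing <- gaps[from == "START"]  # START -> END: received until sent
idle       <- gaps[from == "END"]    # END -> START: sent until next received
```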

Here is a sample CSV and my full code for reproduction:

#load csv (no header row) in df!
df = read.csv("/tmp/measure.csv", header = FALSE)
# seconds elapsed since the first timestamp
absolute=integer(nrow(df));for(i in 1:nrow(df)) {time=df[i,4]-df[1,4];absolute[i]<-(time/1000)}
# ms from each row to the next (the last element is NA)
times=integer(nrow(df));for(i in 1:nrow(df)) {time=df[i+1,4]-df[i,4];times[i]<-time}
#plot(absolute,times)
plot(absolute,times,lty=1,pch=1,col="#11223399",type="l")
# red horizontal line at the mean interval
lines(absolute,array(mean(times,na.rm=TRUE),nrow(df)),col="red")
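The two loops in this script can be replaced by vector arithmetic. Below is a sketch, assuming the same four-column layout.

So that it runs without the file, the first three data lines of measure.csv are inlined and read with read.csv(text = ...). The name log_df is illustrative and avoids clashing with R's df() function. times is padded with NA so its length still matches absolute for plot().

```r
# First three data lines of measure.csv, inlined for a self-contained sketch
csv <- "08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195"
log_df <- read.csv(text = csv, header = FALSE)

# Seconds since the first timestamp (replaces the 'absolute' loop)
absolute <- (log_df[, 4] - log_df[1, 4]) / 1000
# ms to the next row, NA-padded so length(times) == length(absolute)
times <- c(diff(log_df[, 4]), NA)
```

The plot() and lines() calls should then work unchanged.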

Here is my measure.csv:

08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238175202
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238175690
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238176195
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238176665
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238177167
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238177669
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238178172
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238178639
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238179139
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238179658
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238180161
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238180654
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238181154
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238181669
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238182170
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,START,1319238182629
08:00:27:ed:f3:e5,TMCMESSAGEHANDLER,END,1319238183130

I hope this makes it more clear.


Comments (4)

动次打次papapa 2024-12-18 13:58:45

I think you want to calculate the difference between successive elements in a vector. In that case you are looking for diff:

x <- c(1, 2, 9, 5, 3)  # fixed values: sample() draws depend on the R version

x
[1] 1 2 9 5 3

diff(x)
[1]  1  7 -4 -2
故事↓在人 2024-12-18 13:58:45

Hope I'm not too far off: why not avoid the loop altogether?

    # generate some data sort of similar to yours
    # (runif() is random, so your END times will differ):
    DF <- data.frame(pos4 = rep(c("START","END"),10),times=rep(0,20))
    DF$times[DF$pos4=="START"] <- 1:10
    DF$times[DF$pos4=="END"] <- DF$times[DF$pos4=="START"]+runif(10)
    DF
        pos4 times
    1  START  1.000000
    2    END  1.750459
    3  START  2.000000
    4    END  2.212599
    5  START  3.000000
    6    END  3.974809
    ....

I'm assuming the START and END times in your dataset are in order (each START followed by its END):

    (times <- DF$times[DF$pos4=="END"] - DF$times[DF$pos4=="START"]) 
    [1] 0.7504590 0.2125986 0.9748094 0.3313644 0.3448410 0.8677022 0.9534317
    [8] 0.1279304 0.6500212 0.1798664

Not sure what kind of checks you need, since there weren't any in the for loop you posted in the question.
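One check does seem to matter for the OP's measure.csv. It starts and ends with END, so it has one more END than START. In that case DF$times[END] - DF$times[START] recycles with a warning and pairs each END with the wrong START.

Below is a sketch of trimming and validating before subtracting. The names state and stamps are hypothetical stand-ins for the state and timestamp columns; the values are the first five lines of measure.csv.

```r
# Hypothetical vectors shaped like measure.csv: starts and ends with END
state  <- c("END", "START", "END", "START", "END")
stamps <- c(1319238175202, 1319238175690, 1319238176195,
            1319238176665, 1319238177167)

# Drop a leading END and a trailing START so every START has its END
if (state[1] == "END") {
  state  <- state[-1]
  stamps <- stamps[-1]
}
n <- length(state)
if (state[n] == "START") {
  state  <- state[-n]
  stamps <- stamps[-n]
}

# Stop unless the states strictly alternate START, END, START, END, ...
stopifnot(all(state == rep(c("START", "END"), length.out = length(state))))

processing <- stamps[state == "END"] - stamps[state == "START"]
```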

-----------------EDIT---------------------------

To include the point from the comment below, which appears to have gotten it right: this really was a question about indexing.
Given that

    DIFFS <- diff(DF$times)

gives you all the differences, you just want to split them into two objects, one for the odd indices and another for the even indices:

    times1 <- DIFFS[seq(from=1,to=length(DIFFS),by=2)]
    times2 <- DIFFS[seq(from=2,to=length(DIFFS),by=2)]
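A shorter idiom for the same split is a logical index, which R recycles along the vector. This is a sketch using the first five timestamps of measure.csv; stamps is an illustrative name.

Because that file begins with END, the odd-indexed gaps are END -> START (idle time) and the even-indexed ones are START -> END (processing time). Which object corresponds to times1 therefore depends on the first row.

```r
# First five timestamps of measure.csv (END, START, END, START, END)
stamps <- c(1319238175202, 1319238175690, 1319238176195,
            1319238176665, 1319238177167)
DIFFS <- diff(stamps)

odd  <- DIFFS[c(TRUE, FALSE)]  # same as DIFFS[seq(1, length(DIFFS), by = 2)]
even <- DIFFS[c(FALSE, TRUE)]  # same as DIFFS[seq(2, length(DIFFS), by = 2)]
```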

Unrelated, but also useful: you used 'df' as the name of an object in your code, but df is also a function in R (the F-distribution density), so although it works, it's better form to give objects names that aren't already taken. Glad you got what you were after!

烟花易冷人易散 2024-12-18 13:58:45

You can also do something like

lapply(sequence(nrow(df)-1),function(i,df) df[i+1,4]-df[i,4],df)

or try sapply in place of lapply (the syntax is otherwise the same).

Edit:

More specifically, I think

times <- sapply(sequence(nrow(df)-1),function(i,df) df[i+1,4]-df[i,4],df)

or

times <- unlist(lapply(sequence(nrow(df)-1),function(i,df) df[i+1,4]-df[i,4],df))

would do the trick. Regarding the reshaping: there is no identifying variable in df that pairs each start time with its end time, so you would have to do it manually, assuming the two times to be paired occur in successive rows:

times <- apply(matrix(df[,4],ncol=2,byrow=TRUE),1,diff)
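The apply() call can also be written as a plain column difference. This is a sketch assuming the timestamp vector has already been trimmed to START, END, START, END, ... order with an even length.

With an odd length, matrix() recycles values and warns, which would be the case for the 17-line measure.csv as posted. The values below are lines 2 to 5 of that file; stamps is an illustrative name.

```r
# START, END, START, END timestamps from measure.csv (leading END dropped)
stamps <- c(1319238175690, 1319238176195, 1319238176665, 1319238177167)

m <- matrix(stamps, ncol = 2, byrow = TRUE)  # one row per START/END pair
times <- m[, 2] - m[, 1]                     # END minus START for every row
```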
濫情▎り 2024-12-18 13:58:45

I'm on my way out the door, but two comments: 1) add column headers to the data frame; 2) I think the OP needs the reshape package to split the start and end times into two separate columns, start and end, and then compute End - Start as a vector operation.
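Base R's stats::reshape() can do this split without an extra package. This is a swap for the reshape package the comment mentions, not what the commenter wrote.

The sketch assumes strict START/END alternation beginning with START; a leading END, as in measure.csv, would have to be dropped first. The pair column is a hypothetical id built for the purpose, and the timestamps are lines 2 to 5 of measure.csv.

```r
log_df <- data.frame(
  state = c("START", "END", "START", "END"),
  stamp = c(1319238175690, 1319238176195, 1319238176665, 1319238177167)
)
# Hypothetical pair id: each START opens a new pair
log_df$pair <- cumsum(log_df$state == "START")

# One row per pair, with columns stamp.START and stamp.END
wide <- reshape(log_df, idvar = "pair", timevar = "state", direction = "wide")
wide$elapsed <- wide$stamp.END - wide$stamp.START
```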
