Measuring cross-process latency on Windows
I am building latency measurement into communication middleware I am developing. The way I have it working is that I periodically send a probe msg from my publishing apps. Subscribing apps receive this probe, cache it, and send an echo back at a time of their choosing, noting how much time the msg was kept “on hold”. The publishing app receives these echoes and calculates latency as (now() – time_sent – time_on_hold) / 2.
This kinda works, but the numbers are vastly different (3x) when “time on hold” is greater than 0. I.e., if I echo the msg back immediately I get around 50us in my dev env, and if I wait and then send the msg back, the time jumps to 150us (even though I discount whatever time I was on hold). I use QueryPerformanceCounter for all measurements.
This is all inside a single Windows 7 box. What am I missing here?
TIA.
A bit more information. I am using the following to measure time:
On the publish side I do something like:
And on the receiver side:
OK, I have edited my answer to reflect yours. Sorry for the delay, but I didn't notice that you had elaborated on the question by posting an answer.
It seems that, functionally, you are doing nothing wrong.
I think that when you distribute your application outside of localhost conditions, the additional 100us (if it is indeed roughly constant) will pale into insignificance compared to the average latency of a functioning network.
For the purposes of answering your question, I think there is a thread/interrupt scheduling issue on the server side that needs to be investigated, as you do not seem to be doing anything on the client that is not accounted for.
Try the following test scenario:
The idea is that, hopefully, both clients will send back their probe answers at roughly the same time, and both after a sleep/wait (performing the action that exposes the problem). The objective is to have one client's response 'wake up' the publisher, and then see whether the other client's response is processed immediately.
If one of these returned probes does not show the anomaly (most likely the second response), it could point to the fact that the publisher thread is waking from a sleep cycle (on receiving the first response) and is immediately available to process the second response.
Again, if it turns out that the 100us delay is roughly constant, it amounts to only ±10% of 1ms, which is a timeframe appropriate for real-world network conditions.