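How good is clock synchronization on Windows Azure?
I am looking for a quantitative estimate of the clock offset between VMs on Windows Azure, assuming that all the VMs are hosted in the same data center. I would guess that the average clock offset between one VM and another is below 10 seconds, but I am not even sure this is a guaranteed property of the Azure cloud.
Does anybody have quantitative measurements on this matter?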
I finally decided to run some experiments of my own.
A few facts concerning the experiment protocol:
The latency between the storage and the VMs, measured with Stopwatch, was always lower than 1 ms for minimalistic unauthenticated requests (basically, the HTTP requests were coming back with 400 errors, but the Date: header was still available in the HTTP response).
Results:
So technically, we are not too far from the 2 s tolerance target, although for intra-data-center sync you don't have to push the experiment far to observe offsets close to 4 s. If we assume a normal (aka Gaussian) distribution for the clock offsets, then I would say that relying on any clock threshold lower than 6 s is bound to lead to scheduling issues.
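The answer does not include its measurement code. What follows is a minimal Python sketch of the protocol as described, and it rests on a few assumptions. The reference clock is whatever HTTP endpoint you pass to measure_offset; the answer implies a storage URL but does not name one. The server is assumed to stamp the Date header halfway through the round trip. gaussian_threshold with k = 3 is only one possible way to turn the Gaussian assumption into a threshold, because the answer does not say how its 6 s figure was derived. All function names here are hypothetical.

```python
import statistics
import time
import urllib.error
import urllib.request
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def offset_from_date_header(date_header, local_send_utc, rtt_seconds):
    """Estimate (remote clock - local clock) in seconds from one HTTP Date header.

    Assumes the server stamped the header halfway through the round trip.
    HTTP dates have 1-second resolution, so one sample is only good to
    roughly +/- 0.5 s plus half the round-trip time.
    """
    remote = parsedate_to_datetime(date_header)
    midpoint = local_send_utc + timedelta(seconds=rtt_seconds / 2)
    return (remote - midpoint).total_seconds()


def measure_offset(url):
    """Hypothetical helper: one request to url, compared against the local clock."""
    local_send = datetime.now(timezone.utc)
    start = time.perf_counter()  # monotonic timer, playing the role of Stopwatch
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            date_header = resp.headers["Date"]
    except urllib.error.HTTPError as err:
        # An unauthenticated request may fail with 400, yet the Date header is still sent.
        date_header = err.headers["Date"]
    rtt = time.perf_counter() - start
    return offset_from_date_header(date_header, local_send, rtt)


def gaussian_threshold(offsets, k=3.0):
    """Bound |mean| + k * stdev: if offsets are normal and k = 3,
    at least ~99.7% of |offset| values fall below it."""
    mu = statistics.fmean(offsets)
    sigma = statistics.stdev(offsets)
    return abs(mu) + k * sigma
```

The Date header only carries whole seconds, so a single sample cannot resolve sub-second offsets. You would need many samples per VM before comparing VMs with each other.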
I've been in conversation with someone from the Azure product team regarding clock synchronisation recently, more out of interest than anything else. The most recent reply I've received is:
This is the classic problem of both distributed systems and virtual machines - clock skew.
One possible solution would be to use the Azure scheduler to ping an endpoint on each of your VMs that would reset your clock, or at least tell you what the difference is. That way, your skew would not grow, and you may even be able to calculate an offset for the communication delay. This way, you'd get to within milliseconds and not seconds.
Of course, you could also go the other way and have a service on the VM that periodically manages the clock by pinging out to some time server. I'm not sure whether the hypervisor will let you mess with its clock, but all you really need is an offset for your apps to consume.
Overall... never trust the clock on a VM, and certainly not across a distributed system. Note that this clock issue is part of active research at many universities, e.g. https://scholar.google.com/scholar?hl=en&q=distributed+system+clock&btnG=&as_sdt=1%2C48&as_sdtp=
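The reply does not specify how the Azure scheduler would exchange timestamps with each VM. Suppose the VM endpoint reports when it received the ping and when it answered. In that case the standard four-timestamp calculation used by NTP gives both the clock offset and the round-trip delay, which is one way to "calculate an offset for the communication delay". OffsetClock is a hypothetical helper that illustrates the last point: instead of changing the VM clock, the application applies the measured offset itself.

```python
import time


def ntp_offset_and_delay(t0, t1, t2, t3):
    """NTP-style estimate from one request/response exchange.

    t0: client send time (client clock)
    t1: server receive time (server clock)
    t2: server reply time (server clock)
    t3: client receive time (client clock)

    Returns (offset, delay), where offset = server clock - client clock,
    assuming the network delay is the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


class OffsetClock:
    """Hypothetical app-level clock: leaves the OS clock alone and applies an offset."""

    def __init__(self, offset=0.0):
        self.offset = offset

    def update(self, t0, t1, t2, t3):
        """Refresh the offset from a new exchange and return its round-trip delay."""
        self.offset, delay = ntp_offset_and_delay(t0, t1, t2, t3)
        return delay

    def now(self):
        """Local wall-clock time corrected towards the reference clock."""
        return time.time() + self.offset
```

Here is a worked case. The client sends at t0 = 100.0, the server receives at t1 = 105.1 and replies at t2 = 105.15, and the reply arrives at t3 = 100.25 on the client clock. The formula gives an offset of 5.0 s, meaning the client is 5 s behind, and a network delay of 0.2 s. If the two directions are not equally fast, the offset is wrong by half the asymmetry.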
Based on my experience, I would not rely on the system clock of the Azure VMs for anything critical. I have occasionally seen differences of up to several minutes, which does fly in the face of what you'd expect.
I've tried to search for an answer to this specific question - but haven't succeeded!
Some references I have found about the Windows Time Service (W32Time) state that the service is designed to target a tolerance of 2 seconds, e.g.
In practice within the Azure network I expect that the synchronisation achieved should be much better than this - but my search turned up no referenced guarantees on this.
If you are building a distributed system, you can never trust clock synchronization unless special hardware measures are used, as in Google Spanner. Even there, a special algorithm is used to resolve possible clock-skew conflicts.
However, there are many algorithms that solve this problem in distributed systems: logical clocks, vector clocks and Lamport timestamps, to name a few. See the classic book "Distributed Systems: Principles and Paradigms" by Andrew Tanenbaum.
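The following is a minimal Python sketch of two of the techniques mentioned, Lamport timestamps and vector clocks. It is not taken from the book. The node names and the way messages carry timestamps are hypothetical. Neither clock tracks wall-clock time; both only order events consistently with causality, which is why they sidestep clock skew entirely.

```python
class LamportClock:
    """Single counter per process; timestamps respect causality."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """On receipt, jump past the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


class VectorClock:
    """One counter per node; can also detect concurrent events."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = {}

    def tick(self):
        """Local event or send: bump our own entry and return a snapshot."""
        self.clock[self.node_id] = self.clock.get(self.node_id, 0) + 1
        return dict(self.clock)

    def receive(self, other):
        """Merge the sender's vector (element-wise max), then count the receive event."""
        for node, count in other.items():
            self.clock[node] = max(self.clock.get(node, 0), count)
        return self.tick()


def happened_before(a, b):
    """True if vector timestamp a causally precedes b."""
    nodes = set(a) | set(b)
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))


def concurrent(a, b):
    """Neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)
```

Lamport timestamps are cheap, needing only one integer per process. Their limitation is that they cannot tell concurrent events apart from ordered ones. Vector clocks can make that distinction, as concurrent() shows, at the cost of one counter per node.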