How good is clock synchronization on Windows Azure?

Posted on 2024-11-09 17:36:18

I am looking for quantitative estimates of clock offsets between VMs on Windows Azure, assuming that all VMs are hosted in the same datacenter. My rough guess is that the average clock offset between one VM and another is below 10 seconds, but I am not even sure that this is a guaranteed property of the Azure cloud.

Does anybody have quantitative measurements on this?

Comments (6)

长梦不多时 2024-11-16 17:36:18

I finally decided to run some experiments of my own.

A few facts concerning the experiment protocol:

  • Instead of looking for the offset to a reference clock, I simply checked the clock differences between Azure VMs and the Azure Storage.
  • Clock time of the Azure Storage has been retrieved using the HTTP hack pasted below.
  • Measurements have been done within the North Europe datacenter of Azure with 250 small VMs.
  • Latency between storage and VMs measured with Stopwatch was always lower than 1ms for minimalistic unauthenticated requests (basically HTTP requests were coming back with 400 errors, but still with Date: available in the HTTP headers).

Results:

  • About 50% of the VMs have a clock offset to the storage greater than 1s.
  • About 5% of the VMs have a clock offset to the storage greater than 2s.
  • Fewer than 1% of observations showed clock offsets close to 3s.
  • A handful of outliers were close to 4s.
  • The clock offset between a single VM and the storage typically varies by +1/-1s from one request to the next.

So technically, we are not too far from the 2s tolerance target, although even for intra-datacenter sync you don't have to push the experiment far to observe offsets close to 4s. If we assume a normal (aka Gaussian) distribution for the clock offsets, then I would say that relying on any clock threshold lower than 6s is bound to lead to scheduling issues.
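
As a rough illustration of the reasoning above, here is a minimal sketch of how a set of measured offsets could be summarized: the fraction of samples beyond a threshold, and a "mean plus k standard deviations" bound that only makes sense under the Gaussian assumption. The class and method names are hypothetical, and the sketch is not necessarily how the 6s figure above was obtained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OffsetStats
{
    // Fraction of samples whose absolute offset (in seconds) exceeds the threshold.
    public static double FractionAbove(IReadOnlyCollection<double> offsets, double threshold)
    {
        return offsets.Count(o => Math.Abs(o) > threshold) / (double)offsets.Count;
    }

    // |mean| + k * standard deviation of the signed offsets (in seconds).
    // ASSUMPTION: offsets are roughly normally distributed; otherwise this
    // bound says little about the tail.
    public static double GaussianBound(IReadOnlyCollection<double> offsets, double k)
    {
        double mean = offsets.Average();
        double variance = offsets.Average(o => (o - mean) * (o - mean));
        return Math.Abs(mean) + k * Math.Sqrt(variance);
    }
}
```

For example, `OffsetStats.GaussianBound(samples, 3)` gives a three-sigma bound for an array `samples` of offsets in seconds; the choice of k is a judgment call.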

using System;
using System.Globalization;
using System.IO;
using System.Net.Sockets;

/// <summary>
/// Substitute for proper NTP (Network Time Protocol)
/// when UDP is not available, as on Windows Azure.
/// </summary>
public class HttpTimeChecker
{
    public static DateTime GetUtcNetworkTime(string server)
    {
        // HACK: we can't use WebClient here, because we get a faulty HTTP response.
        // We don't care about the HTTP error; the only thing that matters is the
        // presence of the 'Date:' HTTP header.
        string response;
        using (var tc = new TcpClient())
        {
            tc.Connect(server, 80);

            using (var ns = tc.GetStream())
            {
                var sw = new StreamWriter(ns);
                var sr = new StreamReader(ns);

                // HTTP requires CRLF line endings.
                string req = "";
                req += "GET / HTTP/1.0\r\n";
                req += "Host: " + server + "\r\n";
                req += "\r\n";

                sw.Write(req);
                sw.Flush();

                // With HTTP/1.0 the server closes the connection after responding.
                response = sr.ReadToEnd();
            }
        }

        foreach (var line in response.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Header names are case-insensitive.
            if (line.StartsWith("Date: ", StringComparison.OrdinalIgnoreCase))
            {
                // The Date header is expressed in GMT; parse it independently of the
                // current culture and return it as UTC.
                return DateTime.Parse(line.Substring(6), CultureInfo.InvariantCulture,
                    DateTimeStyles.AdjustToUniversal | DateTimeStyles.AssumeUniversal);
            }
        }

        throw new ArgumentException("No date to be retrieved among HTTP headers.", "server");
    }
}
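
For completeness, here is a minimal sketch of how a single offset sample could be computed from such a remote timestamp, assuming the remote time was taken roughly halfway through the round trip. `OffsetEstimator` and `EstimateOffset` are hypothetical names, not part of the experiment code. Note that the HTTP `Date:` header only has whole-second resolution, which may account for part of the +1/-1s variation reported above.

```csharp
using System;
using System.Diagnostics;

public static class OffsetEstimator
{
    // remoteUtcNow: any function returning the remote clock in UTC, for instance
    // () => HttpTimeChecker.GetUtcNetworkTime(host), where 'host' is a storage
    // endpoint of your choice (not specified in the original answer).
    // Returns remote time minus local time; positive means the remote clock is ahead.
    public static TimeSpan EstimateOffset(Func<DateTime> remoteUtcNow)
    {
        DateTime localBefore = DateTime.UtcNow;
        Stopwatch sw = Stopwatch.StartNew();
        DateTime remote = remoteUtcNow();
        sw.Stop();

        // ASSUMPTION: the remote timestamp was taken at the midpoint of the round trip.
        DateTime localMidpoint = localBefore + TimeSpan.FromTicks(sw.Elapsed.Ticks / 2);
        return remote - localMidpoint;
    }
}
```

Called with a fake remote clock such as `() => DateTime.UtcNow.AddSeconds(2)`, the result should be close to two seconds; with a sub-millisecond round trip, the midpoint correction is negligible compared to the one-second resolution of the header.
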
两仪 2024-11-16 17:36:18

I've recently been in conversation with someone from the Azure product team about clock synchronisation, more out of interest than anything else. The most recent reply I've received is:

The VMs and services take their time directly from the underlying
Hyper-V platform upon boot and from that point forward the clock is
maintained by the service. In order to have true time sync across a
distributed system you will need to do this at the application layer
and/or with a service referencing a singular time server.

贱人配狗天长地久 2024-11-16 17:36:18

This is the classic problem of both distributed systems and virtual machines - clock skew.

One possible solution would be to use the Azure scheduler to ping an endpoint on each of your VMs that would reset the clock, or at least tell you what the difference is. That way, your skew would not grow, and you may even be able to calculate an offset that accounts for the communication delay. You would then get to within milliseconds rather than seconds.

Of course, you could also go the other way and have a service on the VM that periodically manages the clock by pinging out to some time server. I'm not sure whether the hypervisor will let you mess with its clock, but all you really need is an offset for your apps to consume.

Overall... never trust the clock on a VM, and certainly not across a distributed system. Note that this clock issue is a subject of active research in many universities, e.g. https://scholar.google.com/scholar?hl=en&q=distributed+system+clock&btnG=&as_sdt=1%2C48&as_sdtp=
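
The "offset for your apps to consume" idea could look roughly like the sketch below: the system clock is left alone, and the application reads time through a small wrapper that adds the last known offset. `AdjustedClock` is a hypothetical name, and how the offset is obtained (a scheduler ping, a periodic poll of a time server) is deliberately left to the caller.

```csharp
using System;
using System.Threading;

public sealed class AdjustedClock
{
    // Reference time minus local time, in ticks; updated atomically.
    private long _offsetTicks;

    // Called by whatever component measures the offset against the reference clock.
    public void Update(TimeSpan offset)
    {
        Interlocked.Exchange(ref _offsetTicks, offset.Ticks);
    }

    public TimeSpan Offset
    {
        get { return TimeSpan.FromTicks(Interlocked.Read(ref _offsetTicks)); }
    }

    // Local UTC time corrected by the last known offset.
    public DateTime UtcNow
    {
        get { return DateTime.UtcNow + Offset; }
    }
}
```

Application code would then call `clock.UtcNow` instead of `DateTime.UtcNow`, while a background task calls `clock.Update(...)` at whatever interval keeps the drift acceptable.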

淑女气质 2024-11-16 17:36:18

Based on my experience, I would not rely on the system clock of the Azure VMs for anything critical. I have occasionally seen differences up to several minutes, which does fly in the face of what you'd expect.

愚人国度 2024-11-16 17:36:18

I've tried to search for an answer to this specific question - but haven't succeeded!

Some references I have found about the "Windows Time Service" (W32Time) state that the design of the Windows service targets a tolerance of 2 seconds - e.g.

In practice, within the Azure network, I expect the synchronisation achieved to be much better than this, but my search turned up no documented guarantee of it.

森末i 2024-11-16 17:36:18

You can never trust clock synchronization when building a distributed system, unless special hardware measures are used, as for example in Google Spanner. Even there, a special algorithm is used to resolve possible clock skew conflicts.
However, there are many algorithms that solve this problem in distributed systems: logical clocks, vector clocks, and Lamport timestamps, to name a few. See the classic book "Distributed Systems: Principles and Paradigms" by Andrew Tanenbaum.
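
To give a flavour of these algorithms, here is a minimal sketch of a Lamport logical clock. It orders events using counters that are exchanged with messages instead of wall-clock time. The class and method names are illustrative only.

```csharp
using System;

public sealed class LamportClock
{
    private readonly object _lock = new object();
    private long _time;

    // Call for each local event, including just before sending a message;
    // attach the returned value to the outgoing message.
    public long Tick()
    {
        lock (_lock)
        {
            return ++_time;
        }
    }

    // Call when a message arrives, passing the timestamp the sender attached.
    public long Receive(long remoteTimestamp)
    {
        lock (_lock)
        {
            _time = Math.Max(_time, remoteTimestamp) + 1;
            return _time;
        }
    }
}
```

If node A calls `Tick()` and sends the result to node B, B's `Receive(...)` always returns a larger value, so causally related events are ordered consistently however far apart the two machines' system clocks are.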
