Windows services: high-availability scenarios and design approaches

Posted 2024-08-27 19:28:48

Let's say I have a standalone Windows service running on a Windows Server machine. How can I make sure it is highly available?

1) What design-level guidelines can you propose?

2) How can it be made highly available in a primary/secondary arrangement, e.g. like the clustering solutions currently available on the market?

3) How should cross-cutting concerns be handled in fail-over scenarios?

If you can think of anything else, please add it here.

Note: This question is only about Windows and Windows services; please try to stick to that :)

Comments (3)

沫雨熙 2024-09-03 19:28:48

To keep the service at least running, you can configure the Windows Service Control Manager to restart the service automatically if it crashes (see the Recovery tab in the service properties). More details, including a batch script to set these properties, are available here: Restart a windows service if it crashes (https://serverfault.com/questions/48600/how-can-i-automatically-restart-a-windows-service-if-it-crashes)
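As a rough illustration of scripting those Recovery settings, the sketch below only builds the argument list for the `sc.exe failure` command, and a second helper runs it. The helper names, the delay, the attempt count and the reset period are illustrative assumptions, not values taken from the linked answer; running the command needs Windows and an elevated prompt.

```python
import subprocess


def build_sc_failure_command(service_name, restart_delay_ms=60000,
                             attempts=3, reset_seconds=86400):
    """Build the argument list for `sc.exe failure` (does not run it)."""
    # One "restart/<delay in ms>" pair per failure, joined by "/".
    actions = "/".join(f"restart/{restart_delay_ms}" for _ in range(attempts))
    # sc.exe expects "reset=" and "actions=" as tokens separate from their values.
    return ["sc.exe", "failure", service_name,
            "reset=", str(reset_seconds),
            "actions=", actions]


def apply_recovery_settings(service_name):
    """Run the command; assumes Windows and administrator rights."""
    subprocess.run(build_sc_failure_command(service_name), check=True)
```

For a hypothetical service called `MyService`, `build_sc_failure_command("MyService")` yields the equivalent of `sc.exe failure MyService reset= 86400 actions= restart/60000/restart/60000/restart/60000`.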

High availability is more than just keeping the service up from the outside. The service itself needs to be built with high availability in mind (i.e. use good programming practices throughout, choose appropriate data structures, and pair every resource acquisition with a release), and the whole thing should be stress-tested to ensure that it stays up under expected loads.

For idempotent commands, intermittent failures (such as locked resources) can be tolerated by re-invoking the command a limited number of times. This allows the service to shield the client from the failure (up to a point). The client should also be coded to anticipate failure. It can handle service failure in several ways: logging, prompting the user, retrying X times, or logging a fatal error and exiting are all possible handlers, and which one is right for you depends on your requirements. If the service has "conversation state", then when the service fails hard (i.e. the process is restarted) the client should be aware of and handle this situation, as it usually means the current conversation state has been lost.
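The bounded retry described above might look something like this minimal sketch. The function name, the default attempt count, the linear back-off and the choice of `OSError` as the retryable error are all assumptions made for illustration; it is only safe for commands that really are idempotent.

```python
import time


def retry_idempotent(command, attempts=3, delay_seconds=0.5,
                     retry_on=(OSError,), sleep=time.sleep):
    """Call `command` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return command()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: let the caller see the failure
            # Wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
```

Once the retries are exhausted the last exception is re-raised, so the caller can still log it, prompt the user or exit, as listed above.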

A single machine is vulnerable to hardware failure, so if you are going to use a single machine, ensure it has redundant components. HDDs are particularly prone to failure, so have at least mirrored drives or a RAID array. PSUs are the next weak point, so a redundant PSU is also worthwhile, as is a UPS.

As to clustering, Windows supports service clustering, and manages clustered services using a Network Name rather than individual computer names. This allows your client to connect to whichever machine is running the service instead of to a hard-coded name. But unless you take additional measures, this is only resource failover: requests are redirected from one instance of the service to another, and conversation state is usually lost. If your services write to a database, that should also be clustered, both for reliability and to ensure changes are available to the entire cluster and not just the local node.

This is really just the tip of the iceberg, but I hope it gives you ideas to get started on further research.

Microsoft Cluster Service (MSCS)

撑一把青伞 2024-09-03 19:28:48

If you break down the problems you are trying to solve, I think you'll probably come up with a few answers yourself. As Justin mentioned in the comment, there is no one answer. It depends entirely on what your service does and how clients use it. You also don't specify any details about the client-server interaction. HTTP? TCP? UDP? Other?

Here are some things to think about to get you started.

1) What do you do if the service or server goes down?

  • How about running more than one instance of your service on separate servers?

2) Ok, but now how do the clients know about the multiple services?

  • You can hard-code the list into each client (not recommended).
  • You can use DNS round-robin to bounce requests across all of them.
  • You can use a load-balancing device.
  • You can have a separate service that knows about all of the other services and can direct clients to available services.

3) So what if one service goes down?

  • Do the client applications know what to do if the service they are connected to goes down? If not, then they need to be updated to handle that situation.
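One possible shape for a client that copes with an instance going down is sketched below: given a list of service endpoints (however it was obtained), it tries each in turn and uses the first that accepts a connection. The function name and the `connect` parameter are assumptions for illustration, and a plain TCP connection is assumed because the question does not say which protocol is used.

```python
import socket


def connect_to_first_available(endpoints, timeout_seconds=2.0,
                               connect=socket.create_connection):
    """Return ((host, port), connection) for the first endpoint that accepts."""
    errors = []
    for host, port in endpoints:
        try:
            conn = connect((host, port), timeout_seconds)
            return (host, port), conn
        except OSError as exc:
            # This instance is down or unreachable: note why, try the next one.
            errors.append(f"{host}:{port} -> {exc}")
    raise ConnectionError("no service instance reachable: " + "; ".join(errors))
```

Calling the same function again after a connection drops gives a simple form of client-side failover. Any conversation state held by the failed instance would still be lost.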

That should give you the basic idea of how to get started with high availability. If you provide specific details about your architecture, you will probably get a much better response.

月野兔 2024-09-03 19:28:48

If the service doesn’t expose any interface for client connectivity you could:

  • Broadcast or expose an “I’m alive” message or signal a database/registry/tcp/whatever that you are alive

  • Have a second service (a monitor) that checks for these “I’m alive” signals and tries to restart the service if it is down

But if you have a client connecting to this service through named pipes, TCP, etc., the client would have to look up (in a database, for example) the address of the machine currently running the service, or you would need something fancier, like an intelligent switch, to redirect traffic.
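A minimal sketch of the "I'm alive" signal and the monitor check, assuming the signal is a timestamp written to a file that both processes can reach. The file-based approach, the function names and the 30-second threshold are illustrative assumptions; a database row or registry value would follow the same pattern.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Called periodically by the service: record the current time."""
    now = time.time() if now is None else now
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="ascii") as handle:
        handle.write(repr(now))
    # Swap the file in one step so the monitor does not read a partial write.
    os.replace(tmp_path, path)


def is_alive(path, max_age_seconds=30.0, now=None):
    """Called by the monitor: True if the last heartbeat is recent enough."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="ascii") as handle:
            last_beat = float(handle.read())
    except (OSError, ValueError):
        return False  # a missing or unreadable heartbeat counts as "down"
    return now - last_beat <= max_age_seconds


def check_and_restart(path, restart, max_age_seconds=30.0, now=None):
    """Invoke `restart()` when the heartbeat is stale; return True if it ran."""
    if is_alive(path, max_age_seconds, now):
        return False
    restart()
    return True
```

Here `restart` is whatever callable the monitor uses to bring the service back, for example one that starts the Windows service by name. The monitor would call `check_and_restart` on a timer.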
