How to terminate a specific Azure worker role instance

Posted 2024-10-23 19:51:11

Background

I am trying to work out the best structure for an Azure application. Each of my worker roles will spin up multiple long-running jobs. Over time I can transfer jobs from one instance to another by switching them to a read-only mode on the source instance, spinning them up on the target instance, and then spinning the originals down on the source instance.

If I have too many jobs, I can tell Azure to spin up extra role instances and use them for new jobs. Conversely, if my load drops (e.g. during the night), I can consolidate outstanding jobs onto a few machines and tell Azure to give me fewer instances.

The trouble is that (as I understand it) Azure provides no mechanism to allow me to decide which instance to stop. Thus I cannot know which servers to consolidate onto, and some of my jobs will die when their instance stops, causing delays for users while I restart those jobs on surviving instances.

Idea 1: I decide which instance to stop, and return from its Run(). I then tell Azure to reduce my instance count by one, and hope it concludes that the broken instance is a good candidate. Has anyone tried anything like this?

Idea 2: I predefine a whole bunch of different worker roles, with identical contents. I can individually stop and start them by switching their instance count from zero to one, and back again. I think this idea would work, but I don't like it because it seems to go against the natural Azure way of doing things, and because it involves me in a lot of extra bookkeeping to manage the extra worker roles.

Idea 3: Live with it.

Any better ideas?

Comments (2)

冰葑 2024-10-30 19:51:11

In response to your ideas

Idea 1: I haven't tried doing exactly what you're describing, but in my experience your first instance has a name that ends with _0, the next with _1, and I'm sure you can guess the rest. When you decrease the instance count, Azure drops the instance with the highest number suffix. I would be surprised if it took into account the state of any particular instance.

Idea 2: As I think you hint at, this will create management problems. You can only have 5 different worker roles per hosted service, so you'll need a service for each group of 5 roles that you want to be able to scale to. Also, when you deploy updates you'll have to upload X times more services, where X is the maximum number of instances you currently support.

Idea 3: Technically the easiest. Pending some clarification, this is probably what I'd be doing for now. To reduce the downsides of this option it may pay to investigate ways of loading the data faster. There is usually a Goldilocks level (not too much, not too little) of parallelism that helps with this.
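
If the observation under Idea 1 holds (the highest-numbered instances are dropped first), the consolidation could be planned before the instance count is reduced. The following is a minimal sketch under that assumption, which is based on experience rather than documented behaviour. The function names and the `Worker_IN_n` instance names are hypothetical, chosen only for illustration.

```python
import re


def instance_index(name: str) -> int:
    """Extract the numeric suffix from an instance name such as 'Worker_IN_3'."""
    match = re.search(r"_(\d+)$", name)
    if match is None:
        raise ValueError(f"no numeric suffix in {name!r}")
    return int(match.group(1))


def plan_scale_down(jobs_by_instance: dict[str, list[str]], target_count: int):
    """Return (survivors, moves) for shrinking to target_count instances.

    ASSUMPTION (observed, not guaranteed): the instances with the highest
    numeric suffixes are the ones removed when the count is reduced.
    """
    if not 1 <= target_count <= len(jobs_by_instance):
        raise ValueError("target_count must be between 1 and the current count")

    # Sort numerically so that _10 comes after _2.
    ordered = sorted(jobs_by_instance, key=instance_index)
    survivors = ordered[:target_count]
    doomed = ordered[target_count:]

    # Move each job from a doomed instance to the least-loaded survivor.
    load = {name: len(jobs_by_instance[name]) for name in survivors}
    moves = []
    for source in doomed:
        for job in jobs_by_instance[source]:
            target = min(survivors, key=lambda n: (load[n], instance_index(n)))
            load[target] += 1
            moves.append((job, source, target))
    return survivors, moves


if __name__ == "__main__":
    jobs = {
        "Worker_IN_0": ["a"],
        "Worker_IN_1": ["b", "c"],
        "Worker_IN_2": ["d"],
        "Worker_IN_10": ["e"],
    }
    survivors, moves = plan_scale_down(jobs, target_count=2)
    print(survivors)
    for job, source, target in moves:
        print(f"move {job}: {source} -> {target}")
```

The moves would be carried out with the read-only hand-over described in the question, and the instance count reduced only once they have all completed. If Azure turns out to remove a different instance, the plan degrades to Idea 3.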

缪败 2024-10-30 19:51:11

You're right - you cannot choose which instance to stop. In general, you'd run the same jobs on each worker role instance, where each instance watches the same queue (or maybe multiple threads or jobs watching multiple queues).
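
To illustrate why the shared-queue pattern tolerates an arbitrary instance being stopped, here is a small in-memory sketch of a queue with a visibility timeout. `InMemoryQueue` is a hypothetical stand-in written for this example, not the Azure Storage API. It only mimics the idea that a received message is hidden for a while and reappears unless the worker deletes it after finishing.

```python
import time
from dataclasses import dataclass


@dataclass(eq=False)
class Message:
    body: str
    visible_at: float = 0.0  # time at which the message may be received again


class InMemoryQueue:
    """Hypothetical stand-in for a shared queue with a visibility timeout."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._messages: list[Message] = []

    def put(self, body: str) -> None:
        self._messages.append(Message(body))

    def receive(self, visibility_timeout: float):
        """Return the first visible message and hide it, or None if none is visible."""
        now = self._clock()
        for msg in self._messages:
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                return msg
        return None

    def delete(self, msg: Message) -> None:
        """Called by a worker only after it has finished the job."""
        self._messages.remove(msg)
```

If the instance that received a message is stopped before it calls `delete`, the message becomes visible again once the timeout passes and another instance picks it up. For jobs that run longer than the timeout, a worker would need to keep extending the timeout while it is still working; that part is left out of the sketch.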

If you really need to run a job on one instance (such as a scheduler), consider using blob leases as the way to constrain this. Create a blob as a mutex. Then, as each instance spins up, the scheduler job attempts to obtain a write lease on that blob. If it succeeds, it runs. If it fails, it simply sleeps (maybe for a minute) and tries again. At some point in the future, as you scale down in instance count, let's say the instance running the scheduler is killed. A minute later (or whatever time span you choose), another instance tries to acquire the lease, succeeds, and now runs the scheduler code.
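
The lease-as-mutex loop described above can be sketched as follows. `InMemoryLease` is a hypothetical in-memory stand-in for a lease on a single blob, written only to show the control flow. In a real deployment the acquire and renew steps would be calls to blob storage, and the lease duration would be subject to the service's own limits.

```python
import time
import uuid


class InMemoryLease:
    """Hypothetical stand-in for an expiring lease on one blob."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, duration: float):
        """Return a new lease id, or None if someone else holds a live lease."""
        now = self._clock()
        if self._holder is not None and now < self._expires_at:
            return None
        self._holder = uuid.uuid4().hex
        self._expires_at = now + duration
        return self._holder

    def renew(self, lease_id: str, duration: float) -> bool:
        """Extend the lease if lease_id is the current, unexpired holder."""
        now = self._clock()
        if lease_id != self._holder or now >= self._expires_at:
            return False
        self._expires_at = now + duration
        return True


def scheduler_tick(lease, lease_id, duration, run_scheduler):
    """One iteration of the loop that every instance runs.

    Returns the lease id this instance holds after the tick, or None.
    """
    if lease_id is not None and lease.renew(lease_id, duration):
        run_scheduler()
        return lease_id
    lease_id = lease.try_acquire(duration)
    if lease_id is not None:
        run_scheduler()
    return lease_id
```

Each instance would call `scheduler_tick` in a loop, sleeping between calls for an interval shorter than the lease duration so that the current holder renews before the lease lapses. When the holder is stopped, it stops renewing, the lease expires, and the next instance to call `try_acquire` takes over the scheduler.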
