ECS / EC2 auto scaling doesn't handle two tasks run one after another
I'm currently at my wits' end trying to figure this out.
We have a Step Functions pipeline that runs tasks on a mixture of Fargate and EC2 ECS instances. They are all in the same cluster.
If we run a task that requires EC2, and we want to run another task afterwards that also uses EC2, we have to put a 20-minute Wait state between them in order for the second task to run successfully.
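For reference, the workaround is just a plain Wait state between the two Map states (Map1's "Next" points at it instead of at Map2); a minimal sketch, with an illustrative state name:

"WaitForEC2Capacity": {
  "Type": "Wait",
  "Seconds": 1200,
  "Next": "Map2"
}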
It doesn't seem to use the existing EC2 instances, or scale out any further, when we run the second task; instead the task fails with the error RESOURCE:MEMORY. I would expect it to scale out more EC2 instances to match the demand, or to use the existing EC2 instances to run the tasks.
The ECS cluster has a capacity provider with managed scaling on, managed termination protection on, and a target capacity of 100%.
The ASG has a min capacity of 0 and a max capacity of 8, with managed scaling enabled.
The instance type is r5.4xlarge.
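For context, that setup corresponds to a capacity provider configured roughly as follows (the JSON shape of the ECS CreateCapacityProvider API; the provider name and the ASG ARN are placeholders):

{
  "name": "ec2-capacity-provider",
  "autoScalingGroupProvider": {
    "autoScalingGroupArn": "arn:aws:autoscaling:REGION:ACCOUNT_ID:autoScalingGroup:...",
    "managedScaling": {
      "status": "ENABLED",
      "targetCapacity": 100
    },
    "managedTerminationProtection": "ENABLED"
  }
}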
Example step function that recreates the problem:
{
"StartAt": "Set up variables",
"States": {
"Set up variables": {
"Type": "Pass",
"Next": "Map1",
"Result": [
1,
2,
3
],
"ResultPath": "$.input"
},
"Map1": {
"Type": "Map",
"Next": "Map2",
"ItemsPath": "$.input",
"ResultPath": null,
"Iterator": {
"StartAt": "Inner1",
"States": {
"Inner1": {
"ResultPath": null,
"Type": "Task",
"TimeoutSeconds": 2000,
"End": true,
"Resource": "arn:aws:states:::ecs:runTask.sync",
"Parameters": {
"Cluster": "arn:aws:ecs:CLUSTER_ID",
"TaskDefinition": "processing-task",
"NetworkConfiguration": {
"AwsvpcConfiguration": {
"Subnets": [
"subnet-111"
]
}
},
"Overrides": {
"Memory": "110000",
"Cpu": "4096",
"ContainerOverrides": [
{
"Command": [
"sh",
"-c",
"sleep 600"
],
"Name": "processing-task"
}
]
}
}
}
}
}
},
"Map2": {
"Type": "Map",
"End": true,
"ItemsPath": "$.input",
"Iterator": {
"StartAt": "Inner2",
"States": {
"Inner2": {
"ResultPath": null,
"Type": "Task",
"TimeoutSeconds": 2000,
"End": true,
"Resource": "arn:aws:states:::ecs:runTask.sync",
"Parameters": {
"Cluster": "arn:aws:ecs:CLUSTER_ID",
"TaskDefinition": "processing-task",
"NetworkConfiguration": {
"AwsvpcConfiguration": {
"Subnets": [
"subnet-111"
]
}
},
"Overrides": {
"Memory": "110000",
"Cpu": "4096",
"ContainerOverrides": [
{
"Command": [
"sh",
"-c",
"sleep 600"
],
"Name": "processing-task"
}
]
}
}
}
}
}
}
}
}
What I've tried so far:
I've tried changing the cooldown period for the EC2 instances, with a small amount of success. The only problem is that now it scales up too fast, and we still have to wait before running more tasks; we just have to wait for a shorter time.
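If the cooldown in question is the managed-scaling instance warm-up, note that it is a setting on the capacity provider itself rather than on the ASG; a sketch of lowering it from its 300-second default, using the UpdateCapacityProvider API shape (the provider name is a placeholder):

{
  "name": "ec2-capacity-provider",
  "autoScalingGroupProvider": {
    "managedScaling": {
      "status": "ENABLED",
      "targetCapacity": 100,
      "instanceWarmupPeriod": 60
    }
  }
}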
Please let me know if what we want is possible, and how to do it if it is.
Thank you.
1 Answer
I very recently ran into a similar scenario with a Capacity Provider. Bursts of concurrent task placements via ECS run-task (invoked with a Lambda) were not returning task information in the response. Despite this, a task was queued in the PROCESSING state on the cluster, where it would sit for some time and then eventually fail to start with the error RESOURCE:MEMORY.
Speculation: it seems the problem is related to the refresh interval of the capacity provider's CapacityProviderReservation metric: https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/.
CapacityProviderReservation needs to change in order for your cluster to scale out (or in) based on its Alarm, but bursts of task placements which exceed your total current capacity don't always seem to satisfy this requirement.
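Paraphrasing the linked deep dive, the metric is computed roughly as

CapacityProviderReservation = 100 * M / N

where N is the number of instances already running and M is the number of instances needed to run all running and provisioning tasks. With one fully occupied instance (N = 1) and one more task queued (M = 2), the metric should read 200 and trigger a scale-out; the failures described here suggest that a burst of placements isn't always reflected in M before the queued tasks give up.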
We were able to overcome this behavior of failing to place tasks by exponentially backing off and retrying the call to ECS run-task if the response contains an empty tasks[] collection. This has had only a minor impact on our task placement throughput, and we haven't seen the problem recur since.
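Since the question launches its tasks from Step Functions rather than from a Lambda, a rough equivalent of this backoff-and-retry (a sketch, not part of the original answer) is a Retry field with exponential backoff on each runTask.sync task state, so a failed placement is retried instead of being papered over with a fixed Wait:

"Retry": [
  {
    "ErrorEquals": ["ECS.AmazonECSException", "States.TaskFailed"],
    "IntervalSeconds": 30,
    "MaxAttempts": 6,
    "BackoffRate": 2.0
  }
]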