CI/CD, Terraform, and AWS ECS: applying database migrations with Lambda?

Published 2025-01-09 01:28:30

I have an app consisting of multiple services, each with its own postgres database. I want to deploy it to AWS. Kube is too complicated for me, so I decided to use AWS ECS for services + AWS RDS for DBs. And deploy everything using Terraform.

I have a CI/CD pipeline set up, which upon a merge to the staging branch, builds, tests, and deploys the app to the corresponding environment. Deploying basically consists of building and pushing docker images to AWS ECR and then calling terraform plan/apply.

Terraform creates/updates VPC, subnets, ECS services with tasks, RDS instances, etc.

This works.

But I'm not sure how to apply db migrations.

I have a separate console app whose only purpose is to apply migrations and then quit. So I can just run it in the CI/CD pipeline before or after applying terraform. However, before doesn't work because if it's the very first deployment then the databases wouldn't exist yet, and after doesn't work because I want to first apply migrations and then start services, not the other way around.

So I need some way to run this migrator console app in the middle of terraform deployment – after rds but before ecs.

I read an article by Andrew Lock where he solves this exact problem by using jobs and init containers in Kubernetes. But I'm not using Kube, so that's not an option for me.

I see in AWS ECS docs that you can run standalone tasks (one-off tasks), which is basically what I need, and you can run them with AWS CLI, but whilst I can use the cli from the pipeline, I can't use it in the middle of terraform doing its thing. I can't just say to terraform "run some random command after creating this resource, but before that one".
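For reference, launching such a one-off task from a pipeline looks roughly like this; the cluster name, task definition, subnet, and security group IDs below are all placeholders:

```shell
# Launch the migrator as a standalone Fargate task (all names/IDs are placeholders).
aws ecs run-task \
  --cluster my-cluster \
  --task-definition migrator:1 \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0abc],securityGroups=[sg-0abc],assignPublicIp=DISABLED}'

# Block until the task finishes before deploying the services.
aws ecs wait tasks-stopped --cluster my-cluster --tasks <task-arn>
```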

Then I thought about using AWS Lambda. There is a data source type in Terraform called aws_lambda_invocation, which does exactly what it says in the name. So now I'm thinking about building a docker image of the migrator in the build stage of the pipeline, pushing it to AWS ECR, then in terraform creating an aws_lambda_function resource from the image and an aws_lambda_invocation data source invoking the function. Make ECS depend on the invocation, and it should work, right?

There is one problem with this: data sources are queried both when planning and applying, but I only want the migrator lambda to run when applying. I think it could be solved by using count attribute and some custom variable in the invocation data source.

I think this approach might work, but surely there must be a better, less convoluted way of doing it? Any recommendations?

Note: I can't apply migrations from the services themselves, because I have more than one instance of each, so there is a possibility of two services trying to apply migrations to the same db at the same time, which would end badly.

If you are wondering, I use .NET 5 and GitLab, but I think it's not relevant for the question.


Comments (1)

池予 2025-01-16 01:28:30

Well, in case you are wondering, the lambda solution that I described in the question post is valid. It's not super convenient, but it works. In terraform you first need to create a function connected to the vpc your database lives in, add all the necessary entries to the db security group (for ingress) and the lambda security group (for egress), and then invoke it something like this (here I pass the connection string as the input):

data "aws_lambda_invocation" "migrator" {
  count         = var.apply_migrations == "yes" ? 1 : 0
  function_name = aws_lambda_function.migrator.function_name
  input         = <<JSON
"Host=${aws_db_instance.service_a.address};Port=${aws_db_instance.service_a.port};Database=${aws_db_instance.service_a.db_name};Username=${aws_db_instance.service_a.username};Password=${aws_db_instance.service_a.password};"
JSON
}
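For context, the function referenced above can be created from the migrator's container image. A rough sketch, where the role, subnet, and security group names are assumptions:

```hcl
resource "aws_lambda_function" "migrator" {
  function_name = "migrator"
  package_type  = "Image"                  # container-image-based Lambda
  image_uri     = var.migrator_image_uri
  role          = aws_iam_role.migrator.arn
  timeout       = 300                      # migrations can be slow

  vpc_config {
    subnet_ids         = aws_subnet.private[*].id
    security_group_ids = [aws_security_group.migrator_lambda.id]
  }
}
```

The execution role also needs the AWSLambdaVPCAccessExecutionRole managed policy so the function can create its network interface in the VPC.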

Make apply_migrations = "no" by default. Then you only need to specify it when applying: terraform apply -var apply_migrations=yes.
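The variable declaration itself is just:

```hcl
variable "apply_migrations" {
  type    = string
  default = "no"   # opt in explicitly with -var apply_migrations=yes
}
```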

Then just make aws_ecs_service (or whatever you use to deploy your application) depend on the invocation.
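A sketch, with the service body elided:

```hcl
resource "aws_ecs_service" "service_a" {
  # ...

  # Ensure migrations have run before (re)deploying the service.
  depends_on = [data.aws_lambda_invocation.migrator]
}
```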

The biggest problem with this solution is that running terraform destroy takes a very long time. This is because to connect the lambda to the vpc, AWS creates a network interface for it automatically (so it is not managed by terraform). When destroy destroys the lambda, the interface stays in the "In Use" state for some time after destruction (it varies – takes 10 min or more – and you can't even delete it manually). That leads to terraform being unable to delete the subnet used by the interface, which leads to terraform hanging for a long time.

But it doesn't really matter, because I found a much better solution, which takes more setup, but works flawlessly.

It turns out that terraform can run arbitrary commands. There is a docker provider available for it, and you can basically spin up any container you want to do whatever you want.

terraform {
  # ...

  required_providers {
    # ...

    docker = {
      source  = "kreuzwerker/docker"
      version = "2.16.0"
    }
  }
}

# this setup works for gitlab ci/cd with docker-in-docker
provider "docker" {
  host = "tcp://docker:2376"

  ca_material   = file("/certs/client/ca.pem")
  cert_material = file("/certs/client/cert.pem")
  key_material  = file("/certs/client/key.pem")

  registry_auth {
    address  = var.image_registry_uri
    # username and password are passed via DOCKER_REGISTRY_USER and DOCKER_REGISTRY_PASS env vars
  }
}

data "docker_registry_image" "migrator" {
  name = var.migrator_image_uri
}

resource "docker_image" "migrator" {
  name          = data.docker_registry_image.migrator.name
  pull_triggers = [data.docker_registry_image.migrator.sha256_digest]
}

resource "docker_container" "migrator" {
  name     = "migrator"
  image    = docker_image.migrator.repo_digest
  attach   = true # terraform will wait for container to finish before proceeding
  must_run = false # it's a one-time job container, not a daemon
  env = [
    "BASTION_PRIVATE_KEY=${var.bastion_private_key}",
    "BASTION_HOST=${aws_instance.bastion.public_ip}",
    "BASTION_USER=ec2-user",
    "DATABASE_HOST=${aws_db_instance.service_a.address}",
    "DATABASE_PORT=${aws_db_instance.service_a.port}",
    "DATABASE_NAME=${aws_db_instance.service_a.db_name}",
    "DATABASE_USER=${aws_db_instance.service_a.username}",
    "DATABASE_PASSWORD=${aws_db_instance.service_a.password}"
  ]
}

As you can see, you need a bastion instance setup, but you would probably need it anyway. Then in the migrator program you need to use an ssh tunnel to connect to the db. That shouldn't be a problem, since ssh packages are available for every language. Here's a .NET Core example:

// Requires the SSH.NET NuGet package (Renci.SshNet).
using System;
using System.IO;
using Microsoft.Extensions.DependencyInjection;
using Renci.SshNet;

// Load the bastion private key from the environment into an in-memory stream.
using var stream = new MemoryStream();
using var writer = new StreamWriter(stream);
writer.Write(Environment.GetEnvironmentVariable("BASTION_PRIVATE_KEY"));
writer.Flush();
stream.Position = 0;

using var keyFile = new PrivateKeyFile(stream);

using var client = new SshClient(
    Environment.GetEnvironmentVariable("BASTION_HOST"),
    Environment.GetEnvironmentVariable("BASTION_USER"),
    keyFile
);

client.Connect();

var localhost = "127.0.0.1";
uint localPort = 5432;

var dbHost = Environment.GetEnvironmentVariable("DATABASE_HOST");
var dbPort = uint.Parse(Environment.GetEnvironmentVariable("DATABASE_PORT"));
var dbName = Environment.GetEnvironmentVariable("DATABASE_NAME");
var dbUser = Environment.GetEnvironmentVariable("DATABASE_USER");
var dbPassword = Environment.GetEnvironmentVariable("DATABASE_PASSWORD");

using var tunnel = new ForwardedPortLocal(localhost, localPort, dbHost, dbPort);
client.AddForwardedPort(tunnel);

tunnel.Start();

var dbConnectionString = $"Host={localhost};Port={localPort};Database={dbName};Username={dbUser};Password={dbPassword};";

var host = ServiceA.Api.Program
    .CreateHostBuilder(args: new[] { "ConnectionStrings:ServiceA=" + dbConnectionString })
    .Build();

using (var scope = host.Services.CreateScope()) {
    var dbContext = scope
        .ServiceProvider
        .GetRequiredService<ServiceADbContext>();

    dbContext.Database.Migrate();
}

tunnel.Stop();
client.Disconnect();

In gitlab ci/cd, terraform jobs use:

image:
  name: hashicorp/terraform:1.1.6
  entrypoint:
    - "/usr/bin/env"
    - "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"

services:
  - docker:19.03.12-dind

variables:
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_REGISTRY_USER: "AWS"
  # set DOCKER_REGISTRY_PASS after authenticating to the registry
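For the commented-out registry password above, one option (assuming the AWS CLI is available in the job image and the job has credentials with ECR permissions) is to fetch a login token in before_script:

```yaml
before_script:
  # ECR accepts "AWS" as the username and a short-lived token as the password.
  - export DOCKER_REGISTRY_PASS="$(aws ecr get-login-password --region "$AWS_DEFAULT_REGION")"
```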
