Cloud Run sends SIGTERM to container instances with no visible scale-down

Posted on 2025-02-04 01:39:01


I've deployed a Python FastAPI application on Cloud Run using Gunicorn + Uvicorn workers.

Cloud Run configuration:

[screenshot: Cloud Run configuration]

Dockerfile


FROM python:3.8-slim

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True

ENV PORT ${PORT}

ENV APP_HOME /app

ENV APP_MODULE myapp.main:app

ENV TIMEOUT 0

ENV WORKERS 4

WORKDIR $APP_HOME

COPY ./requirements.txt ./

# Install production dependencies.
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt

# Copy local code to the container image.
COPY . ./

# Run the web service on container startup. Here we use the gunicorn
# webserver with Uvicorn worker processes.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable worker timeouts and let Cloud Run handle instance scaling.

CMD exec gunicorn --bind :$PORT --workers $WORKERS --worker-class uvicorn.workers.UvicornWorker --timeout $TIMEOUT $APP_MODULE  --preload

My application receives a request and does the following:

  • Makes async call to cloud-firestore using firestore.AsyncClient
  • Runs an algorithm using Google OR-Tools. I've used cProfile to check that this task on average takes < 500 ms to complete.
  • Adds a FastAPI async Background Task to write to BigQuery. This is achieved as follows:
from fastapi.concurrency import run_in_threadpool
from google.cloud import bigquery

client = bigquery.Client()  # BigQuery client (construction not shown in the original)

async def bg_task():
    # create json payload (rows_to_insert)
    # insert_rows_json is a blocking call, so run it in the threadpool
    errors = await run_in_threadpool(
        lambda: client.insert_rows_json(table_id, rows_to_insert)
    )  # Make an API request.
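For context, `run_in_threadpool` offloads a blocking call to a worker thread so the event loop stays responsive. The same pattern can be sketched with the stdlib's asyncio.to_thread (Python 3.9+); the function and row names below are illustrative stand-ins, not the real BigQuery client:

```python
import asyncio
import time

def blocking_insert(rows):
    # stand-in for a blocking client call such as insert_rows_json
    time.sleep(0.01)
    return []  # mimic BigQuery: an empty list means no row errors

async def bg_task(rows):
    # offload the blocking call so the event loop is not blocked
    return await asyncio.to_thread(blocking_insert, rows)

errors = asyncio.run(bg_task([{"id": 1}]))
print(errors)  # → []
```

The key point is that `insert_rows_json` is synchronous; awaiting it directly would stall the single-threaded event loop for the duration of the API call.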

I have been noticing intermittent "Handling signal: term" logs, which cause Gunicorn to shut down its worker processes and restart them. I can't work out why this is happening. The surprising part is that it sometimes occurs at off-peak hours when the API is receiving 0 requests. There doesn't appear to be any scaling down of Cloud Run instances causing this issue either.

[log screenshots: SIGTERM received, Gunicorn restart]

The issue is that this also happens quite frequently under production load during peak hours, and even causes Cloud Run to autoscale from 2 to 3 or 4 instances. This adds cold-start time to my API, which receives on average 1 request/minute.

Cloud Run metrics during random SIGTERM


As clearly shown here, my API has not received any requests in this period, and Cloud Run has no business killing and restarting the Gunicorn processes.

Another startling issue is that this seems to only happen in my production environment. In my development environment, I have the exact SAME setup but I don't see any of these issues there.

Why is Cloud Run sending SIGTERM and how do I avoid it?
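Whatever the cause, one mitigation is to handle SIGTERM gracefully so in-flight work (such as the BigQuery background task) can finish before shutdown. A minimal stdlib sketch of the pattern, independent of Gunicorn (which installs its own handlers for worker management):

```python
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    # flag that the platform has asked us to stop;
    # finish in-flight work, then exit cleanly
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# simulate the platform delivering SIGTERM to this process
signal.raise_signal(signal.SIGTERM)
print(shutting_down)  # → True
```

Cloud Run documents that instances receive SIGTERM and then have a short grace period before being forcibly stopped, so a drain flag like this gives background tasks a chance to complete.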


Comments (1)

赤濁 2025-02-11 01:39:01


Cloud Run is a serverless platform, which means server management is done by Google Cloud, and it can choose to stop some instances from time to time (for maintenance reasons, for technical issues, ...).

But this changes nothing for you. There is of course a cold start, but it should be invisible to your traffic, even under high load, because you have min-instances set to 2, which keeps instances up and ready to serve traffic without cold starts.

Can you have 3 or 4 instances in parallel instead of 2 (the min value)? Yes, but the billable instance count stays flat at 2. Cloud Run, again, is serverless: it can create backup instances to make sure that the future shutdown of some instances won't impact your traffic. It's an internal optimization. No additional cost; it just works!
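The min-instances setting referred to here can be configured on an existing service with gcloud; the service name and region below are illustrative, not taken from the question:

```shell
# keep at least 2 warm instances so cold starts don't hit user traffic
gcloud run services update my-api --min-instances=2 --region=us-central1
```

Warm instances are billed while idle, so this trades a flat baseline cost for predictable latency.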

Can you avoid it? No, because it's serverless, and also because there is no impact on your workloads.


One last point about "environments": for Google Cloud, all projects are production projects. There is no difference; Google can't know what is critical or not, so everything is treated as critical.

If you notice a difference between your 2 projects, it's simply because they are deployed on different Google Cloud internal clusters. Status, performance, and maintenance operations (...) differ between clusters. And again, you can't do anything about that.
