Cloud Run sends SIGTERM to container instances with no visible scale-down

Posted on 2025-02-04 01:39:01


I've deployed a Python FastAPI application on Cloud Run using Gunicorn + Uvicorn workers.

Cloud Run configuration:

[screenshot: Cloud Run configuration]

Dockerfile


FROM python:3.8-slim

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True

ENV PORT ${PORT}

ENV APP_HOME /app

ENV APP_MODULE myapp.main:app

ENV TIMEOUT 0

ENV WORKERS 4

WORKDIR $APP_HOME

COPY ./requirements.txt ./

# Install production dependencies.
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt

# Copy local code to the container image.
COPY . ./

# Run the web service on container startup. Here we use the gunicorn
# webserver with Uvicorn worker processes.
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable worker timeouts and let Cloud Run handle instance scaling.

CMD exec gunicorn --bind :$PORT --workers $WORKERS --worker-class uvicorn.workers.UvicornWorker --timeout $TIMEOUT $APP_MODULE  --preload

My application receives a request and does the following:

  • Makes async call to cloud-firestore using firestore.AsyncClient
  • Runs an algorithm using Google OR-Tools. I've used cProfile to check that this task on average takes < 500 ms to complete.
  • Adds a FastAPI async Background Task to write to BigQuery. This is achieved as follows:
from fastapi.concurrency import run_in_threadpool
from google.cloud import bigquery

client = bigquery.Client()  # BigQuery client (construction not shown in the original)

async def bg_task():
    # create json payload (rows_to_insert)
    # insert_rows_json is a blocking call, so run it in the threadpool
    errors = await run_in_threadpool(
        lambda: client.insert_rows_json(table_id, rows_to_insert)
    )  # Make an API request.
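For context, `run_in_threadpool` offloads a blocking call to a worker thread so the event loop stays responsive. The same pattern can be sketched with the stdlib's asyncio.to_thread (Python 3.9+); the function and row names below are illustrative stand-ins, not the real BigQuery client:

```python
import asyncio
import time

def blocking_insert(rows):
    # stand-in for a blocking client call such as insert_rows_json
    time.sleep(0.01)
    return []  # mimic BigQuery: an empty list means no row errors

async def bg_task(rows):
    # offload the blocking call so the event loop is not blocked
    return await asyncio.to_thread(blocking_insert, rows)

errors = asyncio.run(bg_task([{"id": 1}]))
print(errors)  # → []
```

The key point is that `insert_rows_json` is synchronous; awaiting it directly would stall the single-threaded event loop for the duration of the API call.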

I have been noticing intermittent "Handling signal: term" logs, which cause Gunicorn to shut down its worker processes and restart them. I can't work out why this is happening. The surprising part is that it sometimes occurs at off-peak hours when the API is receiving 0 requests. There doesn't appear to be any scaling down of Cloud Run instances causing this issue either.

[log screenshots: SIGTERM received, Gunicorn restart]

The issue is that this also happens quite frequently under production load during peak hours, and even causes Cloud Run to autoscale from 2 to 3 or 4 instances. This adds cold-start time to my API, which receives on average 1 request/minute.

Cloud Run metrics during random SIGTERM


As clearly shown here, my API has not received any requests in this period, and Cloud Run has no business killing and restarting the Gunicorn processes.

Another startling issue is that this seems to only happen in my production environment. In my development environment, I have the exact SAME setup but I don't see any of these issues there.

Why is Cloud Run sending SIGTERM and how do I avoid it?
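Whatever the cause, one mitigation is to handle SIGTERM gracefully so in-flight work (such as the BigQuery background task) can finish before shutdown. A minimal stdlib sketch of the pattern, independent of Gunicorn (which installs its own handlers for worker management):

```python
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    # flag that the platform has asked us to stop;
    # finish in-flight work, then exit cleanly
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

# simulate the platform delivering SIGTERM to this process
signal.raise_signal(signal.SIGTERM)
print(shutting_down)  # → True
```

Cloud Run documents that instances receive SIGTERM and then have a short grace period before being forcibly stopped, so a drain flag like this gives background tasks a chance to complete.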


Comments (1)

赤濁 2025-02-11 01:39:01


Cloud Run is a serverless platform, which means server management is done by Google Cloud, and it can choose to stop some instances from time to time (for maintenance reasons, for technical issues, ...).

But this changes nothing for you. There is of course a cold start, but it should be invisible to your traffic, even under high load, because you have min-instances set to 2, which keeps instances up and ready to serve traffic without cold starts.

Can you have 3 or 4 instances in parallel instead of 2 (the min value)? Yes, but the billable instance count stays flat at 2. Cloud Run, again, is serverless: it can create backup instances to make sure that the future shutdown of some instances won't impact your traffic. It's an internal optimization. No additional cost; it just works!
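The min-instances setting referred to here can be configured on an existing service with gcloud; the service name and region below are illustrative, not taken from the question:

```shell
# keep at least 2 warm instances so cold starts don't hit user traffic
gcloud run services update my-api --min-instances=2 --region=us-central1
```

Warm instances are billed while idle, so this trades a flat baseline cost for predictable latency.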

Can you avoid it? No, because it's serverless, and also because there is no impact on your workloads.


One last point about "environments": for Google Cloud, all projects are production projects. There is no difference; Google can't know what is critical or not, so everything is treated as critical.

If you notice a difference between your 2 projects, it's simply because they are deployed on different Google Cloud internal clusters. Status, performance, and maintenance operations (...) differ between clusters. And again, you can't do anything about that.
