Connection timeout after 15 minutes when using SQLAlchemy + Docker Swarm

Posted 2025-02-11 02:23:18

I have a FastAPI + SQLAlchemy + MariaDB application that works fine when run locally or with docker compose up. But when I run it in swarm mode (docker stack deploy -c docker-compose.yml issuetest), it raises a connection error after exactly 15 minutes of idle time:

sqlalchemy.exc.OperationalError: (asyncmy.errors.OperationalError) (2013, 'Lost connection to MySQL server during query ([Errno 104] Connection reset by peer)')

The default MariaDB timeout should be 8 hours. I can avoid the issue by setting pool_recycle=60*10 (or any other value below 15 minutes), but I would like to understand what went wrong.

To reproduce, here is a minimal code sample of app/main.py:

import uvicorn
from fastapi import FastAPI
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlmodel import Field, SQLModel, select

engine = create_async_engine("mysql+asyncmy://root:pw@mariadbhost/somedb", future=True)
app = FastAPI()


class Car(SQLModel, table=True):
    id: int = Field(nullable=True, primary_key=True)
    name: str


@app.on_event("startup")
async def on_startup():
    async with engine.begin() as conn:
        await conn.run_sync(SQLModel.metadata.create_all)


async def get_db_cars():
    async with AsyncSession(engine) as session:
        statement = select(Car)
        result = await session.execute(statement)
        cars = result.scalars().all()
    return cars


@app.get("/dbcall")
async def dbcall():
    return await get_db_cars()


if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)

And the docker-compose.yml file:

version: '3.1'

services:
  mariadbhost:
    image: mariadb:10.7
    environment:
      MYSQL_ROOT_PASSWORD: pw
      MYSQL_DATABASE: somedb

  mybackend:
    image: myimage
    ports:
      - 8089:80

Comments (1)

喜爱皱眉﹌ 2025-02-18 02:23:18

I'm late, but this answer is not only for you but also for anyone who reaches this question in the future.

You are probably using VIP endpoint mode for the MySQL server in swarm.
Even if you didn't set it with endpoint_mode in the compose file, VIP is the default value of endpoint_mode.

The problem is the idle timeout of the swarm network's virtual IP.
The swarm network uses IPVS to route your TCP requests, but the IPVS default timeout for idle connections is 900 s (Linux kernel source) and Swarm does not modify it.
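
This also explains why the pool_recycle=60*10 workaround from the question works: connections are replaced before they have been idle for 900 s. A minimal sketch of that idea follows, assuming the 900 s figure above applies to your cluster. The names pool_options, make_engine and margin_s are hypothetical helpers introduced here for illustration; pool_recycle and pool_pre_ping are existing SQLAlchemy engine options, and pool_pre_ping is an addition that the question did not use.

```python
# Idle timeout applied by IPVS on the Swarm VIP, as stated above (seconds).
IPVS_IDLE_TIMEOUT_S = 900


def pool_options(idle_timeout_s: int = IPVS_IDLE_TIMEOUT_S, margin_s: int = 300) -> dict:
    """Return SQLAlchemy pool keyword arguments that stay under the idle timeout.

    margin_s is a hypothetical safety margin; 300 s reproduces the
    pool_recycle=60*10 value used in the question.
    """
    if not 0 < margin_s < idle_timeout_s:
        raise ValueError("margin_s must be positive and smaller than idle_timeout_s")
    return {
        # Replace pooled connections older than this many seconds.
        "pool_recycle": idle_timeout_s - margin_s,
        # Test each connection when it is checked out of the pool, so a
        # connection that was already dropped is replaced instead of used.
        "pool_pre_ping": True,
    }


def make_engine(url: str):
    """Create the async engine; needs SQLAlchemy and the async driver named in the URL."""
    from sqlalchemy.ext.asyncio import create_async_engine

    return create_async_engine(url, **pool_options())
```

With the URL from the question, this would be called as make_engine("mysql+asyncmy://root:pw@mariadbhost/somedb"). Note that pool_pre_ping does not keep a connection alive; it only detects a dead one at checkout, at the cost of one extra round trip.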

The same problem can appear with HTTP connections that stay idle for more than 15 minutes after being established, when the HTTP client's keepalive time is set above 15 minutes. The client will try to communicate over the established connection, but that connection has already been broken because the VIP dropped it.

IPVS assumes connections are stateless and short-lived, and the VIP needs to refresh its routing after some time so that traffic is distributed equally across multiple instances. This is the reason for the VIP's short timeout.

So if you want to fix this problem, there are several options:

  1. Change the endpoint mode to dnsrr. dnsrr returns the list of instance IPs and your application needs to choose one of them, so the VIP is no longer used.
    But you can't use it for a service that needs to publish a port in ingress mode.
  2. Use a keepalive method. In your case, you can run a health check query such as SELECT 1.
    For the HTTP keepalive case, you can use TCP keepalive probes, or just lower the keepalive time to under 15 minutes.
  3. Change the IPVS default value. Since this is not a recommended approach and I have not tested it on my machine, I won't describe it further.
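
The TCP keepalive idea in option 2 can be sketched with the Python standard library. This is only an illustration for a socket you control yourself; whether a particular database driver or HTTP client lets you reach its socket is driver-specific and not covered here. The function name enable_tcp_keepalive and the default values (first probe after 600 s, then every 60 s, up to 5 probes) are assumptions chosen to stay below the 900 s timeout.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 600,
                         interval_s: int = 60, probes: int = 5) -> socket.socket:
    """Enable TCP keepalive so probes start before the 900 s idle timeout."""
    # Portable switch that turns keepalive probing on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning constants below are not available on every platform,
    # so each one is applied only if this Python build exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between consecutive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

The options are set on the socket before or after connecting; the probes are sent by the kernel, so the application does not need a background task of its own.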

I hope this answer is useful to anyone who has run into the same problem.
