How can I capture all of the Python log records generated during the execution of a series of Celery tasks?

Posted 2024-10-02 17:15:51


I want to convert my homegrown task queue system into a Celery-based task queue, but one feature I currently have is causing me some distress.

Right now, my task queue operates very coarsely; I run the job (which generates data and uploads it to another server), collect the logging using a variant on Nose's log capture library, and then I store the logging for the task as a detailed result record in the application database.

I would like to break this down into three tasks:

  1. collect data
  2. upload data
  3. report results (including all logging from the preceding two tasks)

The real kicker here is the logging collection. Right now, using the log capture, I have a series of log records for each log call made during the data generation and upload process. These are required for diagnostic purposes. Given that the tasks are not even guaranteed to run in the same process, it's not clear how I would accomplish this in a Celery task queue.

My ideal solution to this problem would be a trivial and, ideally, minimally invasive method of capturing all logging during the predecessor tasks (1, 2) and making it available to the reporter task (3).

Am I best off remaining fairly coarse-grained with my task definition and putting all of this work in one task? Or is there a way to pass the existing captured logging around so that it can be collected at the end?
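To make the question concrete, here is roughly the shape I have in mind. This is only a rough sketch: the handler class, the task names, and the use of Celery's shared_task and chain are my own illustration, not anything I have working. Each task captures its own records with a small list-based handler, returns them alongside its normal result, and a chain hands everything to the reporter.

import logging

from celery import chain, shared_task

# Make sure INFO records actually propagate; Celery's own worker logging
# setup may already configure this for you.
logging.getLogger().setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Collect formatted log records in a plain Python list."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))


@shared_task
def collect_data():
    handler = ListHandler()
    logging.getLogger().addHandler(handler)   # attach to the root logger to catch everything
    try:
        logging.getLogger(__name__).info("collecting data")
        data = {"rows": 42}                   # placeholder for the real work
    finally:
        logging.getLogger().removeHandler(handler)
    return {"data": data, "logs": handler.records}


@shared_task
def upload_data(previous):
    handler = ListHandler()
    logging.getLogger().addHandler(handler)
    try:
        logging.getLogger(__name__).info("uploading %s", previous["data"])
    finally:
        logging.getLogger().removeHandler(handler)
    return {"logs": previous["logs"] + handler.records}


@shared_task
def report_results(previous):
    # Every record captured in steps 1 and 2 arrives here as a plain string.
    return "\n".join(previous["logs"])


workflow = chain(collect_data.s(), upload_data.s(), report_results.s())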


3 Answers

╭⌒浅淡时光〆 2024-10-09 17:15:51


I assume you are using the logging module. You can use a separate named logger per task set to do the job. They will inherit all configuration from the upper level.

In task.py:

import logging

from celery import shared_task as task  # or however you normally get Celery's task decorator


@task
def step1(key, *args, **kwargs):
    # `key` is some unique identifier common for a piece of data in all steps of processing
    logger = logging.getLogger("myapp.tasks.processing.%s" % key)
    # ...
    logger.info("...")  # log something


@task
def step2(key, *args, **kwargs):
    logger = logging.getLogger("myapp.tasks.processing.%s" % key)
    # ...
    logger.info("...")  # log something

Here, all records are sent to the same named logger. Now you can use two approaches to fetch those records:

  1. Configure a file listener whose name depends on the logger name. After the last step, just read all the information from that file. Make sure output buffering is disabled for this listener, or you risk losing records (see the sketch after this list).

  2. Create a custom listener that accumulates records in memory and then returns them all when asked. I'd use memcached for storage here; it's simpler than creating your own cross-process storage.
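A minimal sketch of the first approach, assuming the workers share a filesystem; the log directory and the helper names are mine, not something Celery or the logging module provide:

import logging
import os

LOG_DIR = "/tmp/task-logs"   # any path every worker can reach


def attach_capture_handler(key):
    """Attach a file handler whose path is derived from the processing key."""
    os.makedirs(LOG_DIR, exist_ok=True)
    logger = logging.getLogger("myapp.tasks.processing.%s" % key)
    handler = logging.FileHandler(os.path.join(LOG_DIR, "%s.log" % key))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return handler   # call handler.close() at the end of step1/step2


def read_captured_logs(key):
    """In the reporting task: pull back everything steps 1 and 2 logged."""
    with open(os.path.join(LOG_DIR, "%s.log" % key)) as fh:
        return fh.read()

The standard FileHandler flushes after each record it emits, which covers the buffering caveat. The second approach would look much the same, except emit() would append the formatted record to memcached (or a similar shared store) instead of a file.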

淡紫姑娘! 2024-10-09 17:15:51


It sounds like some kind of 'watcher' would be ideal. If you can watch and consume the logs as a stream, you could slurp the results as they come in. Since the watcher would run separately, and therefore have no dependencies on what it is watching, I believe this would satisfy your requirement for a non-invasive solution.
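A rough sketch of what such a watcher could look like using only the standard library's socket-based logging (essentially the Python logging cookbook pattern, trimmed down); the `captured` list stands in for whatever store you actually want, and none of this is Celery-specific:

import logging
import logging.handlers
import pickle
import socketserver
import struct


def configure_worker_logging():
    """On each Celery worker: stream every record to the watcher over TCP."""
    handler = logging.handlers.SocketHandler(
        "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT
    )
    logging.getLogger().addHandler(handler)


captured = []   # stand-in for a real store


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Watcher side: unpickle LogRecords as they arrive and keep them."""

    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break
            length = struct.unpack(">L", header)[0]
            record = logging.makeLogRecord(pickle.loads(self.rfile.read(length)))
            captured.append(record)


if __name__ == "__main__":
    server = socketserver.ThreadingTCPServer(
        ("localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT),
        LogRecordStreamHandler,
    )
    server.serve_forever()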

红尘作伴 2024-10-09 17:15:51


Django Sentry is a logging utility for Python (and Django), and has support for Celery.
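For reference, with the current sentry-sdk package (the successor to the raven/Django Sentry tooling this answer refers to) the Celery hookup is roughly the following; the DSN is a placeholder:

import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration

sentry_sdk.init(
    dsn="https://<key>@sentry.example.com/<project>",   # placeholder DSN
    integrations=[CeleryIntegration()],
)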
