How should I log while using multiprocessing in Python?
Right now I have a central module in a framework that spawns multiple processes using the Python 2.6 multiprocessing module. Because it uses multiprocessing, there is a module-level multiprocessing-aware log, LOG = multiprocessing.get_logger(). Per the docs, this logger (EDIT) does not have process-shared locks so that you don't garble things up in sys.stderr (or whatever filehandle) by having multiple processes writing to it simultaneously.

The issue I have now is that the other modules in the framework are not multiprocessing-aware. The way I see it, I need to make all dependencies on this central module use multiprocessing-aware logging. That's annoying within the framework, let alone for all clients of the framework. Are there alternatives I'm not thinking of?
I just now wrote a log handler of my own that just feeds everything to the parent process via a pipe. I've only been testing it for ten minutes but it seems to work pretty well.

(Note: This is hardcoded to RotatingFileHandler, which is my own use case.)

Update: @javier now maintains this approach as a package available on PyPI - see multiprocessing-logging on PyPI, github at https://github.com/jruere/multiprocessing-logging

Update: Implementation!

This now uses a queue for correct handling of concurrency, and also recovers from errors correctly. I've now been using this in production for several months, and the current version works without issue.
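For illustration, a minimal sketch of that idea follows: a handler whose emit() only pushes a (made-picklable) record onto a queue, while a thread in the parent process drains the queue into a real RotatingFileHandler. The class name and arguments are illustrative, not the multiprocessing-logging package's API, and it assumes a fork-based start method so child processes inherit the handler.

```python
import logging
import logging.handlers
import multiprocessing
import threading

class MultiProcessingLog(logging.Handler):
    """Parent owns the real file handler; every process only enqueues records."""

    def __init__(self, name, mode="a", maxsize=0, rotate=0):
        logging.Handler.__init__(self)
        self._handler = logging.handlers.RotatingFileHandler(name, mode, maxsize, rotate)
        self.queue = multiprocessing.Queue(-1)
        receiver = threading.Thread(target=self._receive, daemon=True)
        receiver.start()

    def _receive(self):
        # Runs only in the parent: pull records off the queue and write them out.
        while True:
            record = self.queue.get()
            self._handler.emit(record)

    def emit(self, record):
        try:
            # Make the record picklable before it crosses the process boundary.
            if record.args:
                record.msg = record.msg % record.args
                record.args = None
            if record.exc_info:
                self.format(record)          # caches exc_text on the record
                record.exc_info = None
            self.queue.put_nowait(record)
        except Exception:
            self.handleError(record)

# usage (fork start method): add it once to the root logger in the parent process
# logging.getLogger().addHandler(MultiProcessingLog("framework.log"))
```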
The only way to deal with this non-intrusively is to spawn each worker so that its log output goes to its own pipe, and have the controlling process coalesce the entries: select from the pipes' file descriptors, perform merge-sort on the available log entries, and flush to the centralized log. Repeat.
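A rough sketch of that scheme, assuming a POSIX fork start method (e.g. the Linux default) so the pipe file descriptors are inherited; for brevity the entries are flushed in arrival order rather than merge-sorted by timestamp:

```python
import logging
import os
import select
from multiprocessing import Process

def worker(write_fd, name):
    # Each worker's log goes to its own pipe via an ordinary StreamHandler.
    stream = os.fdopen(write_fd, "w", buffering=1)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(asctime)s %(processName)s %(message)s"))
    log = logging.getLogger(name)
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    for i in range(3):
        log.info("message %d", i)
    stream.close()

if __name__ == "__main__":
    read_fds, procs = [], []
    for i in range(2):
        r, w = os.pipe()
        p = Process(target=worker, args=(w, "worker%d" % i), name="worker%d" % i)
        p.start()
        os.close(w)                      # parent keeps only the read end
        read_fds.append(r)
        procs.append(p)

    with open("central.log", "a") as out:
        open_fds = set(read_fds)
        while open_fds:
            ready, _, _ = select.select(list(open_fds), [], [])
            for fd in ready:
                chunk = os.read(fd, 4096)
                if chunk:
                    out.write(chunk.decode())
                else:                    # EOF: the worker closed its end
                    os.close(fd)
                    open_fds.discard(fd)

    for p in procs:
        p.join()
```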
QueueHandler is native in Python 3.2+, and does exactly this. It is easily replicated in previous versions. The Python docs have two complete examples: Logging to a single file from multiple processes.

Each process (including the parent process) puts its logging on the Queue, and then a listener thread or process (one example is provided for each) picks those up and writes them all to a file - no risk of corruption or garbling. For those using Python < 3.2, import logutils (which is the same as the Python 3.2 native code).

PS. If you're CPU constrained: as an aside, there's no need for a StreamHandler (which logging adds by default) anywhere but the logging process, and in profiling I found it adds significant CPU usage vs having the QueueHandler only, due to all the additional formatting, record creation etc. You can remove it from non-logging processes. Alternatively, if you've not yet added any handlers, then before adding the QueueHandler you could remove the StreamHandler with: logger.removeHandler(logger.handlers[0])
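For example, with the standard library on 3.2+ (or logutils on older versions), a minimal setup along these lines works; the file name and format string here are placeholders:

```python
import logging
import logging.handlers
import multiprocessing

def worker(queue):
    # Workers get only a QueueHandler - no StreamHandler, no file I/O here.
    root = logging.getLogger()
    root.handlers = []
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)
    root.info("hello from %s", multiprocessing.current_process().name)

if __name__ == "__main__":
    queue = multiprocessing.Queue(-1)
    file_handler = logging.FileHandler("mp.log")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(processName)s %(levelname)s %(message)s"))
    # A listener thread in the main process does all the actual writing.
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()

    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()
```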
Below is another solution with a focus on simplicity, for anyone else (like me) who gets here from Google. Logging should be easy! Only for Python 3.2 or higher.
As of 2020 it seems there is a simpler way of logging with multiprocessing.

Write a function that creates the logger; there you can set the format and where you want your output to go (file, stdout). Instantiate the logger in your init, then, in each function where you need logging, grab the same reference and output your messages (a sketch of this setup follows). Hope this helps.
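A sketch of what such a helper can look like; the file name, rotation sizes and format string are assumptions, not part of the original answer:

```python
import logging
import multiprocessing
from logging.handlers import RotatingFileHandler

def create_logger():
    """Create (or return) the multiprocessing-aware logger with a file handler."""
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(
        "[%(asctime)s | %(levelname)s | %(processName)s] %(message)s")
    handler = RotatingFileHandler("app.log", maxBytes=1024 * 1024, backupCount=2)
    handler.setFormatter(formatter)
    if not logger.handlers:        # avoid adding duplicate handlers on repeated calls
        logger.addHandler(handler)
    return logger

# in your init:
logger = create_logger()

# in each function that needs logging, grab the same reference and emit messages:
def do_work(item):
    logger = create_logger()
    logger.info("processing %s in %s", item, multiprocessing.current_process().name)
```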
Yet another alternative might be the various non-file-based logging handlers in the logging package:

SocketHandler
DatagramHandler
SyslogHandler

(and others)

This way, you could easily have a logging daemon somewhere that you could write to safely and that would handle the results correctly. (E.g., a simple socket server that just unpickles the message and emits it to its own rotating file handler.)

The SyslogHandler would take care of this for you, too. Of course, you could use your own instance of syslog, not the system one.
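For instance, the client side of the SocketHandler route needs only the few lines below; the receiving daemon would be a record-unpickling server like the one shown in the Python logging cookbook. The host and port here are simply the library defaults, not anything prescribed by the answer:

```python
import logging
import logging.handlers

# Every process ships pickled LogRecords to the logging daemon over TCP.
socket_handler = logging.handlers.SocketHandler(
    "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT)
root = logging.getLogger()
root.addHandler(socket_handler)
root.setLevel(logging.INFO)
root.info("this record is serialized and sent to the log server")
```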
A variant of the others that keeps the logging and queue thread separate.
All current solutions are too coupled to the logging configuration by using a handler. My solution has the following architecture and features:

- Log records travel over a multiprocessing.Queue.
- logging.Logger (and already defined instances) are patched to send all records to the queue.

Code with a usage example and output can be found at the following Gist: https://gist.github.com/schlamar/7003737
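The Gist has the real implementation; purely as an illustration of the "patch Logger to push records onto a queue" idea (and not schlamar's actual code), the core move looks roughly like this:

```python
import logging
import multiprocessing

log_queue = multiprocessing.Queue()

def _handle_via_queue(self, record):
    # Skip the local handler chain entirely; a single consumer (in the main
    # process) reads from log_queue and does the real handling.
    log_queue.put(record)

# Patching the class also affects logger instances that already exist.
logging.Logger.handle = _handle_via_queue
```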
Since we can represent multiprocess logging as many publishers and one subscriber (listener), using ZeroMQ to implement PUB-SUB messaging is indeed an option.

Moreover, the PyZMQ module, the Python bindings for ZMQ, implements PUBHandler, an object for publishing logging messages over a zmq.PUB socket.

There's a solution on the web for centralized logging from a distributed application using PyZMQ and PUBHandler, which can be easily adapted for working locally with multiple publishing processes.
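A minimal publisher using PyZMQ's PUBHandler might look like this; the endpoint is an assumption, and a separate listener process would bind a SUB socket to the same address and write out whatever it receives:

```python
import logging
import zmq
from zmq.log.handlers import PUBHandler

ctx = zmq.Context()
pub_socket = ctx.socket(zmq.PUB)
pub_socket.connect("tcp://127.0.0.1:5558")   # the listener binds this address

logger = logging.getLogger()
logger.addHandler(PUBHandler(pub_socket))    # log records go out as PUB messages
logger.setLevel(logging.INFO)
logger.info("published over ZeroMQ")
```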
I also like zzzeek's answer but Andre is correct that a queue is required to prevent garbling. I had some luck with the pipe, but did see garbling which is somewhat expected. Implementing it turned out to be harder than I thought, particularly due to running on Windows, where there are some additional restrictions about global variables and stuff (see: How's Python Multiprocessing Implemented on Windows?)
But, I finally got it working. This example probably isn't perfect, so comments and suggestions are welcome. It also does not support setting the formatter or anything other than the root logger. Basically, you have to reinit the logger in each of the pool processes with the queue and set up the other attributes on the logger.
Again, any suggestions on how to make the code better are welcome. I certainly don't know all the Python tricks yet :-)
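Reduced to a hedged sketch (using QueueHandler/QueueListener from Python 3.2+ rather than the answer's hand-rolled code): a pool initializer re-initializes the root logger in every pool worker with the shared queue, which is what keeps it working on Windows, where nothing is inherited.

```python
import logging
import logging.handlers
import multiprocessing

def worker_init(queue):
    # Runs once in each pool process.
    root = logging.getLogger()
    root.handlers = []
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)

def job(i):
    logging.getLogger(__name__).info("job %s", i)
    return i

if __name__ == "__main__":
    queue = multiprocessing.Queue(-1)
    listener = logging.handlers.QueueListener(queue, logging.FileHandler("pool.log"))
    listener.start()
    with multiprocessing.Pool(4, initializer=worker_init, initargs=(queue,)) as pool:
        pool.map(job, range(8))
    listener.stop()
```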
I'd like to suggest using the logger_tt library: https://github.com/Dragon2fly/logger_tt

The multiprocessing_logging library does not work on my macOS, while logger_tt does.
The concurrent-log-handler seems to do the job perfectly. Tested on Windows. Supports also POSIX systems.

Main idea

- Create a separate function that returns a logger. The logger must have a fresh instance of ConcurrentRotatingFileHandler for each process. An example function, get_logger(), is sketched after this answer.
- Setting up the logger is done at the initialization of the process. For a multiprocessing.Process subclass, that means the beginning of the run() method.

Detailed instructions

In this example the project is split into a child-process module, a main-process module, a logger setup module and a third-party example module, matching the sections below.

Code

Child process

- A simple child process that inherits from multiprocessing.Process and simply logs the text "Child process" to the file.
- get_logger() is called inside run(), or elsewhere inside the child process (not at module level or in __init__()). This is required because get_logger() creates a ConcurrentRotatingFileHandler instance, and a new instance is needed for each process.
- do_something is used just to demonstrate that this works with 3rd party library code which does not have any clue that you are using concurrent-log-handler.

Main Process

- The main process is likewise a multiprocessing.Process subclass.
- The same comments about get_logger() and do_something() apply as for the child process.

Logger setup

- The setup uses the ConcurrentRotatingFileHandler from the concurrent-log-handler package. Each process needs a fresh ConcurrentRotatingFileHandler instance.
- Note that all the constructor arguments for the ConcurrentRotatingFileHandler should be the same in every process.
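A hedged sketch of the get_logger() function described above; the log file name, rotation sizes and format are placeholders. The important parts are that each process calls this itself and that the ConcurrentRotatingFileHandler arguments are identical everywhere:

```python
import logging
from concurrent_log_handler import ConcurrentRotatingFileHandler

LOGFILE = "app.log"   # placeholder; must be the same path in every process

def get_logger():
    """Call inside each process (e.g. at the top of run()), not at module level."""
    logger = logging.getLogger("app")
    if not logger.handlers:   # each process builds its own fresh handler exactly once
        handler = ConcurrentRotatingFileHandler(
            LOGFILE, "a", maxBytes=512 * 1024, backupCount=3)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(processName)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```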
Just publish somewhere your instance of the logger. That way, the other modules and clients can use your API to get the logger without having to import multiprocessing.
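In other words, something as small as this; the module and function names are made up for the example:

```python
# framework/log.py - the one place that knows about multiprocessing
import multiprocessing

_LOG = multiprocessing.get_logger()

def get_logger():
    """Clients call framework.log.get_logger(); they never import multiprocessing."""
    return _LOG
```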
How about delegating all the logging to another process that reads all log entries from a Queue?
Simply share LOG_QUEUE via any of the multiprocess mechanisms or even inheritance and it all works out fine!
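A hand-rolled sketch of that: one dedicated listener process owns the file, and every worker just drops LogRecords onto the shared queue (here passed explicitly; inheritance works too under fork). The file name and worker count are arbitrary:

```python
import logging
import logging.handlers
import multiprocessing

def log_listener(queue):
    # The only process that touches the log file.
    handler = logging.FileHandler("central.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(processName)s %(message)s"))
    while True:
        record = queue.get()
        if record is None:                 # sentinel: shut down
            break
        handler.handle(record)

def worker(queue):
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.INFO)
    root.info("work done")

if __name__ == "__main__":
    log_queue = multiprocessing.Queue()
    listener = multiprocessing.Process(target=log_listener, args=(log_queue,))
    listener.start()
    workers = [multiprocessing.Process(target=worker, args=(log_queue,)) for _ in range(3)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    log_queue.put(None)
    listener.join()
```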
Below is a class that can be used in a Windows environment; it requires ActivePython. You can also inherit from it for other logging handlers (StreamHandler, etc.). And here is an example that demonstrates usage:
I have a solution that's similar to ironhacker's except that I use logging.exception in some of my code and found that I needed to format the exception before passing it back over the Queue since tracebacks aren't pickle'able:
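The shape of that fix, as a hypothetical handler (not the answer's exact code): render the traceback to text on the sending side and drop the unpicklable exc_info before the record goes onto the queue.

```python
import logging
import traceback

class PicklableQueueHandler(logging.Handler):
    """Queue-feeding handler that formats exceptions before pickling."""

    def __init__(self, queue):
        logging.Handler.__init__(self)
        self.queue = queue

    def emit(self, record):
        if record.exc_info:
            # Tracebacks can't be pickled, so turn them into text now.
            record.exc_text = "".join(traceback.format_exception(*record.exc_info))
            record.exc_info = None
        self.queue.put(record)
```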
If you have deadlocks occurring in a combination of locks, threads and forks in the logging module, that is reported in bug report 6721 (see also the related SO question). There is a small fixup solution posted here.

However, that will just fix any potential deadlocks in logging. It will not fix the fact that things may still get garbled up. See the other answers presented here.
Here's my simple hack/workaround... not the most comprehensive, but easily modifiable and, I think, simpler to read and understand than any other answer I found before writing this.
There is this great package.

Package: https://pypi.python.org/pypi/multiprocessing-logging/
Code: https://github.com/jruere/multiprocessing-logging

Install it from PyPI, then add install_mp_handler() to your logging setup; a short sketch follows.
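Usage is essentially two lines once your handlers are configured; call it before spawning any worker processes (the file name here is just an example):

```python
# pip install multiprocessing-logging
import logging
from multiprocessing_logging import install_mp_handler

logging.basicConfig(level=logging.INFO, filename="app.log")
install_mp_handler()   # wraps the existing handlers so they are multiprocessing-safe
```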
For whoever might need this, I wrote a decorator for the multiprocessing_logging package that adds the current process name to logs, so it becomes clear who logs what. It also runs install_mp_handler(), so there is no need to run that yourself before creating a pool. This allows me to see which worker creates which log messages. Here's the blueprint with an example:
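A hypothetical sketch of the idea (not the author's actual decorator): decorate the entry point, let the decorator call install_mp_handler() and put %(processName)s into the format so each worker's lines are attributable. It assumes a fork start method so the wrapped handlers reach the pool workers.

```python
import functools
import logging
import multiprocessing
from multiprocessing_logging import install_mp_handler

def mp_logging(func):
    """Configure process-name-aware logging and install the MP handler, then run func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(processName)s %(levelname)s %(message)s")
        install_mp_handler()          # no need to call it again before creating the pool
        return func(*args, **kwargs)
    return wrapper

def handle(item):
    logging.getLogger(__name__).info("handling %r", item)
    return item

@mp_logging
def main():
    with multiprocessing.Pool(4) as pool:
        pool.map(handle, range(8))

if __name__ == "__main__":
    main()
```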
One of the alternatives is to write the multiprocessing logging to a known file and register an atexit handler to join on those processes and read it back on stderr; however, you won't get a real-time flow of the output messages on stderr that way.
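A sketch of that alternative (the file naming scheme and the replay-to-stderr step are assumptions): each worker logs to its own known file, and an atexit handler in the parent joins the workers and dumps the files to stderr at interpreter shutdown.

```python
import atexit
import logging
import multiprocessing
import os
import sys

def worker():
    # Each process logs to its own well-known file.
    logging.basicConfig(filename="worker-%d.log" % os.getpid(), level=logging.INFO)
    logging.info("work done in %s", multiprocessing.current_process().name)

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker) for _ in range(3)]
    for p in procs:
        p.start()

    @atexit.register
    def replay_logs():
        # Not real-time: runs only when the parent interpreter exits.
        for p in procs:
            p.join()
        for name in sorted(os.listdir(".")):
            if name.startswith("worker-") and name.endswith(".log"):
                with open(name) as f:
                    sys.stderr.write(f.read())
```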
Simplest idea as mentioned: use a WatchedFileHandler. The reasons for this handler are discussed in detail here, but in short there are certain worse race conditions with the other logging handlers. This one has the shortest window for the race condition.
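A minimal sketch, assuming an external tool (e.g. logrotate) handles rotation: every process attaches its own WatchedFileHandler pointed at the same file, and the handler reopens the file if it is moved.

```python
import logging
import logging.handlers

handler = logging.handlers.WatchedFileHandler("shared.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(processName)s %(message)s"))
root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.INFO)
root.info("each process appends to the shared file")
```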