Subprocess stdout/stderr to a limited-size log file

Posted 2024-12-01 01:52:08

I have a process which chats a lot to stderr, and I want to log that stuff to a file.

foo 2> /tmp/foo.log

Actually I'm launching it with Python's subprocess.Popen, but it might as well be from the shell for the purposes of this question.

with open('/tmp/foo.log', 'w') as stderr:
  foo_proc = subprocess.Popen(['foo'], stderr=stderr)

The problem is after a few days my log file can be very large, like >500 MB. I am interested in all that stderr chat, but only the recent stuff. How can I limit the size of the logfile to, say, 1 MB? The file should be a bit like a circular buffer in that the most recent stuff will be written but the older stuff should fall out of the file, so that it never goes above a given size.

I'm not sure if there's already an elegant Unixy way to do this with some sort of special file that I'm simply not aware of.

An alternative solution with log rotation would be sufficient for my needs as well, as long as I don't have to interrupt the running process.

Comments (3)

一枫情书 2024-12-08 01:52:08

You should be able to use the stdlib logging package to do this. Instead of connecting the subprocess' output directly to a file, you can do something like this:

import logging

logger = logging.getLogger('foo')

def stream_reader(stream):
    while True:
        line = stream.readline()
        if not line:  # EOF: the subprocess closed its stderr
            break
        logger.debug('%s', line.strip())

This just logs every line received from the stream, and you can configure logging with a RotatingFileHandler, which provides log file rotation. You then arrange to read this data and log it:

import subprocess
import threading

# text=True so readline() returns str rather than bytes (Python 3.7+)
foo_proc = subprocess.Popen(['foo'], stderr=subprocess.PIPE, text=True)

thread = threading.Thread(target=stream_reader, args=(foo_proc.stderr,))
thread.daemon = True  # optional: don't let this thread block interpreter exit
thread.start()

# do other stuff

thread.join() # await thread termination (optional for daemons)

Of course you can call stream_reader(foo_proc.stderr) too, but I'm assuming you might have other work to do while the foo subprocess does its stuff.

Here's one way you could configure logging (code that should only be executed once):

import logging, logging.handlers

# maxBytes=100000, backupCount=10: rotate at roughly 100K, keep 10 backups
handler = logging.handlers.RotatingFileHandler('/tmp/foo.log', 'a', 100000, 10)
logging.getLogger().addHandler(handler)
logging.getLogger('foo').setLevel(logging.DEBUG)

This will create up to 10 backup files of roughly 100K each in addition to foo.log (after rotation: foo.log.1, foo.log.2, etc., where foo.log is always the most recent). You could also pass in 1000000, 1 to get just foo.log and foo.log.1, where rotation happens when the file would exceed 1000000 bytes in size.
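
For the 1 MB case from the question, a minimal variant of the same handler setup might look like this (untested sketch; the keyword arguments are just the positional values above spelled out):

handler = logging.handlers.RotatingFileHandler(
    '/tmp/foo.log', mode='a', maxBytes=1000000, backupCount=1)
logging.getLogger().addHandler(handler)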

玩世 2024-12-08 01:52:08

The circular-buffer approach would be hard to implement, as you would constantly have to rewrite the whole file as soon as anything falls out of it.

The approach with logrotate (or something similar) is the way to go. In that case, you would simply do something like this:

import os
import signal
import subprocess

def hupsignal(signum, frame):
    # logrotate (or whatever rotated the file) sends SIGHUP: reopen the log
    global logfile
    logfile.close()
    logfile = open('/tmp/foo.log', 'ab')

logfile = open('/tmp/foo.log', 'ab')
signal.signal(signal.SIGHUP, hupsignal)
foo_proc = subprocess.Popen(['foo'], stderr=subprocess.PIPE)
for chunk in iter(lambda: foo_proc.stderr.read(8192), b''):
    # iterate until EOF occurs
    logfile.write(chunk)
    # or do you want to rotate yourself?
    # Then omit the signal stuff and do it here instead:
    # if logfile.tell() > MAX_FILE_SIZE:
    #     logfile.close()
    #     os.replace('/tmp/foo.log', '/tmp/foo.log.1')
    #     logfile = open('/tmp/foo.log', 'ab')

It is not a complete solution; think of it as pseudocode, as it is untested and will probably need some tweaking to make it work. But you should get the idea.

It also shows how to make this work with logrotate (assuming logrotate is configured to send SIGHUP to this process after rotating). Of course, you can rotate your logfile yourself if needed, as in the commented-out branch above.

橙幽之幻 2024-12-08 01:52:08

You may be able to use the properties of 'open file descriptions' (distinct from, but closely related to, 'open file descriptors'). In particular, the current write position is associated with the open file description, so two processes that share a single open file description can each adjust the write position.

So, in context, the original process could retain the file descriptor for standard error of the child process, and periodically, when the position reaches your 1 MiB size, reposition the pointer to the start of the file, thus achieving your required circular buffer effect.
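
As a rough, untested sketch of that idea (it assumes the parent keeps the file object it handed to the child, and that the file is not opened in append mode, since O_APPEND writes would ignore the repositioning):

import os
import subprocess
import time

LIMIT = 1 << 20  # 1 MiB

logfile = open('/tmp/foo.log', 'wb')  # note: not append mode
foo_proc = subprocess.Popen(['foo'], stderr=logfile)

fd = logfile.fileno()
while foo_proc.poll() is None:
    time.sleep(1)
    # The child inherited a duplicate of this descriptor, so both refer to
    # one open file description and therefore share one write offset.
    if os.lseek(fd, 0, os.SEEK_CUR) >= LIMIT:
        os.lseek(fd, 0, os.SEEK_SET)  # wrap: new output overwrites the oldest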

The biggest problem is determining where the current messages are being written, so that you can read from the oldest material (just in front of the file position) to the newest material. It is unlikely that new lines overwriting the old will match exactly, so there'd be some debris. You might be able to follow each line from the child with a known character sequence (say 'XXXXXX'), and then have each write from the child reposition to overwrite the previous marker...but that definitely requires control over the program that's being run. If it is not under your control, or cannot be modified, that option vanishes.

An alternative would be to periodically truncate the file (maybe after copying it), and to have the child process write in append mode (because the file is opened in the parent in append mode). You could arrange to copy the material from the file to a spare file before truncating to preserve the previous 1 MiB of data. You might use up to 2 MiB that way, which is a lot better than 500 MiB and the sizes could be configured if you're actually short of space.
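
A rough, untested sketch of that alternative (the spare-file name and the polling interval are arbitrary choices):

import os
import shutil
import subprocess
import time

LIMIT = 1 << 20  # 1 MiB

logfile = open('/tmp/foo.log', 'ab')  # append mode is inherited by the child
foo_proc = subprocess.Popen(['foo'], stderr=logfile)

while foo_proc.poll() is None:
    time.sleep(5)
    if os.path.getsize('/tmp/foo.log') >= LIMIT:
        shutil.copyfile('/tmp/foo.log', '/tmp/foo.log.old')  # keep the previous ~1 MiB
        os.truncate('/tmp/foo.log', 0)  # the child keeps appending, now from the start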

Have fun!
