Implementing a semi-round-robin file that can be expanded and saved on demand
Ok, that title is going to be a little bit confusing. Let me try to explain it a little bit better. I am building a logging program. The program will have 3 main states:
1. Write to a round-robin buffer file, keeping only the last 10 minutes of data.
2. Write to a buffer file, ignoring the time (record all data).
3. Rename the entire buffer file, and start a new one with the past 10 minutes of data (and change state to 1).
Now, the use case is this. I have been experiencing some network bottlenecks in our network from time to time. So I want to build a system to record TCP traffic when it detects a bottleneck (detection via Nagios). However, by the time it detects the bottleneck, most of the useful data has already been transmitted.
So, what I'd like is to have a daemon that runs something like dumpcap all the time. In normal mode, it'll only keep the past 10 minutes of data (since there's no point in keeping a boatload of data if it's not needed). But when Nagios alerts, I will send a signal to the daemon to store everything. Then, when Nagios recovers, it will send another signal to stop storing and flush the buffer to a save file.
Now, the problem is that I can't see how to cleanly store a rotating 10 minutes of data. I could store a new file every 10 minutes and delete the old ones when in mode 1. But that seems a bit dirty to me (especially when it comes to figuring out when the alert happened in the file).
Ideally, the file that was saved should be such that the alert is always at the 10:00 mark in the file. While that is possible with new files every 10 minutes, it seems a bit dirty to "repair" the files to that point.
Any ideas? Should I just do a rotating file system and combine them into 1 at the end (doing quite a bit of post-processing)? Is there a way to implement the semi-round-robin file cleanly so that there is no need for any post-processing?
Thanks
Oh, and the language doesn't matter as much at this stage (I'm leaning towards Python, but have no objection to any other language. It's less of an issue than the overall design)...
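For what it's worth, here's a rough Python sketch of the signal-driven mode switching I have in mind. The SIGUSR1/SIGUSR2 signals, the in-memory deque and reading text lines from stdin are just placeholder assumptions standing in for the real dumpcap pipeline:

    import collections
    import signal
    import sys
    import time

    WINDOW = 10 * 60                  # seconds of history to keep in normal mode
    buffer = collections.deque()      # (timestamp, line) pairs
    store_everything = False

    def start_storing(signum, frame):
        global store_everything
        store_everything = True       # Nagios alerted: stop discarding old data

    def stop_storing(signum, frame):
        global store_everything
        store_everything = False
        # Nagios recovered: flush the whole incident to a save file and reset.
        with open("alert-%d.log" % int(time.time()), "w") as out:
            out.writelines(line for _, line in buffer)
        buffer.clear()

    signal.signal(signal.SIGUSR1, start_storing)
    signal.signal(signal.SIGUSR2, stop_storing)

    for line in sys.stdin:
        now = time.time()
        buffer.append((now, line))
        if not store_everything:
            # Round-robin mode: drop anything older than the window.
            while buffer and buffer[0][0] < now - WINDOW:
                buffer.popleft()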
3 Answers
The first idea that comes to mind is to store MINUTES+1 (in this case 11) one-minute files, throwing away the older ones. On request, you could copy/merge the 10 files that aren't currently being written into one "big log file", and append the content of every further file as it finishes.
Then again, this looks like a "there has to be a tool for something like that" task, and maybe someone will come up with a tool for it :)
One problem this does not solve is having exactly the last X minutes of data: the saved window will always start at the 0-second boundary of the oldest one-minute file.
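Something along these lines, as a rough Python sketch (the chunk-*.log naming, the plain-text records and the merge target file are assumptions, not part of any particular tool):

    import glob
    import os
    import shutil
    import time

    KEEP = 11                                  # MINUTES + 1 files on disk

    def current_chunk_path():
        return "chunk-%d.log" % (int(time.time()) // 60)   # one file per minute

    def write_record(record):
        with open(current_chunk_path(), "a") as f:
            f.write(record + "\n")
        prune_old_chunks()

    def prune_old_chunks():
        chunks = sorted(glob.glob("chunk-*.log"))
        for old in chunks[:-KEEP]:             # throw away the older ones
            os.remove(old)

    def merge_finished_chunks(target="big-log-file.log"):
        # Copy/merge every chunk except the one currently being written.
        chunks = sorted(glob.glob("chunk-*.log"))
        with open(target, "a") as out:
            for chunk in chunks[:-1]:
                with open(chunk) as f:
                    shutil.copyfileobj(f, out)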
It's not exactly what you're looking for, but I think MongoDB Capped Collections are something you might want to look at.
So log all your stuff to a capped collection, which you've fixed in size to store approximately 10 minutes worth of data. When Nagios sends a signal, switch to storing into a non-capped collection until the bottleneck passes, then switch back. MongoDB will handle the aging out of old data on a per-row basis automatically, rather than shifting out whole 10 minute files at a time.
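For example, a minimal pymongo sketch of that switch (the database and collection names and the 100 MB cap are placeholder assumptions; you'd size the capped collection to roughly 10 minutes of your traffic):

    import datetime
    from pymongo import MongoClient

    db = MongoClient()["traffic"]

    # Cap the rolling collection at roughly 10 minutes' worth of documents.
    if "rolling" not in db.list_collection_names():
        db.create_collection("rolling", capped=True, size=100 * 1024 * 1024)

    def log_packet(summary, alerting):
        doc = {"ts": datetime.datetime.utcnow(), "summary": summary}
        if alerting:
            db["incident"].insert_one(doc)   # uncapped: keep everything during the alert
        else:
            db["rolling"].insert_one(doc)    # capped: old rows age out automatically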
What is the benefit of keeping only exactly the last 10 minutes of logs? To implement this you'd need to constantly check for old logs, remove them from the file, and then rewrite the file. Such functionality is easier to achieve with some DB, e.g. SQLite.
Log timestamps give you the same and more. Just keep two log files as you described: if a log file already has 10 minutes of logs, rename it (overwriting the older one) and start logging to a new file.
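A small SQLite sketch of the timestamp-based trimming (the schema, file name and the 10-minute cutoff are assumptions):

    import sqlite3
    import time

    conn = sqlite3.connect("traffic.db")
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, entry TEXT)")

    def add_entry(entry, keep_everything=False):
        conn.execute("INSERT INTO log VALUES (?, ?)", (time.time(), entry))
        if not keep_everything:
            # Normal mode: drop rows older than 10 minutes instead of rewriting a file.
            conn.execute("DELETE FROM log WHERE ts < ?", (time.time() - 600,))
        conn.commit()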