Streaming text logfiles into RabbitMQ, then rebuilding them at the other end?
Requirements
We have several servers (20-50) - Solaris 10 and Linux (SLES) - running a mix of different applications, each generating a bunch of log events into textfiles. We need to capture these to a separate monitoring box, where we can do analysis/reporting/alerts.
Current Approach
Currently, we use SSH with a remote "tail -f" to stream the logfiles from the servers onto the monitoring box. However, this is somewhat brittle.
New Approach
I'd like to replace this with RabbitMQ. The servers would publish their log events into this, and each monitoring script/app could then subscribe to the appropriate queue.
Ideally, we'd like the applications themselves to dump events directly into the RabbitMQ queue.
However, assuming that's not an option in the short term (we may not have source for all the apps), we need a way to basically "tail -f" the logfiles from disk. I'm most comfortable in Python, so I was looking at a Pythonic way of doing that - the consensus seems to be to just use a loop with readline() and sleep() to emulate "tail -f".
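i.e. something along these lines (a minimal sketch only; the path and polling interval are placeholders, and log rotation/truncation isn't handled):

```python
import time

def follow(path, interval=1.0):
    """Emulate `tail -f`: yield new lines appended to the file at `path`."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like `tail -f`
        while True:
            line = f.readline()
            if not line:
                time.sleep(interval)  # nothing new yet; back off briefly
                continue
            yield line

# Example usage: print new lines as they arrive.
# for line in follow("/var/log/app/app.log"):
#     print(line, end="")
```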
Questions
Is there an easier way to "tail -f" a whole bunch of textfiles directly onto a RabbitMQ stream? Something built in, or an extension we could leverage? Any other tips/advice here?
If we do write a Python wrapper to capture all the logfiles and publish them - I'd ideally like a single Python script to concurrently handle all the logfiles, rather than manually spinning up a separate instance for each logfile. How should we tackle this? Are there considerations in terms of performance, CPU usage, throughput, concurrency etc.?
We need to subscribe to the queues, and then possibly dump the events back to disk and reconstruct the original logfiles. Any tips/advice on this? We'd also like a single Python script we could start up to handle reconstructing all of the logfiles, rather than 50 separate instances of the same script - is that easily achievable?
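To make that concrete, roughly the kind of consumer we have in mind - a sketch only, assuming a recent pika client; the exchange name, routing-key convention and output directory are placeholders, not an established scheme:

```python
import os
import pika  # assumed AMQP client library

# Assumed convention: publishers send each log line to a topic exchange
# named "logs", with a routing key like "<hostname>.<logname>".
OUT_DIR = "/var/log/central"

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq-host"))
channel = connection.channel()
channel.exchange_declare(exchange="logs", exchange_type="topic")

result = channel.queue_declare(queue="", exclusive=True)
queue_name = result.method.queue
channel.queue_bind(exchange="logs", queue=queue_name, routing_key="#")  # all logs

def on_message(ch, method, properties, body):
    # One output file per routing key, e.g. /var/log/central/web01.nginx.log
    # (sanitising the routing key for use as a filename is omitted here).
    path = os.path.join(OUT_DIR, method.routing_key + ".log")
    with open(path, "a") as f:
        f.write(body.decode("utf-8", "replace"))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue=queue_name, on_message_callback=on_message)
channel.start_consuming()
```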
Cheers,
Victor
PS: We did have a look at Facebook's Scribe, as well as Flume, and both seem a little heavyweight for our needs.
Answers (3)
You seem to be describing centralized syslog with rabbitmq as the transport. If you could live with syslog, take a look at syslog-ng. Otherwise, you might save some time by using parts of logstash (http://logstash.net/).
If it is possible, you can make the application publish the events asynchronously to RabbitMQ instead of writing them to log files. I have done this in Java.
But sometimes it is not possible to make the app log the way you want.
1) You can write a file tailer in Python which publishes to AMQP. I don't know of anything that plugs in a file as the input to RabbitMQ. Have a look at http://code.activestate.com/recipes/436477-filetailpy/ and http://www.perlmonks.org/?node_id=735039 for tailing files.
2) You can create a Python daemon which can tail all the given files, either as separate processes or in a round-robin fashion.
3) A similar approach to 2) can help you solve this. You can probably have a single queue for each log file. See the sketch below.
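Something along these lines, assuming a recent pika client (the file list, exchange name and routing keys are illustrative placeholders, and log rotation is not handled):

```python
import time
import pika  # assumed AMQP client library

# Illustrative only: map local logfiles to routing keys.
LOGFILES = {
    "/var/log/app/web.log": "web01.web",
    "/var/log/app/db.log":  "web01.db",
}

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq-host"))
channel = connection.channel()
channel.exchange_declare(exchange="logs", exchange_type="topic")

# Open every file and seek to the end, as `tail -f` would.
handles = {}
for path, key in LOGFILES.items():
    f = open(path)
    f.seek(0, 2)
    handles[path] = (f, key)

while True:
    got_line = False
    # Round-robin over all files, publishing any new lines.
    for path, (f, key) in handles.items():
        line = f.readline()
        while line:
            channel.basic_publish(exchange="logs", routing_key=key, body=line)
            got_line = True
            line = f.readline()
    if not got_line:
        time.sleep(0.5)  # nothing new anywhere; avoid busy-looping
```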
If you are talking about application logging (as opposed to e.g. access logs such as Apache webserver logs), you can use a handler for stdlib logging which writes to AMQP middleware.
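A minimal sketch of such a handler, assuming a recent pika client (the host, exchange and routing key are placeholders; a real implementation would also handle reconnects and serialization):

```python
import logging
import pika  # assumed AMQP client library

class AMQPHandler(logging.Handler):
    """Illustrative logging handler that publishes each record to an exchange."""

    def __init__(self, host="rabbitmq-host", exchange="logs", routing_key="app.demo"):
        logging.Handler.__init__(self)
        self.exchange = exchange
        self.routing_key = routing_key
        self.connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
        self.channel = self.connection.channel()
        self.channel.exchange_declare(exchange=exchange, exchange_type="topic")

    def emit(self, record):
        try:
            self.channel.basic_publish(
                exchange=self.exchange,
                routing_key=self.routing_key,
                body=self.format(record).encode("utf-8"),
            )
        except Exception:
            self.handleError(record)

    def close(self):
        self.connection.close()
        logging.Handler.close(self)

# Usage: attach it to the root logger alongside the existing file handlers.
# logging.getLogger().addHandler(AMQPHandler())
# logging.getLogger().error("something went wrong")
```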