Hadoop logging tools?
If I use ZooKeeper as a work queue and connect individual consumers/workers to it, what would you recommend as a good distributed setup for logging these workers' activities?
Assume the following:
1) At any time we could be down to a single computer housing the Hadoop cluster. The system autoscales up and down as needed, but has a lot of downtime during which only one computer is needed.
2) I just need to be able to access all of the workers' logs without accessing the individual machine a worker runs on. Bear in mind that by the time I get to read one of these logs, that machine may well have been terminated and be long gone.
3) We need easy access to the logs, i.e. being able to cat/grep and tail them, or alternatively query them in a more SQL-like manner. We need to be able to both query and monitor output over short periods in real time (e.g. tail -f /var/log/mylog.1).
I appreciate your expert ideas here!
Thanks.
Comments (2)
Have you looked at Flume, Chukwa, or Scribe? Whichever you choose, make sure its agent process can read the log files you want to aggregate onto a centralized server.
Flume reference:
http://archive.cloudera.com/cdh/3/flume/Cookbook/
Chukwa:
http://incubator.apache.org/chukwa/docs/r0.4.0/admin.html
Scribe:
https://github.com/facebook/scribe/wiki/_pages
Hope it helps.
The Fluentd log collector just released its WebHDFS plugin, which lets you stream data into HDFS as soon as it arrives. It is easy to install and easy to manage.
You can also send data directly from your applications; there is a Java example of posting logs to Fluentd. Fluentd's Java library buffers locally when the Fluentd daemon is down, which reduces the chance of data loss.
A high-availability configuration is also available, which gives you a centralized log aggregation system.
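
To make the local-buffering idea concrete, here is a minimal, language-neutral sketch in Python. It is not Fluentd's Java library or its API: `BufferingLogger`, `FakeCollector`, and the `send` callable are hypothetical names invented for illustration, and the transport is assumed to raise `OSError` when the collector is unreachable.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Record = Tuple[str, Dict[str, object]]  # (tag, event fields)


class BufferingLogger:
    """Hypothetical client-side logger that buffers while the collector is down."""

    def __init__(self, send: Callable[[str, Dict[str, object]], None],
                 max_buffered: int = 1000) -> None:
        # Assumed transport: delivers one record, raises OSError on failure.
        self._send = send
        # Bounded queue: once full, the oldest records are dropped first.
        self._pending: Deque[Record] = deque(maxlen=max_buffered)

    def log(self, tag: str, event: Dict[str, object]) -> bool:
        """Queue the record, then try to flush. True if nothing is left pending."""
        self._pending.append((tag, event))
        return self.flush()

    def flush(self) -> bool:
        """Send pending records oldest-first; stop at the first failure."""
        while self._pending:
            tag, event = self._pending[0]
            try:
                self._send(tag, event)
            except OSError:
                return False  # collector unreachable; keep records for later
            self._pending.popleft()  # remove only after a successful send
        return True

    @property
    def pending(self) -> int:
        return len(self._pending)


class FakeCollector:
    """Stand-in for a network collector, used only to exercise the sketch."""

    def __init__(self) -> None:
        self.up = True
        self.received: List[Record] = []

    def send(self, tag: str, event: Dict[str, object]) -> None:
        if not self.up:
            raise ConnectionError("collector is down")  # subclass of OSError
        self.received.append((tag, event))
```

A worker would call `log()` for each event; records logged while the collector is down stay in memory and are delivered in order on the next successful call. The buffer here lives only in memory, so it does not survive the worker machine being terminated, which matters for the autoscaling setup in the question.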