How do you analyze logs in a distributed system?
When unexpected behavior occurs in a distributed system (like Raft nodes), the logical flow of a request or of data can usually only be analyzed from logs. However, because the system is distributed, this is difficult. I found that there are tools like shiviz that can visualize requests or data flow through logs, but they require modification of the source code. Are there any other similar invasive tools?

Comments (1)
There are two major approaches. One is to have a tool that can go to every server and search its logs. The other is to have a central location for logs, with every node pushing its logs to that storage; this is how AWS CloudWatch works.
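To make the push model concrete, here is a minimal Go sketch of a node shipping structured log entries to a central collector over HTTP. The endpoint URL, the LogEntry fields, and the pushLog helper are all assumptions for illustration, not the API of CloudWatch or any specific product.

```go
// Minimal sketch of the "push to a central store" approach: each node
// serializes log entries as JSON and posts them to a hypothetical
// central collector endpoint.
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

type LogEntry struct {
	Node      string    `json:"node"`      // which node emitted the line
	Timestamp time.Time `json:"timestamp"` // when it was emitted
	Level     string    `json:"level"`
	Message   string    `json:"message"`
}

func pushLog(collectorURL string, e LogEntry) error {
	body, err := json.Marshal(e)
	if err != nil {
		return err
	}
	resp, err := http.Post(collectorURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	// Hypothetical collector URL; in practice an agent or SDK does this.
	_ = pushLog("http://logs.example.internal/ingest", LogEntry{
		Node:      "raft-node-1",
		Timestamp: time.Now().UTC(),
		Level:     "INFO",
		Message:   "became leader for term 7",
	})
}
```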
In either case, from the operator's point of view, there is a single tool in which they can search all the logs.
The second part of your question: how to make this analysis effective.
First of all, the logs should be of good quality. This may sound obvious, but it is very important; I can't count how many times I have analyzed logs that were detailed but useless.
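What "good quality" means is easier to see with an example. The Go sketch below (using the standard library's log/slog, available since Go 1.21) records a Raft-flavored event with structured fields an operator can actually filter on; the field names are assumptions for illustration, not taken from any particular Raft implementation.

```go
// A sketch of what a useful log line might look like for a raft node:
// structured fields (node id, term, index, peer) instead of free text,
// so a log search tool can filter on them.
package main

import (
	"log/slog"
	"os"
)

func main() {
	// Attach the node id once so every line identifies its emitter.
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil)).With(
		"node", "raft-node-2",
	)

	// Instead of: log.Println("append entries failed"),
	// record the fields an operator will actually search on.
	logger.Error("append entries rejected",
		"term", 7,
		"prev_log_index", 1042,
		"peer", "raft-node-3",
		"reason", "log inconsistency",
	)
}
```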
The second challenge is how to analyze processes that span several nodes. This is more complicated. There are two main features here:
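Whatever those features turn out to be, one common building block for following a single process across nodes is a correlation (request) ID that is generated once and repeated on every log line and every inter-node message, so a search for that ID in the central store reconstructs the whole path. The Go sketch below illustrates the idea; the Request struct, field names, and helper are hypothetical.

```go
// Sketch of correlation-ID propagation: node A generates an ID, logs it,
// and forwards it with the request; node B logs the same ID on receipt.
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"os"
)

// Request stands in for whatever message travels between nodes;
// the correlation ID rides along with it.
type Request struct {
	CorrelationID string
	Payload       string
}

func newCorrelationID() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

	req := Request{CorrelationID: newCorrelationID(), Payload: "put x=1"}

	// Node A logs the ID before forwarding the request.
	logger.Info("forwarding to leader",
		"correlation_id", req.CorrelationID, "node", "raft-node-1")

	// Node B (the receiver) logs the same ID, so searching for it
	// in the central log store shows the request's full journey.
	logger.Info("applying entry",
		"correlation_id", req.CorrelationID, "node", "raft-node-2")
}
```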