Logging levels - Logback - rule of thumb to assign log levels
I'm using logback in my current project.
It offers six levels of logging: TRACE, DEBUG, INFO, WARN, ERROR, OFF
I'm looking for a rule of thumb to determine the log level for common activities.
For instance, if a thread is locked, should the log message be set to the debug level or the info level?
Or, if a socket is being used, should its specific ID be logged at the debug level or the trace level?
I would appreciate answers with more examples for each logging level.
I mostly build large scale, high availability type systems, so my answer is biased towards looking at it from a production support standpoint; that said, we assign roughly as follows:
error: the system is in distress, customers are probably being affected (or will soon be) and the fix probably requires human intervention. The "2AM rule" applies here: if you're on call, do you want to be woken up at 2AM if this condition happens? If yes, then log it as "error".
warn: an unexpected technical or business event happened, customers may be affected, but probably no immediate human intervention is required. On call people won't be called immediately, but support personnel will want to review these issues asap to understand what the impact is. Basically any issue that needs to be tracked but may not require immediate intervention.
info: things we want to see at high volume in case we need to forensically analyze an issue. System lifecycle events (system start, stop) go here. "Session" lifecycle events (login, logout, etc.) go here. Significant boundary events should be considered as well (e.g. database calls, remote API calls). Typical business exceptions can go here (e.g. login failed due to bad credentials). Any other event you think you'll need to see in production at high volume goes here.
debug: just about everything that doesn't make the "info" cut... any message that is helpful in tracking the flow through the system and isolating issues, especially during the development and QA phases. We use "debug" level logs for entry/exit of most non-trivial methods and marking interesting events and decision points inside methods.
trace: we don't use this often, but this would be for extremely detailed and potentially high volume logs that you don't typically want enabled even during normal development. Examples include dumping a full object hierarchy, logging some state during every iteration of a large loop, etc.
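One way to read the list above is as a short series of questions asked in order of severity. The following is only a sketch of that reading in plain Java; the class `LevelChooser`, its `Level` enum and the four boolean questions are hypothetical names chosen for illustration and are not part of logback.

```java
// Hypothetical sketch: encodes the rule of thumb above as an ordered set of questions.
// None of these names come from logback.
final class LevelChooser {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    /**
     * Picks a level by asking the questions from most to least severe.
     * The first question answered "yes" decides the level.
     */
    static Level choose(boolean wakeSomeoneAt2am,
                        boolean unexpectedButCanWait,
                        boolean wantedInProduction,
                        boolean extremelyDetailed) {
        if (wakeSomeoneAt2am) return Level.ERROR;      // system in distress, needs a human now
        if (unexpectedButCanWait) return Level.WARN;   // must be tracked, no immediate intervention
        if (wantedInProduction) return Level.INFO;     // lifecycle, session and boundary events
        if (extremelyDetailed) return Level.TRACE;     // object dumps, per-iteration state
        return Level.DEBUG;                            // everything else that helps follow the flow
    }

    public static void main(String[] args) {
        // A login that failed because of bad credentials: a typical business exception.
        System.out.println(choose(false, false, true, false));
        // Entry/exit of a non-trivial method.
        System.out.println(choose(false, false, false, false));
    }
}
```

The ordering matters: an event that would both wake someone up and be useful in production is an error, not an info message.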
As important as choosing the right log levels, or more so, is ensuring that the logs are meaningful and have the needed context. For example, you'll almost always want to include the thread ID in the logs so you can follow a single thread if needed. You may also want to employ a mechanism to associate business info (e.g. user ID) with the thread so it gets logged as well. In your log message, you'll want to include enough info to make the message actionable. A log like "FileNotFound exception caught" is not very helpful. A better message is "FileNotFound exception caught while attempting to open config file: /usr/local/app/somefile.txt. userId=12344."
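To illustrate the point about context, here is a minimal sketch in plain Java that attaches the thread name and per-thread business data to every message. The class `LogContext` and its methods are hypothetical; it is only a stand-in for the mechanism a logback project would normally use, namely SLF4J's MDC together with the `%thread` and `%X{userId}` conversion words in the logback pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a mapped diagnostic context; not logback's API.
final class LogContext {
    // One key/value map per thread, so concurrent requests do not mix their context.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(LinkedHashMap::new);

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static void clear() {
        CONTEXT.remove();
    }

    /** Prefixes the thread name and appends every context entry as key=value. */
    static String format(String message) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(Thread.currentThread().getName()).append("] ");
        sb.append(message);
        for (Map.Entry<String, String> entry : CONTEXT.get().entrySet()) {
            sb.append(' ').append(entry.getKey()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        put("userId", "12344");
        try {
            System.out.println(format(
                    "FileNotFound exception caught while attempting to open config file: "
                            + "/usr/local/app/somefile.txt."));
        } finally {
            clear(); // always clear, otherwise pooled threads leak context into later work
        }
    }
}
```

Setting the context once per request keeps the individual log calls short while still making each line traceable to a thread and a user.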
There are also a number of good logging guides out there... for example, here's an edited snippet from JCL (Jakarta Commons Logging):
My approach, which I think comes more from a development than an operations point of view, is:
This may also help tangentially: it explains whether a logging request (from the code) at a certain level will actually be logged, given the effective logging level that a deployment is configured with. Decide from the other answers here what effective level you want to configure your deployment with, and then refer to this to see whether a particular logging request from your code will actually be logged.
For example:
From the logback documentation:
So a code line that requests logging will only actually get logged if the effective logging level of its deployment is less than or equal to that code line's requested level of severity.
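The rule in the previous sentence can be modelled in a few lines. The sketch below is plain Java with no logback dependency; the class `LevelRule`, its `Level` enum and `isEnabled` are hypothetical names for illustration, and the only thing taken from logback is the level ordering TRACE < DEBUG < INFO < WARN < ERROR, with OFF above all of them.

```java
// Hypothetical model of the selection rule; not logback's own classes.
final class LevelRule {
    // Declaration order is severity order, so compareTo() compares severity.
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, OFF }

    /**
     * A request is logged when its level is at least the effective level.
     * OFF is only meaningful as an effective level: it disables every request.
     */
    static boolean isEnabled(Level requested, Level effective) {
        return requested != Level.OFF && requested.compareTo(effective) >= 0;
    }

    public static void main(String[] args) {
        Level effective = Level.INFO; // assumed deployment setting
        for (Level requested : Level.values()) {
            if (requested == Level.OFF) continue; // not a level you log at
            System.out.println(requested + " request with effective level " + effective
                    + ": " + (isEnabled(requested, effective) ? "logged" : "dropped"));
        }
    }
}
```

With an effective level of INFO, the debug and trace requests are dropped and everything from info upwards is logged.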
I answer this coming from a component-based architecture, where an organisation may be running many components that may rely on each other. During a propagating failure, logging levels should help to identify both which components are affected and which are a root cause.
ERROR - This component has had a failure and the cause is believed to be internal (any internal, unhandled exception, or the failure of an encapsulated dependency such as a database; a REST example would be receiving a 4xx error from a dependency). Get me (the maintainer of this component) out of bed.
WARN - This component has had a failure believed to be caused by a component it depends on (a REST example would be a 5xx status from a dependency). Get the maintainers of THAT component out of bed.
INFO - Anything else that we want to get to an operator. If you decide to log happy paths then I recommend limiting to 1 log message per significant operation (e.g. per incoming http request).
For all log messages, be sure to log useful context (and prioritise making messages human-readable and useful over having reams of "error codes").
A nice way to visualise the above logging levels is to imagine a set of monitoring screens, one for each component. When all is running well they are green. If a component logs a WARN it goes orange (amber), and if it logs an ERROR it goes red.
In the event of an incident you should have one (root cause) component go red and all the affected components should go orange/amber.
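The REST examples and the monitoring-screen picture above can be sketched as two small mappings. This is an illustration only, in plain Java; the class `DependencyFailure`, its enums and both method names are hypothetical.

```java
// Hypothetical sketch of the component-based rule; the names are illustrative only.
final class DependencyFailure {
    enum Level { INFO, WARN, ERROR }
    enum Colour { GREEN, AMBER, RED }

    /** Maps the HTTP status returned by a dependency to the level this component logs at. */
    static Level levelFor(int dependencyStatus) {
        if (dependencyStatus >= 400 && dependencyStatus <= 499) {
            return Level.ERROR; // we sent something wrong: the cause is internal to this component
        }
        if (dependencyStatus >= 500 && dependencyStatus <= 599) {
            return Level.WARN;  // the dependency failed: its maintainers should be woken up
        }
        return Level.INFO;      // not a failure; log at most one message per significant operation
    }

    /** Maps the most severe level a component has logged to its monitoring-screen colour. */
    static Colour screenColour(Level worstLogged) {
        switch (worstLogged) {
            case ERROR: return Colour.RED;
            case WARN:  return Colour.AMBER;
            default:    return Colour.GREEN;
        }
    }

    public static void main(String[] args) {
        int[] statuses = {200, 404, 503};
        for (int status : statuses) {
            Level level = levelFor(status);
            System.out.println(status + " -> " + level + " -> " + screenColour(level));
        }
    }
}
```

During a propagating failure this gives the picture described above: the component that mishandled a request goes red, and every component that merely received 5xx responses from it goes amber.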
Not different from the other answers; my framework has almost the same levels: