Logging levels - Logback - rule of thumb for assigning log levels

Posted on 2024-12-10 18:41:26

I'm using logback in my current project.

It offers six levels of logging: TRACE DEBUG INFO WARN ERROR OFF

I'm looking for a rule of thumb to determine the log level for common activities.
For instance, if a thread is locked, should the log message be set to the debug level or the info level?
Or if a socket is being used, should its specific id be logged at the debug level or the trace level?

I will appreciate answers with more examples for each logging level.

Comments (5)

琉璃繁缕 2024-12-17 18:41:26

I mostly build large scale, high availability type systems, so my answer is biased towards looking at it from a production support standpoint; that said, we assign roughly as follows:

  • error: the system is in distress, customers are probably being affected (or soon will be), and the fix probably requires human intervention. The "2AM rule" applies here: if you're on call, do you want to be woken up at 2AM if this condition happens? If yes, then log it as "error".

  • warn: an unexpected technical or business event happened, customers may be affected, but probably no immediate human intervention is required. On-call people won't be paged immediately, but support personnel will want to review these issues as soon as possible to understand the impact. Basically any issue that needs to be tracked but may not require immediate intervention.

  • info: things we want to see at high volume in case we need to forensically analyze an issue. System lifecycle events (system start, stop) go here. "Session" lifecycle events (login, logout, etc.) go here. Significant boundary events should be considered as well (e.g. database calls, remote API calls). Typical business exceptions can go here (e.g. login failed due to bad credentials). Any other event you think you'll need to see in production at high volume goes here.

  • debug: just about everything that doesn't make the "info" cut... any message that is helpful in tracking the flow through the system and isolating issues, especially during the development and QA phases. We use "debug" level logs for entry/exit of most non-trivial methods and marking interesting events and decision points inside methods.

  • trace: we don't use this often, but this would be for extremely detailed and potentially high volume logs that you don't typically want enabled even during normal development. Examples include dumping a full object hierarchy, logging some state during every iteration of a large loop, etc.

Just as important as choosing the right log levels, if not more so, is ensuring that the logs are meaningful and have the needed context. For example, you'll almost always want to include the thread ID in the logs so you can follow a single thread if needed. You may also want to employ a mechanism to associate business info (e.g. user ID) with the thread so it gets logged as well. In your log message, include enough info to make the message actionable. A log like "FileNotFound exception caught" is not very helpful. A better message is "FileNotFound exception caught while attempting to open config file: /usr/local/app/somefile.txt. userId=12344."
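To make the point about context concrete, here is a minimal sketch of attaching per-thread business info to every message. In logback/SLF4J this is normally done with `org.slf4j.MDC` plus `%thread` and `%X{userId}` in the pattern layout; the `LogContextSketch` class below is a hypothetical stand-in built on a plain `ThreadLocal`, used only so the example runs without any dependencies.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT part of logback or SLF4J: a ThreadLocal stand-in for MDC.
final class LogContextSketch {
    // Per-thread key/value context, e.g. the userId of the request being handled.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(TreeMap::new);

    // Associates a piece of business info with the current thread.
    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    // Drops the current thread's context, e.g. when the request is finished.
    static void clear() {
        CONTEXT.remove();
    }

    // Builds "[thread] message key=value ..." so a line can be traced to a thread and a user.
    static String format(String message) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(Thread.currentThread().getName()).append("] ").append(message);
        for (Map.Entry<String, String> e : CONTEXT.get().entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

Calling `LogContextSketch.put("userId", "12344")` once at the start of a request and then passing each message through `format(...)` appends `userId=12344` to every line from that thread, which is roughly what MDC gives you for free.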

There are also a number of good logging guides out there... for example, here's an edited snippet from JCL (Jakarta Commons Logging):

  • error - Other runtime errors or unexpected conditions. Expect these to be immediately visible on a status console.
  • warn - Use of deprecated APIs, poor use of API, 'almost' errors, other runtime situations that are undesirable or unexpected, but not necessarily "wrong". Expect these to be immediately visible on a status console.
  • info - Interesting runtime events (startup/shutdown). Expect these to be immediately visible on a console, so be conservative and keep to a minimum.
  • debug - detailed information on the flow through the system. Expect these to be written to logs only.
  • trace - more detailed information. Expect these to be written to logs only.
难以启齿的温柔 2024-12-17 18:41:26

My approach, which I think comes more from a development than an operations point of view, is:

  • Error means that the execution of some task could not be completed; an email couldn't be sent, a page couldn't be rendered, some data couldn't be stored to a database, something like that. Something has definitely gone wrong.
  • Warning means that something unexpected happened, but that execution can continue, perhaps in a degraded mode; a configuration file was missing but defaults were used, a price was calculated as negative, so it was clamped to zero, etc. Something is not right, but it hasn't gone properly wrong yet - warnings are often a sign that there will be an error very soon.
  • Info means that something normal but significant happened; the system started, the system stopped, the daily inventory update job ran, etc. There shouldn't be a continual torrent of these, otherwise there's just too much to read.
  • Debug means that something normal and insignificant happened; a new user came to the site, a page was rendered, an order was taken, a price was updated. This is the stuff excluded from info because there would be too much of it.
  • Trace is something I have never actually used.
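As an illustration of the warning and debug cases in the list above, here is a small sketch of the "price calculated as negative, so clamped to zero" example. It uses the JDK's built-in `System.Logger` only so that it runs without dependencies; with logback you would make the equivalent `logger.warn(...)` and `logger.debug(...)` calls through SLF4J. The class and method names are hypothetical.

```java
import java.lang.System.Logger;
import java.lang.System.Logger.Level;

// Hypothetical pricing step used to illustrate the choice between WARN and DEBUG.
final class PriceSketch {
    private static final Logger LOG = System.getLogger("PriceSketch");

    static long clampPriceCents(long computedCents) {
        if (computedCents < 0) {
            // Unexpected, but execution continues in a degraded mode: a warning.
            LOG.log(Level.WARNING, "Computed price {0} cents is negative; clamping to 0", computedCents);
            return 0;
        }
        // Normal and insignificant: debug, so it stays out of the info stream.
        LOG.log(Level.DEBUG, "Price updated to {0} cents", computedCents);
        return computedCents;
    }
}
```

If the price could not be computed at all, so that the task itself failed, the same reasoning would put that message at error rather than warning.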
背叛残局 2024-12-17 18:41:26

This may also help tangentially: it shows whether a logging request (from the code) at a certain level will actually be logged, given the effective logging level that a deployment is configured with. Use the other answers here to decide what effective level to configure your deployment with, and then refer to this to see whether a particular logging request from your code will actually be logged.

For example:

  • "Will a logging code line that logs at WARN actually get logged on my deployment configured with ERROR?" The table says, NO.
  • "Will a logging code line that logs at WARN actually get logged on my deployment configured with DEBUG?" The table says, YES.

From the logback documentation:

In a more graphic way, here is how the selection rule works. In the following table, the vertical header shows the level of the logging request, designated by p, while the horizontal header shows effective level of the logger, designated by q. The intersection of the rows (level request) and columns (effective level) is the boolean resulting from the basic selection rule.
[Image: the selection-rule table from the logback documentation, request level p against effective level q]

So a code line that requests logging will only actually get logged if the effective logging level of its deployment is less than or equal to that code line's requested level of severity.
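The rule in the quoted passage (a request at level p is enabled when p >= q, the logger's effective level) can be sketched in a few lines. `SketchLevel` below is a stand-in enum written for this example, not logback's own `Level` class; it only assumes the ordering TRACE < DEBUG < INFO < WARN < ERROR < OFF.

```java
// Stand-in for logback's levels, ordered from least to most severe; not the library's own class.
enum SketchLevel {
    TRACE, DEBUG, INFO, WARN, ERROR, OFF;

    // Basic selection rule: a request at level p is enabled when p >= q,
    // where q is the effective level of the logger.
    static boolean isEnabled(SketchLevel request, SketchLevel effective) {
        return request.compareTo(effective) >= 0;
    }
}
```

With this, `SketchLevel.isEnabled(SketchLevel.WARN, SketchLevel.ERROR)` is false and `SketchLevel.isEnabled(SketchLevel.WARN, SketchLevel.DEBUG)` is true, matching the two example questions above.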

秋意浓 2024-12-17 18:41:26

I answer this coming from a component-based architecture, where an organisation may be running many components that may rely on each other. During a propagating failure, logging levels should help to identify both which components are affected and which are a root cause.

  • ERROR - This component has had a failure and the cause is believed to be internal (for example an unhandled internal exception, or the failure of an encapsulated dependency such as a database; in REST terms, it has received a 4xx error from a dependency). Get me (maintainer of this component) out of bed.

  • WARN - This component has had a failure believed to be caused by a dependent component (REST example would be a 5xx status from a dependency). Get the maintainers of THAT component out of bed.

  • INFO - Anything else that we want to get to an operator. If you decide to log happy paths then I recommend limiting to 1 log message per significant operation (e.g. per incoming http request).

For all log messages, be sure to log useful context (and prioritise making messages human readable and useful over having reams of "error codes").

  • DEBUG (and below) - Shouldn't be used at all (and certainly not in production). In development I would advise using a combination of TDD and Debugging (where necessary) as opposed to polluting code with log statements. In production, the above INFO logging, combined with other metrics should be sufficient.

A nice way to visualise the above logging levels is to imagine a set of monitoring screens, one for each component. When everything is running well they are green. If a component logs a WARNING, it goes orange (amber); if it logs an ERROR, it goes red.

In the event of an incident you should have one (root cause) component go red and all the affected components should go orange/amber.
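The REST examples in this answer boil down to a small mapping from a dependency's HTTP status to the level this component logs at. The sketch below is one way to write it; the class, enum, and method names are made up for illustration, and treating every non-4xx, non-5xx status as INFO is an assumption rather than something the answer states.

```java
// Hypothetical mapping from a dependency's HTTP status to this component's log level.
final class DependencyFailureSketch {
    enum Severity { INFO, WARN, ERROR }

    static Severity severityFor(int dependencyStatus) {
        if (dependencyStatus >= 500 && dependencyStatus <= 599) {
            return Severity.WARN;   // the dependency failed: wake THAT component's maintainers
        }
        if (dependencyStatus >= 400 && dependencyStatus <= 499) {
            return Severity.ERROR;  // we sent a bad request: the cause is internal, wake me
        }
        return Severity.INFO;       // assumption: anything else is a normal outcome
    }
}
```

During a propagating failure this gives the picture described above: the component that receives 4xx responses or throws internally goes red, while components that only see 5xx from it go amber.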

意中人 2024-12-17 18:41:26

Not much different from the other answers; my framework has almost the same levels:

  1. Error: critical logical errors in the application, like a database connection timeout. Things that call for a bug fix in the near future
  2. Warn: non-breaking issues, but things to pay attention to. Like a requested page not being found
  3. Info: used in the first line of functions/methods, to show that a procedure has been called or that a step went OK, like an insert query completing
  4. log: logic information, like the result of an if statement
  5. debug: variable contents that are relevant to watch permanently