What are the best practices for formatting logs?

Posted on 2024-08-12 01:19:53


I'm writing a piece of honeypot software that will have extensive logging of interactions with it; I plan to log to plaintext .log files.

I have two questions, from someone who isn't too familiar with how servers log.

  1. Firstly, how should I break up my log files? I'm assuming that after running this for a month I don't want one big .log file. Do I split by day, month, or year? Is there some standard for it?

  2. For the format of each line, do I pick one standard delimiter, such as *, -, or +? Is there a standard anywhere (my googling hasn't brought up much)?

Comments (8)

像极了他 2024-08-19 01:19:53

I like this format for log files:

$ python simple_logging_module.py
2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message
2005-03-19 15:10:26,620 - simple_example - INFO - info message
2005-03-19 15:10:26,695 - simple_example - WARNING - warn message
2005-03-19 15:10:26,697 - simple_example - ERROR - error message
2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message

This is from Python's logging module.
I usually have one file per day, one folder per month, and one folder per year. Otherwise you end up with huge log files that you can't work with properly.

logs/
  2009/
    January/
     01012009.log
     02012009.log
     ...
    February/
     ...
  2008/
   ...
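If the honeypot is in Python, this answer's setup can be sketched with the standard logging module: the Formatter string reproduces the sample lines above, and a TimedRotatingFileHandler gives one file per day. The file name honeypot.log is just an example:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Format matching the sample output above:
# 2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

logger = logging.getLogger("simple_example")
logger.setLevel(logging.DEBUG)

# Rotate at midnight so each day gets its own file; the date suffix is
# appended to the base name (e.g. honeypot.log.2009-01-01).
handler = TimedRotatingFileHandler("honeypot.log", when="midnight", backupCount=31)
handler.setFormatter(formatter)
logger.addHandler(handler)

logger.debug("debug message")
logger.info("info message")
```

backupCount controls how many rotated files are kept before the oldest is deleted; it does not create the year/month folder hierarchy shown above, which you would have to script yourself.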
财迷小姐 2024-08-19 01:19:53

There is no standard for this kind of logging. Rolling and file layout all depend on what you need. In general I have come across 3 main scenarios:

  • All in one file. Doesn't seem like an option for you.
  • Fixed-size rolling. You define a size, and a new log file is created once the current file grows bigger than that value. Most log4anything packages support this out of the box.
  • Fully custom rolling. I've seen layouts like this:
    • Every day gets its own directory, named in YYYYMMDD format. If you don't stage your logs, consider a directory layout like YYYY\MM\YYYYMMDD, as shown in other answers.
    • Inside this directory, fixed-size rolling should be used.
    • Every file is named logfile_yyyymmdd_ccc.log, where ccc is an increasing number. Adding the time to the file name is also a good idea (e.g. to easily judge how many logs per minute you are generating).
    • To save space, every log is compressed with zip automatically.
    • The last 3 days are always kept uncompressed so you have quick access with UNIX text tools.

This custom one looked like this

logs/
  20090101/
     logfile_20090101_001.zip
     logfile_20090101_002.zip
     ...
  20090102/
     logfile_20090102_001.zip
     logfile_20090102_002.zip
   logfile_20090101_001.log
   logfile_20090101_002.log
   logfile_20090102_001.log
   logfile_20090102_002.log

There is also a bunch of good practices for logging:

  • Always keep the date in your log file name.
  • Always add some name to your log file name. It will help you in the future to distinguish log files from different instances of your system.
  • Always log the time and date (preferably with millisecond resolution) for every log event.
  • Always store your dates as YYYYMMDD. Everywhere: in file names, inside the log file. It greatly helps with sorting. Some separators are fine (e.g. 2009-11-29).
  • In general, avoid storing logs in a database. It is another point of failure in your logging setup.
  • If you have a multithreaded system, always log the thread id.
  • If you have a multi-process system, always log the process id.
  • If you have many computers, always log the computer id.
  • Make sure you can process your logs later. Just try importing one log file into a database or Excel. If it takes longer than 30 seconds, your logging is wrong. This includes:
    • Choosing a good internal log format. I prefer space-delimited, since it works nicely with Unix text tools and with Excel.
    • Choosing a good date/time format, so you can easily import it into some SQL database or Excel for further processing.
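Several of these practices (millisecond timestamps, YYYYMMDD dates, machine/process/thread ids, space-delimited fields) can be sketched with Python's standard logging module; the field order and the hostname filter here are illustrative, not a standard:

```python
import logging
import socket

# Space-delimited line carrying date/time with milliseconds, machine id,
# process id, thread id and level, per the practices above.
FMT = "%(asctime)s.%(msecs)03d %(hostname)s %(process)d %(thread)d %(levelname)s %(message)s"
DATEFMT = "%Y%m%d %H:%M:%S"  # YYYYMMDD sorts correctly everywhere

class HostnameFilter(logging.Filter):
    """Attach the computer id to every record."""
    def filter(self, record):
        record.hostname = socket.gethostname()
        return True

logger = logging.getLogger("honeypot.access")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(FMT, datefmt=DATEFMT))
handler.addFilter(HostnameFilter())
logger.addHandler(handler)

logger.info("connection attempt")
```

A line like `20090101 15:10:26.618 myhost 4242 140201 INFO connection attempt` splits cleanly with awk, cut, or a spreadsheet import.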
厌倦 2024-08-19 01:19:53

To break up your log files, you could use an external application like logrotate and let it take care of the dirty work.

As for the format of each line, there's no standard, so you should use what works best for you. If you're going to automatically parse the log file later, then you might want to keep that in mind as you format the log output.
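For illustration, a minimal logrotate configuration along these lines; the path, rotation interval, and retention count are placeholder values:

```
# Placeholder path; rotate daily, keep 30 compressed old files,
# and leave the most recent rotated file uncompressed for quick access.
/var/log/honeypot/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    dateext
}
```

This drops into /etc/logrotate.d/ on most Linux systems; the application just keeps writing to the same file name and logrotate handles the rest.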

佼人 2024-08-19 01:19:53

I recommend you use a well-known logging library. Most logging libraries handle rollover for you. Log4Net (.NET) / Log4J (Java) is a particularly good choice, with a lot of options that you may find useful. Use whatever rollover interval works best for you; for a honeypot application, I think you will find hourly or daily rollover works best. You could also use a fixed size limit, such as 256 MB, to ensure that your logging doesn't overrun the available free disk space. Log4Net/Log4J supports this as well.

Log4J @ Apache.Org
Log4Net @ Apache.Org

The format of your log files should be set up according to your needs. It is highly desirable to use a delimiter that is unlikely to show up in your log input; for your application, this may not be possible. Typically, some parties use spaces (NCSA logs), some use commas (to make CSV files), and some use tabs (to make tab-delimited files). Each of these has its own benefits and drawbacks.
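One way to make any delimiter safe, sketched in Python: let the csv module quote fields, so attacker-controlled input containing the delimiter can't break the record structure. The tab choice and the field values are illustrative:

```python
import csv
import io

# Tab-delimited record with quoting: the embedded tab in the
# attacker-controlled field gets quoted rather than splitting the row.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
writer.writerow(["2009-11-29 15:10:26", "10.0.0.5", "GET /admin\tHTTP/1.1"])

line = buf.getvalue()
# Reading it back recovers the original fields intact.
rows = list(csv.reader(io.StringIO(line), delimiter="\t"))
```

This matters for a honeypot in particular, since attackers will happily send payloads containing whatever delimiter you chose.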

烂人 2024-08-19 01:19:53

Today (2022) we love structured logging and log indexers (like Elasticsearch or Loki), so we log in NDJSON (newline-delimited JSON).

Because log delivery agents have difficulty with log file rotation (who loves missing log events?), we avoid rotation (no more logrotate!): instead we name files using a date pattern or an auto-incremented sequence, and define a policy to remove outdated files.

Don't work with individual files! Use centralized log search engines!

https://en.wikipedia.org/wiki/Common_Log_Format is a thing of the past.
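A minimal NDJSON sketch with Python's standard logging module; the field names are illustrative, and in practice a library such as python-json-logger does this for you:

```python
import json
import logging

class NDJsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object per line (NDJSON)."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })

logger = logging.getLogger("honeypot.ndjson")
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(NDJsonFormatter())
logger.addHandler(handler)

logger.info("login attempt from %s", "10.0.0.5")
```

Each line is independently parseable, which is exactly what agents shipping to Elasticsearch or Loki want.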

孤檠 2024-08-19 01:19:53

A suggestion:

Since this is for a honeypot system (and unless the baddies are really whacking the application/site), you may consider taking the extra time to log to a database instead.

This will make analyzing and using the logs easier, and real-time (i.e. you do not need to go through an ETL process before analyzing/browsing the logs).

That said, whether the logs end up in DB tables or in files, this doesn't preclude the need to define a format. Tentatively, you can use a "polymorphic" format, with a few common attributes (ID, IP address, timestamp, cookie/ID, "level" of importance/urgency) followed by a short mnemonic code defining a particular event type (say "LIA" = login attempt, "GURL" = guessed URL, "SQLI" = SQL injection attempt, etc.), followed by a few numeric fields and a few string fields whose semantics vary as per the mnemonic. To summarize:

 - Id
 - TimeStamp  (maybe split in date and time)
 - IP_Address
 - UserID_of_sorts
 - // other generic/common fields that you may think of
 - EventCode   (LIA, GURL, SQLI...)
 - Message   Text message (varies with particular event instance)
 - Int1      // Numbers...
 - Int2
 - Str1      // ...and text which meaning varies with the EventCode
 - Str2
 - //... ?

Now, regardless of whether this goes to a flat file or to a SQL database (and maybe particularly if it goes to a DB), you could/should use a standard logging library: maybe log4j as suggested in other replies (although I'm not sure it readily has Python bindings, and anyway Python's standard logging module is +/- the same), or even Python's standard library's logging module, which can probably be tailored to your needs.
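The polymorphic record described above might be sketched in Python like this; the event codes come from the answer, while the class and field names are illustrative:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class LogEvent:
    """Polymorphic log record: common fields plus generic slots whose
    meaning depends on event_code (LIA = login attempt, GURL = guessed
    url, SQLI = SQL injection attempt, ...)."""
    ip_address: str
    event_code: str
    message: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    user_id: Optional[str] = None
    int1: Optional[int] = None   # numeric slots...
    int2: Optional[int] = None
    str1: Optional[str] = None   # ...and string slots, interpreted per event_code
    str2: Optional[str] = None

# For a login attempt, say int1 = failure count and str1 = username tried.
evt = LogEvent(ip_address="10.0.0.5", event_code="LIA",
               message="failed login", int1=3, str1="root")
```

The same record maps directly onto one DB table or one flat-file line, which is the point of the shared common fields.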

╭ゆ眷念 2024-08-19 01:19:53

In my opinion, the most important is:

晨曦慕雪 2024-08-19 01:19:53

Log File

  1. Make Log Entries Meaningful With Context
  2. Use a Standard Date and Time Format
  3. Use Local Time + Offset for Your Timestamps
  4. Use Logging Levels Correctly
  5. Split Your Logging to Different Targets Based on Their Granularity
  6. Include the Stack Trace When Logging an Exception
  7. Include the Name of the Thread When Logging From a Multi-Threaded Application
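Points 6 and 7 can be sketched with Python's standard logging module: %(threadName)s carries the thread name, and logger.exception() appends the stack trace automatically:

```python
import io
import logging
import threading

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(threadName)s] %(levelname)s %(message)s"))
logger = logging.getLogger("honeypot.worker")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def worker():
    try:
        1 / 0
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("handler crashed")

# Name the thread so the log line identifies which connection failed.
t = threading.Thread(target=worker, name="conn-1")
t.start()
t.join()
```

The resulting entry carries both the thread name in brackets and the full traceback below the message.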