What are the best practices for formatting logs?
I'm writing a piece of honeypot software that will log its interactions extensively. I plan to log to plaintext .log files.
I have two questions, coming from someone who isn't too familiar with how servers log.
First, how should I break up my log files? I'm assuming that after running this for a month I won't want one big .log file. Do I split by day, month, or year? Is there a standard for it?
Second, the format of each line: should I use one standard delimiter (*, -, +, or anything else)? Is there a standard anywhere? (My googling hasn't brought up much.)
I like this format for log files:
This is from Python's logging module.
I usually have one file per day, one folder per month, and one folder per year. Otherwise you'll get huge log files that you can't edit properly.
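As a rough sketch of the layout described above (one file per day inside year and month folders), something like the following could work with Python's standard logging module. The line format the answer refers to is not shown on this page, so the format string, the `honeypot` logger name, and the helper names here are assumptions chosen for illustration only.

```python
import logging
from datetime import date
from pathlib import Path


def daily_log_path(root: Path, day: date) -> Path:
    """Return root/YYYY/MM/YYYY-MM-DD.log (a layout assumed for illustration)."""
    return root / f"{day:%Y}" / f"{day:%m}" / f"{day:%Y-%m-%d}.log"


def make_daily_logger(root: Path, day: date) -> logging.Logger:
    """Create a logger that appends to the file for the given day."""
    path = daily_log_path(root, day)
    path.parent.mkdir(parents=True, exist_ok=True)  # create year/month folders
    handler = logging.FileHandler(path, encoding="utf-8")
    # Assumed line format: timestamp, level, logger name, message.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger = logging.getLogger("honeypot")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Note that this sketch does not switch files at midnight by itself; a long-running process would need to reopen the handler when the date changes, or use a rotating handler instead.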
There is no standard for this kind of logging. Rolling and file layout all depend on what you need. In general I have faced 3 main scenarios:
… supported out of the box by log4anything packages.
… named in the format YYYYMMDD. If you don't stage your logs, consider a directory layout like YYYY\MM\YYYYMMDD, as shown in other answers.
… logfile_yyyymmdd_ccc.log, where ccc is an increasing number. Adding the time to the file name is also a good idea (e.g. to easily judge how many logs per minute you are generating).
… quick access with UNIX text tools.
This custom one looked like this:
There are also some good practices for good logging:
… Excel. If it takes longer than 30 seconds, it means your logging is wrong. This includes:
… works well with Unix text tools and with Excel.
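The numbered file-name scheme mentioned in this answer, logfile_yyyymmdd_ccc.log, could be generated along these lines. This is only a minimal sketch: the function name `next_log_path` and the choice to scan the directory for the highest existing counter are assumptions, not something the answer specifies.

```python
from datetime import date
from pathlib import Path


def next_log_path(directory: Path, day: date) -> Path:
    """Return the next unused logfile_YYYYMMDD_CCC.log path for the given day."""
    stamp = f"{day:%Y%m%d}"
    counters = []
    for existing in directory.glob(f"logfile_{stamp}_*.log"):
        # The counter is the part after the last underscore in the stem.
        suffix = existing.stem.rsplit("_", 1)[-1]
        if suffix.isdigit():
            counters.append(int(suffix))
    next_counter = max(counters, default=0) + 1
    return directory / f"logfile_{stamp}_{next_counter:03d}.log"
```

Because the date comes first and the counter is zero-padded, a plain alphabetical sort of the directory also sorts the files chronologically.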
To break up your log files, you could use an external application like logrotate and let it take care of the dirty work.
As for the format of each line, there's no standard, so you should use what works best for you. If you're going to automatically parse the log file later, then you might want to keep that in mind as you format the log output.
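To illustrate what "keeping parsing in mind" might look like, here is a small sketch that writes tab-delimited lines with Python's csv module and reads them back. The column names in `FIELDS` are hypothetical; the point is only that a writer which quotes awkward values lets the same line be parsed reliably later.

```python
import csv
import io

# Hypothetical column layout, chosen for illustration only.
FIELDS = ["timestamp", "source_ip", "event", "detail"]


def format_line(record: dict) -> str:
    """Render one record as a tab-delimited line (without trailing newline)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([record[name] for name in FIELDS])
    return buf.getvalue().rstrip("\n")


def parse_line(line: str) -> dict:
    """Parse a line produced by format_line back into a dict of strings."""
    row = next(csv.reader([line], delimiter="\t"))
    return dict(zip(FIELDS, row))
```

Values that contain a tab or a double quote are quoted by the writer, so attacker-controlled input (which a honeypot will see plenty of) does not shift the columns when the file is parsed.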
I recommend you use a well-known logging library. Most logging libraries handle rollover for you. Log4Net (.NET) and Log4J (Java) are particularly good logging libraries, with a lot of options that you may find useful. Use whatever rollover interval works best for you. For a honeypot application, I think you will find hourly or daily rollover works best. You could also use a fixed size limit, like 256 MB, to ensure that your logging doesn't overrun the available free disk space. Log4Net and Log4J support this as well.
Log4J @ Apache.Org
Log4Net @ Apache.Org
The format of your log files should be set up according to your needs. It is highly desirable to use a delimiter that is unlikely to show up in your log input. For your application, this may not be possible. Typically, some parties use spaces (NCSA logs), some use commas (to make CSV files), and some use tabs (to make tab-delimited files). Each of these has its own benefits and drawbacks.
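The answer above talks about Log4Net and Log4J; as a comparable sketch in Python, the standard library's `RotatingFileHandler` can cap the size of each log file. The helper name `make_rolling_logger`, the default of 4 backups, and the tab-separated line format are assumptions made for this example; only the 256 MB figure comes from the answer.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rolling_logger(name, path, max_bytes=256 * 1024 * 1024, backups=4):
    """Return a logger whose file rolls over once it reaches max_bytes."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    # Assumed tab-separated line format, for illustration only.
    handler.setFormatter(
        logging.Formatter("%(asctime)s\t%(levelname)s\t%(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With these settings the logs occupy at most roughly `max_bytes * (backups + 1)` on disk. For hourly or daily rollover, `logging.handlers.TimedRotatingFileHandler` with `when="H"` or `when="midnight"` plays the same role.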
Today (2022) we love structured logging and log indexers (like Elasticsearch or Loki). So we have to log in NDJSON (newline-delimited JSON).
Because log delivery agents have difficulty with log file rotation (who loves missing log events?), we avoid rotation (no more logrotate!). Instead, we name files using a date pattern or an auto-incremented sequence and define a policy to remove outdated files.
Don't work with individual files! Use centralized log search engines!
https://en.wikipedia.org/wiki/Common_Log_Format is in the past.
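A minimal sketch of NDJSON logging with Python's standard logging module might look like this. The key names (`ts`, `level`, `logger`, `msg`), the convention of passing extra data under a `fields` attribute, and the `.ndjson` file-name pattern are all assumptions for illustration; an indexer such as Elasticsearch or Loki may expect different field names.

```python
import json
import logging
from datetime import datetime, timezone


class NdjsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Assumed convention: structured data arrives via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        # json.dumps escapes embedded newlines, so each record stays on one line.
        return json.dumps(payload, ensure_ascii=False)


def dated_filename(prefix: str, when: datetime) -> str:
    """Name files by date pattern instead of rotating a single file."""
    return f"{prefix}-{when:%Y-%m-%d}.ndjson"
```

A call such as `logger.info("login attempt", extra={"fields": {"ip": "203.0.113.7"}})` would then produce one JSON line with an `ip` key alongside the standard ones.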
A suggestion:
Since this is for a honeypot system (and unless the baddies are really hammering the application/site), you may consider taking the extra time to log to a database instead.
This will make the analysis and usage of the logs easier, and real-time (i.e. you do not need to go through an ETL process before analyzing or browsing the logs).
That said, whether the logs live in DB table(s) or in file(s), you still need to define a format. Tentatively, you can have a "polymorphic" format, with a few common attributes (ID, IP address, timestamp, cookie/ID, "level" [of importance/urgency]) followed by a short mnemonic code defining a particular event type (say "LIA" = login attempt, "GURL" = guessed URL, "SQLI" = SQL injection attempt, etc.), followed by a few numeric fields and a few string fields whose semantics vary with the mnemonic. To summarize:
Now... regardless of whether this goes to a flat file or to a SQL database (and maybe particularly if it goes to a DB), you could/should use a standard logging library. Maybe log4j as suggested in other replies (although I'm not sure it readily has bindings in Python), or Python's standard library logging module, which is more or less the same and can probably be tailored to your needs.
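The "polymorphic" record described in this answer could be sketched with SQLite roughly as follows. The table name, column names, and the choice of two numeric and two string slots are assumptions for illustration; the answer only says "a few" of each.

```python
import sqlite3
from datetime import datetime, timezone

# Common attributes first, then a mnemonic code, then generic slots whose
# meaning depends on the code (column names are hypothetical).
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    ip      TEXT NOT NULL,
    ts      TEXT NOT NULL,
    session TEXT,
    level   INTEGER NOT NULL,
    code    TEXT NOT NULL,
    num1    INTEGER,
    num2    INTEGER,
    str1    TEXT,
    str2    TEXT
)
"""


def open_log(path=":memory:"):
    """Open (or create) the event database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_event(conn, ip, level, code, *, session=None,
              num1=None, num2=None, str1=None, str2=None):
    """Insert one event row and return its id."""
    ts = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO events (ip, ts, session, level, code, num1, num2, str1, str2) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (ip, ts, session, level, code, num1, num2, str1, str2),
    )
    conn.commit()
    return cur.lastrowid
```

For example, a login attempt might be stored as `log_event(conn, "203.0.113.7", 2, "LIA", str1="admin")`, with `str1` holding the attempted user name for that event type. Parameterized queries matter here, since a honeypot stores attacker-supplied strings.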
In my opinion, the most important thing is:
Log File