Log parser/analyzer in Unix
What popular tools do people use on Unix to parse and analyze log files? I need to count entries, find unique values, and select or copy the lines that match certain patterns. Please suggest some tools or search keywords. I am sure similar questions have been asked before, but I don't know which keywords to search for. Thanks.
Comments (5)
I find it a huge failure that many log formats do not separate columns with a proper, unique field separator. Not because that is the best design, but because it is the basic premise of the Unix textutils that operate on tabular data. Instead, these formats tend to use spaces as separators and to quote the fields that might contain spaces.
One of the most practical simple changes I made to web log analysis was to move away from the default NCSA log format that the nginx web server produces and use tab as the field separator instead.
Suddenly I could use all of the primitive Unix textutils for quick lookups, especially awk: printing only the lines where the user-agent field contains Googlebot, finding the number of requests for each unique request, and of course lots of combinations to find specific visitors.
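The two lookups described in this answer might look roughly like the sketch below. The original commands did not survive on this page, so this is a reconstruction under assumptions: the field positions (request line in field 3, user agent in field 7) and the function names are hypothetical and depend on how the tab-separated log format was defined.

```shell
#!/bin/sh
# Assumed (hypothetical) layout: one request per line, tab-separated fields,
# with the request line in field 3 and the user agent in field 7.

# Print only the lines whose user-agent field contains "Googlebot".
googlebot_lines() {
    awk -F'\t' '$7 ~ /Googlebot/' "$1"
}

# Count the requests for each unique request string, most frequent first.
# Output format: <count><TAB><request>
requests_by_count() {
    awk -F'\t' '{ count[$3]++ }
        END { for (r in count) printf "%d\t%s\n", count[r], r }' "$1" |
        sort -rn
}
```

Usage would be something like `googlebot_lines access.log` or `requests_by_count access.log | head`, where `access.log` stands in for your own log file. Because the separator is a tab, fields that contain spaces (such as the request line or the user agent) stay intact without any quote handling.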
For regular, nightly checking there is logwatch, which has several different scripts in /usr/share/logwatch/scripts/services that check syslog for specific things (web server, FTP server, sshd-related entries, and so on). The default install enables most of them, but you can enable or disable them as you like, or even write your own scripts. For real-time watching there is multitail.
You might want to try lnav, a curses-based log analyzer. It has most of the features you would expect from a log parser: chronological interleaving of log messages from multiple log files, support for multiple log formats, highlighting of error and warning messages, hotkeys for navigating between error and warning messages, support for SQL queries, and a lot more. Take a look at the project's website for screenshots and a detailed list of features.
Take a look at some of the generic log parsers listed here. If you use something like syslog, you can probably get a custom parser/analyzer too. Otherwise, for trivial searches, any scripting language such as perl, python, or even awk suffices.
Any programming language that lets you open and read files and do string/text manipulation can be used, e.g. Perl, Python, (g)awk, Ruby, PHP, even Java. They also have modules for the file formats you are parsing, e.g. CSV.
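For the three tasks in the question (selecting lines by pattern, counting, and finding unique values), a minimal sketch with the standard tools grep, awk, sort, and uniq could look like this. The function names are hypothetical, and the sketch assumes a whitespace-separated log in which the value of interest sits in a fixed field; quoted fields that contain spaces would need different handling.

```shell
#!/bin/sh
# Select the lines that match an extended regular expression.
matching_lines() {
    grep -E -- "$1" "$2"
}

# Count the lines that match an extended regular expression.
count_matching() {
    grep -E -c -- "$1" "$2"
}

# List the unique values of whitespace-separated field N with their counts,
# most frequent first.
field_counts() {
    awk -v n="$1" '{ print $n }' "$2" | sort | uniq -c | sort -rn
}
```

For example, `matching_lines 'sshd.*Failed' auth.log > failed.txt` would copy the matching lines to a new file, and `field_counts 1 access.log | head` would show the most frequent values of the first field; both file names are placeholders. Useful keywords to search for are grep, awk, sort, uniq, cut, and wc.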