How to do calculations on a log file
I have a log file that looks like this:
I, [2009-03-04T15:03:25.502546 #17925] INFO -- : [8541, 931, 0, 0]
I, [2009-03-04T15:03:26.094855 #17925] INFO -- : [8545, 6678, 0, 0]
I, [2009-03-04T15:03:26.353079 #17925] INFO -- : [5448, 1598, 185, 0]
I, [2009-03-04T15:03:26.360148 #17925] INFO -- : [8555, 1747, 0, 0]
I, [2009-03-04T15:03:26.367523 #17925] INFO -- : [7630, 278, 0, 0]
I, [2009-03-04T15:03:26.375845 #17925] INFO -- : [7640, 286, 0, 0]
I, [2009-03-04T15:03:26.562425 #17925] INFO -- : [5721, 896, 0, 0]
I, [2009-03-04T15:03:30.951336 #17925] INFO -- : [8551, 4752, 1587, 1]
I, [2009-03-04T15:03:30.960007 #17925] INFO -- : [5709, 5295, 0, 0]
I, [2009-03-04T15:03:30.966612 #17925] INFO -- : [7252, 4928, 0, 0]
I, [2009-03-04T15:03:30.974251 #17925] INFO -- : [8561, 4883, 1, 0]
I, [2009-03-04T15:03:31.230426 #17925] INFO -- : [8563, 3866, 250, 0]
I, [2009-03-04T15:03:31.236830 #17925] INFO -- : [8567, 4122, 0, 0]
I, [2009-03-04T15:03:32.056901 #17925] INFO -- : [5696, 5902, 526, 1]
I, [2009-03-04T15:03:32.086004 #17925] INFO -- : [5805, 793, 0, 0]
I, [2009-03-04T15:03:32.110039 #17925] INFO -- : [5786, 818, 0, 0]
I, [2009-03-04T15:03:32.131433 #17925] INFO -- : [5777, 840, 0, 0]
I'd like to create a shell script that calculates the average of the 2nd and 3rd fields in brackets (840 and 0 in the last example). An even tougher question: is it possible to get the average of the 3rd field only when the last one is not 0? I know I could use Ruby or another language to create a script, but I'd like to do it in Bash. Any good suggestions on resources or hints on how to create such a script would help.
Comments (4)
Use bash and awk:
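A minimal sketch of what that pipeline could look like, matching the explanation further down (the file name file.log, the exact regular expression and the variable names sum2, sum3, sum3nz, cnt3nz are assumptions; the last two handle the "only when the last field is not 0" part of the question):

cat file.log |
  sed -ne 's/^.*INFO.*\[\([0-9, ]*\)\][[:space:]]*$/\1/p' |
  awk -F' *, *' '
    {
      sum2 += $2; sum3 += $3                    # accumulate the 2nd and 3rd bracketed fields
      if ($4 != 0) { sum3nz += $3; cnt3nz++ }   # 3rd field, counted only when the last field is not 0
    }
    END {
      if (NR > 0)     printf "avg2=%.2f avg3=%.2f\n", sum2 / NR, sum3 / NR
      if (cnt3nz > 0) printf "avg3 (last field != 0)=%.2f\n", sum3nz / cnt3nz
    }'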
Sample output (for your original data):
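Run over the 17 sample lines above, the sketch would print something along these lines (the first line being the plain averages, the second the "tougher" conditional one, taken over the two records whose last field is 1):

avg2=2859.59 avg3=149.94
avg3 (last field != 0)=1056.50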
Of course, you do not need to use cat; it is included there for legibility and to illustrate the fact that the input data can come from any pipe. If you have to operate on an existing file, run sed -ne '...' file | ... directly.

EDIT
If you have access to gawk (GNU awk), you can eliminate the need for sed as follows:
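A sketch of what that single gawk invocation could look like (same assumed file and variable names as above; cnt is explained below):

gawk '
  match($0, /INFO.*\[([0-9, ]*)\][[:space:]]*$/, a) {   # a[1] holds what sits between the square brackets
    cnt++
    split(a[1], f, / *, */)                              # split on commas surrounded by optional spaces
    sum2 += f[2]; sum3 += f[3]
    if (f[4] != 0) { sum3nz += f[3]; cnt3nz++ }
  }
  END {
    if (cnt > 0)    printf "avg2=%.2f avg3=%.2f\n", sum2 / cnt, sum3 / cnt
    if (cnt3nz > 0) printf "avg3 (last field != 0)=%.2f\n", sum3nz / cnt3nz
  }' file.log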
Same remarks re. cat apply.

A bit of explanation:
- sed only prints out lines (the -n ... p combination) that match the regular expression (lines containing INFO followed by any combination of digits, spaces and commas between square brackets at the end of the line, allowing for trailing spaces and CR); when such a line matches, only what sits between the square brackets is kept (\1, corresponding to what's between \(...\) in the regular expression) before being printed (p), e.g. 8541, 931, 0, 0
- awk uses a comma surrounded by 0 or more spaces (-F ' *, *') as the field delimiter; $1 corresponds to the first column (e.g. 8541), $2 to the second, etc. Missing columns count as value 0
- awk divides the accumulators sum2 etc. by the number of records processed, NR
- gawk does everything in one shot; it first tests whether each line matches the same regular expression passed to sed in the previous example (except that, unlike sed, awk does not require a \ in front of the round brackets delimiting the area of interest). If the line matches, what's between the round brackets ends up in a[1], which is then split using the same separator (a comma surrounded by any number of spaces) and used to accumulate. I introduced cnt instead of continuing to use NR because the number of records processed (NR) may be larger than the actual number of relevant records (cnt) if not all lines are of the form INFO ... [...comma-separated-numbers...]; that was not an issue with sed|awk, since sed guaranteed that all lines passed on to awk were relevant.
Posting the reply I pasted to you over IM here too, just because it makes me try StackOverflow out :)
Use nawk or /usr/xpg4/bin/awk on Solaris.
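For instance, substituting it into the sed|awk pipeline from the first answer (a sketch, same assumed file name; the conditional average is left out for brevity):

cat file.log |
  sed -ne 's/^.*INFO.*\[\([0-9, ]*\)\] *$/\1/p' |
  nawk -F' *, *' '{ sum2 += $2; sum3 += $3 } END { if (NR > 0) printf "avg2=%.2f avg3=%.2f\n", sum2 / NR, sum3 / NR }'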
Use Python