How to perform calculations on a log file

Posted on 2024-07-14 11:24:32

I have a log file that looks like this:

I, [2009-03-04T15:03:25.502546 #17925]  INFO -- : [8541, 931, 0, 0]
I, [2009-03-04T15:03:26.094855 #17925]  INFO -- : [8545, 6678, 0, 0]
I, [2009-03-04T15:03:26.353079 #17925]  INFO -- : [5448, 1598, 185, 0]
I, [2009-03-04T15:03:26.360148 #17925]  INFO -- : [8555, 1747, 0, 0]
I, [2009-03-04T15:03:26.367523 #17925]  INFO -- : [7630, 278, 0, 0]
I, [2009-03-04T15:03:26.375845 #17925]  INFO -- : [7640, 286, 0, 0]
I, [2009-03-04T15:03:26.562425 #17925]  INFO -- : [5721, 896, 0, 0]
I, [2009-03-04T15:03:30.951336 #17925]  INFO -- : [8551, 4752, 1587, 1]
I, [2009-03-04T15:03:30.960007 #17925]  INFO -- : [5709, 5295, 0, 0]
I, [2009-03-04T15:03:30.966612 #17925]  INFO -- : [7252, 4928, 0, 0]
I, [2009-03-04T15:03:30.974251 #17925]  INFO -- : [8561, 4883, 1, 0]
I, [2009-03-04T15:03:31.230426 #17925]  INFO -- : [8563, 3866, 250, 0]
I, [2009-03-04T15:03:31.236830 #17925]  INFO -- : [8567, 4122, 0, 0]
I, [2009-03-04T15:03:32.056901 #17925]  INFO -- : [5696, 5902, 526, 1]
I, [2009-03-04T15:03:32.086004 #17925]  INFO -- : [5805, 793, 0, 0]
I, [2009-03-04T15:03:32.110039 #17925]  INFO -- : [5786, 818, 0, 0]
I, [2009-03-04T15:03:32.131433 #17925]  INFO -- : [5777, 840, 0, 0]

I'd like to create a shell script that calculates the average of the 2nd and 3rd fields in brackets (840 and 0 in the last example). An even tougher question: is it possible to get the average of the 3rd field only when the last one is not 0?

I know I could use Ruby or another language to write the script, but I'd like to do it in Bash. Any good suggestions for resources, or hints on how to create such a script, would help.

我三岁 2024-07-21 11:24:32

Use bash and awk:

cat file | sed -ne 's:^.*INFO.*\[\([0-9, ]*\)\][ \r]*$:\1:p' | awk -F ' *, *' '{ sum2 += $2 ; sum3 += $3 } END { if (NR>0) printf "avg2=%.2f, avg3=%.2f\n", sum2/NR, sum3/NR }'

Sample output (for your original data):

avg2=2859.59, avg3=149.94

Of course, you do not need to use cat; it is included here for legibility and to illustrate that the input data can come from any pipe. If you have to operate on an existing file, run sed -ne '...' file | ... directly.

EDIT

If you have access to gawk (GNU awk), you can eliminate the need for sed as follows:

cat file | gawk '{ if(match($0, /.*INFO.*\[([0-9, ]*)\][ \r]*$/, a)) { cnt++; split(a[1], b, / *, */); sum2+=b[2]; sum3+=b[3] } } END { if (cnt>0) printf "avg2=%.2f, avg3=%.2f\n", sum2/cnt, sum3/cnt }'

The same remarks about cat apply.

A bit of explanation:

- sed only prints lines (the -n ... :p combination) that match the regular expression: lines containing INFO followed, at the end of the line, by any combination of digits, spaces and commas between square brackets, allowing for trailing spaces and CR. If a line matches, only what is between the square brackets is kept (\1, corresponding to what is between \(...\) in the regular expression) before printing (:p).
  - sed will output lines that look like: 8541, 931, 0, 0
- awk uses a comma surrounded by 0 or more spaces (-F ' *, *') as the field delimiter; $1 corresponds to the first column (e.g. 8541), $2 to the second, etc. Missing columns count as value 0.
  - at the end, awk divides the accumulators sum2 etc. by the number of records processed, NR
- gawk does everything in one shot. It first tests whether each line matches the same regular expression passed to sed in the previous example (except that, unlike sed, awk does not require a \ in front of the round brackets delimiting areas of interest). If the line matches, what is between the round brackets ends up in a[1], which we then split using the same separator (a comma surrounded by any number of spaces) and use to accumulate. I introduced cnt instead of continuing to use NR because the number of records processed, NR, may be larger than the number of relevant records, cnt, if not all lines are of the form INFO ... [...comma-separated-numbers...]. That was not an issue with sed|awk, since sed guaranteed that all lines passed on to awk were relevant.
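
For readers who want to check the numbers outside the shell, the same extract-then-average logic can be sketched in Python (the language another answer in this thread uses). This is only an illustration, not part of the original answer: the function name bracket_averages and the stricter pattern, which insists on exactly four integers between the brackets, are assumptions made for the example. It also covers the follow-up question by averaging the third field only over lines whose last field is non-zero.

```python
import re

# Stricter cousin of the sed/gawk pattern (an assumption for this sketch):
# exactly four integers inside the last pair of square brackets on an INFO
# line, with optional trailing spaces or a CR.
LINE_RE = re.compile(
    r"INFO.*\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\][ \r]*$"
)


def bracket_averages(lines):
    """Return (avg2, avg3, avg3_if_last_nonzero); None where no line qualifies."""
    cnt = sum2 = sum3 = 0
    cond_cnt = cond_sum3 = 0
    for line in lines:
        m = LINE_RE.search(line.rstrip("\n"))
        if m is None:
            continue  # irrelevant line: skipped, like the cnt counter in gawk
        _, second, third, last = (int(g) for g in m.groups())
        cnt += 1
        sum2 += second
        sum3 += third
        if last != 0:  # the "tougher question": 3rd field only when last != 0
            cond_cnt += 1
            cond_sum3 += third
    avg2 = sum2 / cnt if cnt else None
    avg3 = sum3 / cnt if cnt else None
    cond_avg3 = cond_sum3 / cond_cnt if cond_cnt else None
    return avg2, avg3, cond_avg3


if __name__ == "__main__":
    sample = [
        "I, [2009-03-04T15:03:25.502546 #17925]  INFO -- : [8541, 931, 0, 0]",
        "I, [2009-03-04T15:03:30.951336 #17925]  INFO -- : [8551, 4752, 1587, 1]",
        "I, [2009-03-04T15:03:32.056901 #17925]  INFO -- : [5696, 5902, 526, 1]",
        "some unrelated line",
    ]
    print(bracket_averages(sample))
```

Fed the 17 lines from the question, this should reproduce avg2=2859.59 and avg3=149.94, and give 1056.50 for the conditional average, since only the two lines ending in 1 (carrying 1587 and 526) qualify.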

王权女流氓 2024-07-21 11:24:32

Posting the reply I pasted to you over IM here too, just because it makes me try StackOverflow out :)

# replace $2 with the column you want to average
awk '{ print $2 }' log | perl -ne 'END{ printf "%.2f\n", $total/$n if $n }; chomp; $total += $_; $n++'

逆蝶 2024-07-21 11:24:32

Use nawk or /usr/xpg4/bin/awk on Solaris.

awk -F'[],]' 'END { 
  print s/NR, t/ct 
  }  
{ 
  s += $(NF-3) 
  if ($(NF-1)) {
    t += $(NF-2)
    ct++
    }
  }' infile
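
The field arithmetic in this answer is easier to follow once you see what -F'[],]' does to a line. The sketch below imitates that split in Python purely for illustration (awk does this internally; the helper name split_like_awk is made up for the example). Every ] or , ends a field, so the closing bracket leaves an empty last field and the three values of interest sit at NF-3, NF-2 and NF-1.

```python
import re


def split_like_awk(line):
    """Split on every ']' or ',' the way awk -F'[],]' would."""
    return re.split(r"[\],]", line)


line = "I, [2009-03-04T15:03:32.056901 #17925]  INFO -- : [5696, 5902, 526, 1]"
fields = split_like_awk(line)
nf = len(fields)  # awk's NF

# awk fields are 1-based, Python lists are 0-based, hence the extra "- 1".
second = int(fields[nf - 3 - 1])  # $(NF-3)
third = int(fields[nf - 2 - 1])   # $(NF-2)
last = int(fields[nf - 1 - 1])    # $(NF-1)

print(nf, repr(fields[-1]), second, third, last)
# prints: 7 '' 5902 526 1
```

This is why the awk script sums $(NF-3) for the second bracketed value and only adds $(NF-2) when $(NF-1) is non-zero.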

听风念你 2024-07-21 11:24:32

Use Python

sum2, count2 = 0, 0
sum3, count3 = 0, 0
with open("somelogfile.log", "r") as logfile:
    for line in logfile:
        # find right-most brackets
        _, bracket, fieldtext = line.rpartition('[')
        datatext, bracket, _ = fieldtext.partition(']')
        # split fields and convert to integers
        data = list(map(int, datatext.split(',')))
        # compute sums and counts
        sum2 += data[1]
        count2 += 1
        # count the 3rd field only when the last field is non-zero
        if data[3] != 0:
            sum3 += data[2]
            count3 += 1

print(sum2, count2, sum2 / count2)
print(sum3, count3, sum3 / count3)
