Calculating the average of a column per hour (by row) with awk

Posted 2024-09-29 14:34:00


I have the following rows in a file that I want to get the average of the 3rd column by hour.

2010-10-28 12:02:36: 5.1721851 secs
2010-10-28 12:03:43: 4.4692638 secs
2010-10-28 12:04:51: 3.3770310 secs
2010-10-28 12:05:58: 4.6227063 secs
2010-10-28 12:07:08: 5.1650404 secs
2010-10-28 12:08:16: 3.2819025 secs

2010-10-28 13:01:36: 2.1721851 secs
2010-10-28 13:02:43: 3.4692638 secs
2010-10-28 13:03:51: 4.3770310 secs
2010-10-28 13:04:58: 3.6227063 secs
2010-10-28 13:05:08: 3.1650404 secs
2010-10-28 13:06:16: 4.2819025 secs

2010-10-28 14:12:36: 7.1721851 secs
2010-10-28 14:23:43: 7.4692638 secs
2010-10-28 14:24:51: 7.3770310 secs
2010-10-28 14:25:58: 9.6227063 secs
2010-10-28 14:37:08: 7.1650404 secs
2010-10-28 14:48:16: 7.2819025 secs

I have done

cat filename | awk '{sum+=$3} END {print "Average = ",sum/NR}'

with the output

Average =  4.49154

to get the average for the entire file, but want to break the average down by hour. I could sneak in a grep for the hour before piping the output to awk, but I'd like to, hopefully, do it with a one-liner.

Ideally, the output would be something like

Average 12:00 = _computed_avg_
Average 13:00 = _computed_avg_
Average 14:00 = _computed_avg_

and so on.

Not necessarily looking for an answer, but hoping to be pointed in the right direction.


陈年往事 2024-10-06 14:34:00


I would set the field delimiter to colon, aggregate sums and counts per key in associative arrays, and finally compute the averages in the END block:

gawk -F: 'NF == 4 { sum[$1] += $4; N[$1]++ } 
          END     { for (key in sum) {
                        avg = sum[key] / N[key];
                        printf "%s %f\n", key, avg;
                    } }' filename | sort

On your test data, this gives:

2010-10-28 12 4.348022
2010-10-28 13 3.514688
2010-10-28 14 7.681355

This should produce the correct answer even if the data is not in time order (say, you concatenated two log files out of sequence). Note that awk coerces a value like " 3.123 secs" to a number by taking its leading numeric prefix, so the trailing "secs" is ignored when summing. The `for (key in sum)` loop makes no guarantee about key order, so the final `sort` is what presents the averages in time sequence.
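If you want output in the exact "Average HH:00 = ..." format the question asked for, the same aggregation can be keyed on the hour alone. This is a sketch, not part of the original answer: it extracts the hour with `substr` from the second whitespace-separated field, and the sample file here is a shortened stand-in for the real log. (Keying on the hour only merges matching hours across days; include `$1` in the key if the log spans multiple dates.)

```shell
# Shortened stand-in for the real log file
cat > filename <<'EOF'
2010-10-28 12:02:36: 5.1721851 secs
2010-10-28 13:01:36: 2.1721851 secs
2010-10-28 13:02:43: 3.4692638 secs
EOF

awk '{ hour = substr($2, 1, 2)          # first two chars of "12:02:36:"
       sum[hour] += $3; n[hour]++ }     # accumulate per-hour sum and count
     END { for (h in sum)
               printf "Average %s:00 = %f\n", h, sum[h] / n[h] }' filename | sort
```

Blank separator lines are harmless here: they have no `$3`, so they add 0 to `sum[""]` under the empty-hour key only if you count them; with real data you may want a `NF >= 3` guard to skip them entirely.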

爱的故事 2024-10-06 14:34:00


Awk has associative arrays, so you can store averages by hour.
