Using awk to calculate the average of a column by hour (rows)
I have the following rows in a file, and I want to get the average of the 3rd column by hour.
2010-10-28 12:02:36: 5.1721851 secs
2010-10-28 12:03:43: 4.4692638 secs
2010-10-28 12:04:51: 3.3770310 secs
2010-10-28 12:05:58: 4.6227063 secs
2010-10-28 12:07:08: 5.1650404 secs
2010-10-28 12:08:16: 3.2819025 secs
2010-10-28 13:01:36: 2.1721851 secs
2010-10-28 13:02:43: 3.4692638 secs
2010-10-28 13:03:51: 4.3770310 secs
2010-10-28 13:04:58: 3.6227063 secs
2010-10-28 13:05:08: 3.1650404 secs
2010-10-28 13:06:16: 4.2819025 secs
2010-10-28 14:12:36: 7.1721851 secs
2010-10-28 14:23:43: 7.4692638 secs
2010-10-28 14:24:51: 7.3770310 secs
2010-10-28 14:25:58: 9.6227063 secs
2010-10-28 14:37:08: 7.1650404 secs
2010-10-28 14:48:16: 7.2819025 secs
I have done
cat filename | awk '{sum+=$3} END {print "Average = ",sum/NR}'
with the output
Average = 4.49154
to get the average for the entire file, but I want to break the average down by hour. I could sneak in a grep for the hour before piping the output to awk, but I'd like, if possible, to do it with a one-liner.
Ideally, the output would be something like
Average 12:00 = _computed_avg_
Average 13:00 = _computed_avg_
Average 14:00 = _computed_avg_
and so on.
Not necessarily looking for an answer, but hoping to be pointed in the right direction.
Comments (2)
I would set the field delimiter to a colon, aggregate the values in an associative array with one entry per key, and finally compute the averages:
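A minimal sketch of this approach, not necessarily the exact command the answer used. The function name `hourly_avg`, the array names, and the output format are assumptions made for illustration. With `:` as the separator, the first field holds the date and hour, and the fourth field holds the value followed by `secs`.

```shell
# Hypothetical helper: average the timing value per date-and-hour key.
hourly_avg() {
    awk -F: '
        {
            # "2010-10-28 12:02:36: 5.1721851 secs" splits on ":" into
            #   $1 = "2010-10-28 12"    (date and hour, used as the key)
            #   $4 = " 5.1721851 secs"  (the leading number is used in arithmetic)
            sum[$1] += $4
            count[$1]++
        }
        END {
            # Traversal order of "for (key in sum)" is unspecified, hence the sort.
            for (key in sum)
                printf "%s:00 = %.4f\n", key, sum[key] / count[key]
        }
    ' "$@" | sort
}

# Usage: hourly_avg filename
```

Keeping the date in the key means the same hour on different days is averaged separately, which seems the safer default for log files.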
On your test data, this gives:
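As a check, the sketch below feeds the 18 sample rows from the question through one possible form of the colon-delimited aggregation. The helper names `sample_rows` and `hourly_report` and the four-decimal output format are assumptions, so the listing may differ in layout from the answer's own output.

```shell
# The sample rows from the question.
sample_rows() {
    cat <<'EOF'
2010-10-28 12:02:36: 5.1721851 secs
2010-10-28 12:03:43: 4.4692638 secs
2010-10-28 12:04:51: 3.3770310 secs
2010-10-28 12:05:58: 4.6227063 secs
2010-10-28 12:07:08: 5.1650404 secs
2010-10-28 12:08:16: 3.2819025 secs
2010-10-28 13:01:36: 2.1721851 secs
2010-10-28 13:02:43: 3.4692638 secs
2010-10-28 13:03:51: 4.3770310 secs
2010-10-28 13:04:58: 3.6227063 secs
2010-10-28 13:05:08: 3.1650404 secs
2010-10-28 13:06:16: 4.2819025 secs
2010-10-28 14:12:36: 7.1721851 secs
2010-10-28 14:23:43: 7.4692638 secs
2010-10-28 14:24:51: 7.3770310 secs
2010-10-28 14:25:58: 9.6227063 secs
2010-10-28 14:37:08: 7.1650404 secs
2010-10-28 14:48:16: 7.2819025 secs
EOF
}

# Hypothetical one-liner form: sum and count per "date hour" key, then sort.
hourly_report() {
    awk -F: '{ sum[$1] += $4; count[$1]++ }
             END { for (key in sum) printf "%s:00 = %.4f\n", key, sum[key] / count[key] }' |
        sort
}

sample_rows | hourly_report
# 2010-10-28 12:00 = 4.3480
# 2010-10-28 13:00 = 3.5147
# 2010-10-28 14:00 = 7.6814
```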
This should produce the correct answer even if the data is not in time order (say you concatenate two log files out of sequence). Note that gawk will sum values such as '3.123 secs' numerically, using the leading number and ignoring the trailing text. The final sort presents the averages in time sequence; awk itself does not guarantee that the array keys are visited in time sequence.
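Both points can be checked in isolation with a small sketch. The variable names and the three sample result lines are assumptions for illustration. The first command shows the string-to-number conversion; the second shows that a plain lexical `sort` is enough, because a key that starts with a `YYYY-MM-DD` date and a zero-padded hour sorts the same way lexically and chronologically.

```shell
# A string with a numeric prefix is converted using that prefix;
# leading blanks are skipped and the trailing " secs" is ignored.
coerced=$(echo ' 3.123 secs' | awk '{ print $0 + 0 }')
echo "$coerced"
# 3.123

# Result lines in arbitrary order, as an unordered array traversal might emit them.
ordered=$(printf '%s\n' \
    '2010-10-29 01:00 = 2.0000' \
    '2010-10-28 14:00 = 7.6814' \
    '2010-10-28 12:00 = 4.3480' | sort)
echo "$ordered"
# 2010-10-28 12:00 = 4.3480
# 2010-10-28 14:00 = 7.6814
# 2010-10-29 01:00 = 2.0000
```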
Awk has associative arrays, so you can store averages by hour.
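One way this hint could be turned into a one-liner is sketched below, keeping awk's default whitespace splitting so that `$2` is the time and `$3` is the value, as in the question's own command. The function name `avg_by_hour` and the array names are assumptions. The key is the hour alone, which matches the output format asked for but would merge the same hour from different days.

```shell
# Hypothetical helper: average column 3 per hour, printed as "Average HH:00 = value".
avg_by_hour() {
    awk '
        {
            hour = substr($2, 1, 2)   # "12:02:36:" -> "12"
            sum[hour] += $3           # running total for this hour
            count[hour]++             # number of rows seen for this hour
        }
        END {
            for (hour in sum)
                printf "Average %s:00 = %.4f\n", hour, sum[hour] / count[hour]
        }
    ' "$@" | sort
}

# Usage: avg_by_hour filename
```

The trailing `sort` is there because awk does not specify the order in which `for (hour in sum)` visits the keys.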