Comparing timestamps in a log file using Unix commands

Posted on 2024-12-05 18:25:44

I have a log file with lines like this:

...timestamp...(id=1234)..GO...
...timestamp...(id=1234)..DONE...

Facts:

  • timestamps are of the form HH:MM:SS.ssss (s for partial seconds)
  • each 'id' number has two associated lines, a "GO" and a "DONE"
  • two associated lines are not necessarily next to each other; the file is chronological

What I want:

  • match up associated GO/DONE lines
  • diff the timestamps
  • (ideally) create a new file of the form:

    diffTime <GO line> <DONE line>
    

My main sticking point is diffing the timestamps. This would be really useful and I lack the sort/sed/awk skills to write it. Are there log file tools to help with this kind of hacking?

Comments (2)

回眸一笑 2024-12-12 18:25:44

I don't know any such tools but it is possible to write it in shell. For example, this log:

11:18:51 (id=123) GO
11:18:52 (id=124) GO
11:18:53 (id=123) DONE
11:18:54 (id=125) GO
11:18:55 (id=125) DONE
11:18:55 (id=124) DONE

can be transformed to

2 123
3 124
1 125

where the first column is the elapsed time in seconds and the second column is the transaction id.

The command was:

cat example.log |
  sed 's|\([^ ]\+\) (id=\([^)]\+\)) \(.\+\)|\1 \2 \3|;s|GO|1|;s|DONE|2|' |
  sort -k2,3 |
  paste - - |
  tr ':' ' ' |
  awk '{printf("%d %d\n", ((($6-$1)*60*60)+(($7-$2)*60)+($8-$3)), $4)}'

This one-liner can probably be simplified further.

How it works:

  • changes the line format to "11:18:51 123 GO"
  • replaces GO with 1 and DONE with 2 (so that the lines sort properly later)
  • sorts the resulting lines by transaction id and status
  • joins each pair of lines (each resulting line now describes a transaction's start and end)
  • replaces all colons with spaces (to simplify the awk expression later)
  • calculates the time difference by converting hours and minutes to seconds manually
  • prints the result
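
The same pairing and subtraction can probably also be done in a single awk pass, without sorting. What follows is only a sketch under a few assumptions that go beyond the example above: the timestamp is the first whitespace-separated field, the id appears as its own field of the form (id=N), and GO or DONE is the last field. The function name go_done_diff is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints "<seconds> <id>" for every DONE line whose
# GO line was seen earlier in the input (files given as arguments, or stdin).
go_done_diff() {
  awk '
    # Convert HH:MM:SS or HH:MM:SS.ssss into seconds since midnight.
    function to_seconds(ts,   p) {
      split(ts, p, ":")
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    {
      # Find the field that looks like (id=123) and keep only the digits.
      id = ""
      for (i = 1; i <= NF; i++)
        if ($i ~ /^\(id=[0-9]+\)$/) id = substr($i, 5, length($i) - 5)
      if (id == "") next

      if ($NF == "GO") {
        start[id] = to_seconds($1)          # remember when this id started
      } else if ($NF == "DONE" && (id in start)) {
        # %g keeps the output compact (about six significant digits).
        printf "%g %s\n", to_seconds($1) - start[id], id
        delete start[id]
      }
    }
  ' "$@"
}
```

Called as go_done_diff example.log, this prints one line per finished transaction, in the order the DONE lines appear rather than sorted by id. It does not handle transactions that cross midnight.
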
小ぇ时光︴ 2024-12-12 18:25:44

Here's a script that will get you halfway there:

#!/bin/bash

# Script must be called with one parameter, the name of the file to process
if [ $# -ne 1 ]; then
  echo "Usage: $0 filename" >&2
  exit 1
fi

filename=$1


# Use sed to put the timestamp after the id
#    10:46:01:0000 (id=20) GO
#    10:46:02:0000 (id=10) GO
#    10:46:03:0000 (id=10) DONE
#    10:46:04:0000 (id=20) DONE
#
#  becomes
#
#    (id=20) 10:46:01:0000 GO
#    (id=10) 10:46:02:0000 GO
#    (id=10) 10:46:03:0000 DONE
#    (id=20) 10:46:04:0000 DONE
#
# \1 timestamp (the class [0-9:.] also accepts the HH:MM:SS.ssss form from the question)
# \2 id
# \3 status (GO or DONE)
#         \1          \2              \3
sed -e "s/\([0-9:.]*\) \((id=[0-9]*)\) \(.*\)/\2 \1 \3/" "$filename" > temp1


# Now sort the file. This groups the lines by id, with each id's lines in timestamp order
#    (id=20) 10:46:01:0000 GO
#    (id=10) 10:46:02:0000 GO
#    (id=10) 10:46:03:0000 DONE
#    (id=20) 10:46:04:0000 DONE
#
#  becomes
#
#    (id=10) 10:46:02:0000 GO
#    (id=10) 10:46:03:0000 DONE
#    (id=20) 10:46:01:0000 GO
#    (id=20) 10:46:04:0000 DONE
sort temp1 > temp2


# Use sed to put the id after the timestamp
#    (id=10) 10:46:02:0000 GO
#    (id=10) 10:46:03:0000 DONE
#    (id=20) 10:46:01:0000 GO
#    (id=20) 10:46:04:0000 DONE
#
#  becomes
#
#    10:46:02:0000 (id=10) GO
#    10:46:03:0000 (id=10) DONE
#    10:46:01:0000 (id=20) GO
#    10:46:04:0000 (id=20) DONE
# \1 id
# \2 timestamp
# \3 status (GO or DONE)
sed -e "s/\((id=[0-9]*)\) \([0-9:.]*\) \(.*\)/\2 \1 \3/" temp2 > temp3

And for the rest... After running this script, each GO line in temp3 is followed by the DONE line with the same id, assuming that such a DONE line exists.
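
As an aside, the three steps in the script amount to sorting by id and then by timestamp. A minimal sketch of the same grouping with a single sort and no temp files, assuming whitespace-separated fields with the timestamp first and the id second (the function name group_by_id is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: groups lines by the id field (field 2) and orders each
# group by timestamp (field 1). LC_ALL=C forces plain byte-wise comparison so
# the result does not depend on the locale.
group_by_id() {
  LC_ALL=C sort -k2,2 -k1,1 "$@"
}
```

Running group_by_id "$filename" > temp3 should then give the same kind of id-grouped output. As with the script, ids are compared as text, not as numbers, which is fine for pairing lines.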

Next you can read each pair of lines, extract the timestamps and diff them (check out the timestamp functions that Johnsyweb suggested). Then consolidate the two lines into one line. Your results will now look something like:

#    1s 10:46:02:0000 (id=10) GO 10:46:03:0000 (id=10) DONE
#    3s 10:46:01:0000 (id=20) GO 10:46:04:0000 (id=20) DONE
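
One possible way to do this pairing step, as a rough sketch: it reads the id-grouped lines (the temp3 format), remembers each GO line, and prints the combined line when the matching DONE arrives. It assumes exactly three whitespace-separated fields per line and treats the part after the third separator as a decimal fraction of a second, which is a guess about the format. The function name pair_and_diff is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints "<diff>s <GO line> <DONE line>" for each
# GO/DONE pair in id-grouped input (files given as arguments, or stdin).
pair_and_diff() {
  awk '
    # Seconds since midnight for HH:MM:SS:ffff or HH:MM:SS.ffff.
    # Assumption: the fourth component is a decimal fraction of a second.
    function to_seconds(ts,   p, n) {
      n = split(ts, p, /[:.]/)
      return p[1] * 3600 + p[2] * 60 + p[3] + (n > 3 ? ("0." p[4]) + 0 : 0)
    }
    # Remember the most recent GO line and its id.
    $3 == "GO" { go_line = $0; go_id = $2; go_t = to_seconds($1); next }
    # A DONE line with the same id completes the pair.
    $3 == "DONE" && $2 == go_id {
      # %g keeps the output compact (about six significant digits).
      printf "%gs %s %s\n", to_seconds($1) - go_t, go_line, $0
      go_id = ""
    }
  ' "$@"
}
```

With the script's output, that would be pair_and_diff temp3 > temp4. GO lines without a DONE are silently dropped here; whether that is what you want depends on your logs.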

Notice how the entries are out of order by the starting timestamp. This happened because we sorted by id earlier. I'll leave it as an exercise for you to figure out how to get the entries in the correct order. We want the entry for id=20 to come before id=10, because id=20 was started before id=10.

#    3s 10:46:01:0000 (id=20) GO 10:46:04:0000 (id=20) DONE
#    1s 10:46:02:0000 (id=10) GO 10:46:03:0000 (id=10) DONE
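
If you'd rather skip the exercise, one possible approach is to sort the combined lines on the GO timestamp, which is the second field. This sketch assumes fixed-width, zero-padded timestamps, so that text order matches time order; sort_by_start is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: orders combined "<diff>s <GO line> <DONE line>" lines
# by the GO timestamp (field 2). LC_ALL=C forces plain byte-wise comparison.
sort_by_start() {
  LC_ALL=C sort -k2,2 "$@"
}
```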

I'm sure this is confusing, so let me know if you have questions. I'm sure there are more efficient ways to do all this, but this is what I thought of off the top of my head.
