Using awk to check between two dates
I have a file with multiple data structures in it like so:
eventTimestamp: 2010-03-23T07:56:19.166
result: Allowed
protocol: SMS
payload: RCOMM_SMS

eventTimestamp: 2010-03-23T07:56:19.167
result: Allowed
protocol: SMS
payload: RCOMM_SMS

eventTimestamp: 2010-03-23T07:56:19.186
result: Allowed
protocol: SMS
payload: SMS-MO-FSM

eventTimestamp: 2010-03-23T07:56:19.197
result: Allowed
protocol: SMS
payload: COPS

eventTimestamp: 2010-03-23T07:56:29.519
result: Blocked
protocol: SMS
payload: COPS
type: URL_IWF
result: Blocked
I want to find all of the events that are payload: SMS-MO-FSM or payload: SMS-MO-FSM-INFO that occurred between the times 2010-03-23 12:56:47 and 2010-03-23 13:56:47. When querying this file so far I have used awk in the following manner:
cat checkThis.txt |
awk 'BEGIN{FS="\n"; RS=""; OFS=";"; ORS="\n"}
$1~/eventTimestamp: 2010-03-23T14\:16\:35/ && $4~/SMS-MO-FSM-INFO|SMS-MO-FSM$/ {$1=$1 ""; print $0}'
This gives me all of the events that occurred during the second 14:16:35 on 2010-03-23. I am struggling, however, to work out how to put a date range into my query. I could use the following to convert the dates to epoch time, but how can I use it in my awk to check whether a date falls between the required times?
python -c "import time; ENGINE_TIME_FORMAT='%Y-%m-%dT%H:%M:%S'; print int(time.mktime(time.strptime('2010-03-23T12:52:52', ENGINE_TIME_FORMAT)))"
I know this could be done in Python, but I have already written a parser in Python for this and I want this method as an alternative checker, so I want to use awk if at all possible.
I took this a little further and created a python script for time conversion:
#!/usr/local/bin/python
import time, sys
ENGINE_TIME_FORMAT='%Y-%m-%dT%H:%M:%S'
testTime = sys.argv[1]
try:
    print int(time.mktime(time.strptime(testTime, ENGINE_TIME_FORMAT)))
except:
    print "Time to convert %s" % testTime
    raise
I then tried to use getline to assign the conversion to a variable for comparison:
cat checkThis.txt| awk 'BEGIN {FS="\n"; RS=""; OFS=";"; ORS="\n"; "./firstDate '2010-03-23T12:56:47'" | getline start_time; close("firstDate"); "./firstDate '2010-03-23T13:56:47'" | getline end_time; close("firstDate");} ("./firstDate $1" | getline) > start_time {$1=$1 ""; print $0}'
Traceback (most recent call last):
  File "./firstDate", line 4, in <module>
    testTime = sys.argv[1]
IndexError: list index out of range
The getline calls work in the BEGIN block (I checked the values in the final print), but I seem to have a problem in the comparison part of the script.
Comments (2)
The key observation is that you can compare your timestamps using alphanumeric comparisons and get the correct answer - that is the beauty of ISO 8601 notation.
Thus, adapting your code slightly - and formatting to avoid scroll bars:
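A minimal sketch of that idea, not necessarily the exact listing: it assumes records are separated by blank lines (as RS="" requires) and that the payload is always the fourth line of a record. The function name filter_events and the 07:45:00 to 08:00:00 window are illustrative choices, not part of the question.

```shell
# Hypothetical helper: print matching records from the file given as $1.
filter_events() {
    awk 'BEGIN {
        FS = "\n"; RS = ""; OFS = ";"; ORS = "\n"
        # Illustrative window, hard-coded; chosen to cover the sample data.
        lo = "eventTimestamp: 2010-03-23T07:45:00"
        hi = "eventTimestamp: 2010-03-23T08:00:00"
    }
    # ISO 8601 timestamps sort chronologically as plain strings,
    # so no epoch conversion is needed for the range check.
    $1 >= lo && $1 <= hi && $4 ~ /^payload: SMS-MO-FSM(-INFO)?$/ {
        $1 = $1      # rebuild the record so fields are joined with OFS
        print
    }' "$1"
}
```

Because the comparison is textual, a timestamp with fractional seconds such as 08:00:00.500 sorts after the bare bound 08:00:00 and is excluded; extend the upper bound if that second should be included in full.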
Obviously, you could put this into a script file - you wouldn't want to type it often. And getting the date range entered accurately and conveniently is one of the hard parts. Note that I've adjusted the time range to match the data.
When run on the sample data, it outputs one record:
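To make such a run reproducible, here is a self-contained sketch that writes three of the sample records to a temporary file and filters them. It passes the range in with awk -v rather than hard-coding it; the wrapper name between_times and the variable names t_start and t_end are hypothetical.

```shell
# Hypothetical wrapper: between_times START END FILE
between_times() {
    awk -v t_start="$1" -v t_end="$2" 'BEGIN {
        FS = "\n"; RS = ""; OFS = ";"
        lo = "eventTimestamp: " t_start
        hi = "eventTimestamp: " t_end
    }
    # String comparison is enough for ISO 8601 timestamps.
    $1 >= lo && $1 <= hi && $4 ~ /^payload: SMS-MO-FSM(-INFO)?$/ {
        $1 = $1      # rebuild the record with OFS between fields
        print
    }' "$3"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
eventTimestamp: 2010-03-23T07:56:19.167
result: Allowed
protocol: SMS
payload: RCOMM_SMS

eventTimestamp: 2010-03-23T07:56:19.186
result: Allowed
protocol: SMS
payload: SMS-MO-FSM

eventTimestamp: 2010-03-23T07:56:19.197
result: Allowed
protocol: SMS
payload: COPS
EOF

between_times 2010-03-23T07:45:00 2010-03-23T08:00:00 "$sample"
# prints: eventTimestamp: 2010-03-23T07:56:19.186;result: Allowed;protocol: SMS;payload: SMS-MO-FSM
rm -f "$sample"
```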
A bit of a kludge, but this script assumes you have the Unix "date" command. I also hard-coded your start and end timestamps in the BEGIN block. Note that the test data listed above does not fall within your sample start/end times.
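A rough sketch of what such a script could look like, under some loud assumptions: it relies on GNU date (the -d option is not available in BSD or macOS date), it treats all timestamps as UTC, and it drops fractional seconds before converting. The names epoch_filter and to_epoch are hypothetical, and the window is adjusted to cover the sample data. The command is built by string concatenation, because a $1 written inside an awk string literal is not expanded by awk, and close() is called with the exact command string that was opened.

```shell
# Hypothetical helper: epoch_filter FILE (assumes GNU date is installed)
epoch_filter() {
    awk 'function to_epoch(ts,    cmd, secs) {
        sub(/\.[0-9]+$/, "", ts)    # drop fractional seconds
        sub(/T/, " ", ts)           # turn the T separator into a space
        cmd = "date -u -d \"" ts "\" +%s"
        cmd | getline secs          # read the epoch value printed by date
        close(cmd)                  # close using the same command string
        return secs + 0
    }
    BEGIN {
        FS = "\n"; RS = ""; OFS = ";"
        # Hard-coded window, adjusted here to cover the sample data.
        start_time = to_epoch("2010-03-23T07:45:00")
        end_time   = to_epoch("2010-03-23T08:00:00")
    }
    $4 ~ /^payload: SMS-MO-FSM(-INFO)?$/ {
        ts = $1
        sub(/^eventTimestamp: /, "", ts)
        t = to_epoch(ts)
        if (t >= start_time && t <= end_time) { $1 = $1; print }
    }' "$1"
}
```

This spawns one date process per candidate record, so it is noticeably slower than a plain string comparison on large files.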