AWS CloudWatch Logs Insights: an impossible aggregate result (count - count_distinct is negative)
I'm running a CloudWatch Logs Insights query on a single log stream that corresponds to a single Python AWS Lambda function. This function logs a unique line containing the S3 key it is processing. It logs this line once, at the beginning of the invocation. The only condition under which it won't log this line is if it fails before it even reads the event.
The query is:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
by datefloor(@timestamp, 1d) as @_datefloor
| sort @_datefloor asc
The two regular expressions in this query parse the full key of the S3 file being processed. Both in this particular problem and in general, my understanding is that the count(...) of any quantity minus the count_distinct(...) of the same quantity should always be greater than or equal to zero.
For several of the days in the results, it is a negative number.
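That invariant can be sanity-checked locally with exact counting. A minimal sketch using Python's `re` module and hypothetical log lines (the real keys aren't shown in the question), with the same pattern as the query:

```python
import re

# Same pattern as in the Logs Insights query (Python re syntax;
# forward slashes need no escaping here).
PATTERN = re.compile(
    r"Processing key: \w+/[\w=_-]+/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+"
)

# Hypothetical log lines; the second is a duplicate invocation for the same key.
messages = [
    "Processing key: mybucket/dt=2023/func.2023-01-02-03.abcd-1234.ef56.gz",
    "Processing key: mybucket/dt=2023/func.2023-01-02-03.abcd-1234.ef56.gz",
    "Processing key: mybucket/dt=2023/func.2023-01-02-04.abcd-1234.ef56.gz",
    "unrelated log line",
]

# filter + parse, computed exactly.
keys = [m.group(0) for m in map(PATTERN.search, messages) if m]

count = len(keys)                # count(@unique_key)
count_distinct = len(set(keys))  # exact distinct count
delta = count - count_distinct

# With exact counting, the delta is the number of duplicate lines and can
# never be negative: every distinct value occurs at least once.
assert delta >= 0
print(count, count_distinct, delta)
```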
I thought I might be misunderstanding the correct usage of datefloor(), so I tried running the following query:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
The result was -20,347.
At this point, the only scenarios I can see are:
- Something wrong with the code executing the query.
- I'm misunderstanding this tool.
I have discovered that the count_distinct function in AWS Logs Insights queries doesn't really return a distinct count! As per the documentation, it returns only an approximate value when the field has high cardinality (many unique values). Apparently I can't just assume that a function returns an accurate result.
The documentation page.
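CloudWatch doesn't publish its exact algorithm, but approximate distinct counters (HyperLogLog-style sketches) trade memory for accuracy and can over-estimate as well as under-estimate. When the estimate lands above the true count, count - count_distinct goes negative, which matches the observed result. A toy linear-counting estimator, purely illustrative and not CloudWatch's actual implementation:

```python
import hashlib
import math

def approx_count_distinct(items, m=256):
    """Toy linear-counting estimator (illustrative only). Hashes each item
    into m buckets and estimates cardinality from the fraction of buckets
    that remain empty."""
    buckets = [False] * m
    for item in items:
        h = int.from_bytes(hashlib.md5(item.encode()).digest()[:8], "big")
        buckets[h % m] = True
    empty = buckets.count(False)
    if empty == 0:
        return float(m)  # sketch saturated; real estimators switch strategies
    return -m * math.log(empty / m)

keys = [f"key-{i}" for i in range(200)]  # 200 distinct hypothetical keys
estimate = approx_count_distinct(keys)

# The estimate is close to 200 but generally not exact. Whenever it
# exceeds the true count, a count - count_distinct delta computed from it
# is negative -- the "impossible" result seen in the question.
print(len(set(keys)), round(estimate, 1))
```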