AWS CloudWatch Logs Insights: aggregate result is impossible (count - count_distinct is negative)

Posted on 2025-02-05 23:11:15

I'm running a CloudWatch Logs Insights query on a single log stream that corresponds to a single Python AWS Lambda function. This function logs a unique line containing the S3 key it is processing. It logs this line once, at the beginning of the invocation. The only condition under which it won't log this line is if it fails before it even reads the event.
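For concreteness, the logging pattern described above might look roughly like this (a minimal sketch, not the actual handler; the S3 notification event shape is an assumption):

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # Assumed event shape: a standard S3 event notification record.
    key = event["Records"][0]["s3"]["object"]["key"]
    # The single "unique" line per invocation that the queries below parse.
    logger.info("Processing key: %s", key)
    # ... processing of the S3 object follows ...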

The query is:

parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
        by datefloor(@timestamp, 1d) as @_datefloor 
| sort @_datefloor asc

The two regular expressions in this query parse the full key of the S3 object being processed. Both in this particular case and in general, my understanding is that the count(...) of any quantity minus the count_distinct(...) of the same quantity should always be greater than or equal to zero.
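To make that invariant concrete, here is the exact-count arithmetic in plain Python, with hypothetical key values:

values = ["key-a", "key-a", "key-b"]  # hypothetical parsed @unique_key values
count = len(values)                   # count(...) over 3 rows -> 3
count_distinct = len(set(values))     # 2 unique values -> 2
delta = count - count_distinct        # 1; duplicates can only push count up
assert delta >= 0                     # holds for any exact counts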

For several of the days in the results, this delta is a negative number.

I thought I might be misunderstanding the correct usage of datefloor(), so I tried running the following query:

parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta

The result was -20,347.

At this point, the only scenarios I can see are:

  1. Something is wrong with the code executing the query.
  2. I'm misunderstanding this tool.

1 Answer

断念 (2025-02-12 23:11:16)

I have discovered that the count_distinct function in CloudWatch Logs Insights queries doesn't really return a distinct count! As per the documentation:

Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation.

Apparently I can't just assume that a function returns an accurate result.

The documentation page.
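One possible workaround, sketched below under some assumptions (boto3, a placeholder log group name passed by the caller, and a simplified stand-in for the full key regex): pull the parsed keys back out with a Logs Insights query and compute the counts exactly on the client side.

import time
import boto3

logs = boto3.client("logs")

# Simplified stand-in for the full "Processing key: ..." pattern in the question.
QUERY = r"""
parse @message /Processing key: (?<unique_key>\S+)/
| filter @message like /Processing key:/
| fields unique_key
| limit 10000
"""

def exact_counts(log_group, start_epoch, end_epoch):
    # Start the Logs Insights query and poll until it finishes.
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=QUERY,
    )["queryId"]
    while True:
        resp = logs.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled"):
            break
        time.sleep(1)
    keys = [
        field["value"]
        for row in resp["results"]
        for field in row
        if field["field"] == "unique_key"
    ]
    # Exact count and exact distinct count; no approximation involved.
    return len(keys), len(set(keys))

Note that Logs Insights caps results at 10,000 rows per query, so a larger time range would have to be split into smaller windows for this to stay exact.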
