AWS CloudWatch Logs Insights: aggregate result is impossible (count - count_distinct is negative)

Posted on 2025-02-05 23:11:15

I'm running a CloudWatch Logs Insights query on a single log stream that corresponds to a single Python AWS Lambda function. This function logs a unique line containing the S3 key it is processing. It logs this line once, at the beginning of the invocation. The only condition under which it won't log this line is if it fails before it even reads the event.
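For concreteness, the logging pattern described above might look roughly like this (a minimal sketch, not the actual handler; the S3 notification event shape is an assumption):

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # Assumed event shape: a standard S3 event notification record.
    key = event["Records"][0]["s3"]["object"]["key"]
    # The single "unique" line per invocation that the queries below parse.
    logger.info("Processing key: %s", key)
    # ... processing of the S3 object follows ...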

The query is:

parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
        by datefloor(@timestamp, 1d) as @_datefloor 
| sort @_datefloor asc

The two regular expressions in this query parse the full key of the S3 object being processed. Both in this particular case and in general, my understanding is that the count(...) of any quantity minus the count_distinct(...) of the same quantity should always be greater than or equal to zero.
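To make that invariant concrete, here is the exact-count arithmetic in plain Python, with hypothetical key values:

values = ["key-a", "key-a", "key-b"]  # hypothetical parsed @unique_key values
count = len(values)                   # count(...) over 3 rows -> 3
count_distinct = len(set(values))     # 2 unique values -> 2
delta = count - count_distinct        # 1; duplicates can only push count up
assert delta >= 0                     # holds for any exact counts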

For several of the days in the results, this delta is a negative number.

I thought I might be misunderstanding the correct usage of datefloor(), so I tried running the following query:

parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta

The result was -20,347.

At this point, the only scenarios I can see are:

  1. Something is wrong with the code executing the query.
  2. I'm misunderstanding this tool.

1 Answer

断念 (2025-02-12 23:11:16)

I have discovered that the count_distinct function in CloudWatch Logs Insights queries doesn't really return a distinct count! As per the documentation:

Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation.

Apparently I can't just assume that a function returns an accurate result.

The documentation page.
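One possible workaround, sketched below under some assumptions (boto3, a placeholder log group name passed by the caller, and a simplified stand-in for the full key regex): pull the parsed keys back out with a Logs Insights query and compute the counts exactly on the client side.

import time
import boto3

logs = boto3.client("logs")

# Simplified stand-in for the full "Processing key: ..." pattern in the question.
QUERY = r"""
parse @message /Processing key: (?<unique_key>\S+)/
| filter @message like /Processing key:/
| fields unique_key
| limit 10000
"""

def exact_counts(log_group, start_epoch, end_epoch):
    # Start the Logs Insights query and poll until it finishes.
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=QUERY,
    )["queryId"]
    while True:
        resp = logs.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled"):
            break
        time.sleep(1)
    keys = [
        field["value"]
        for row in resp["results"]
        for field in row
        if field["field"] == "unique_key"
    ]
    # Exact count and exact distinct count; no approximation involved.
    return len(keys), len(set(keys))

Note that Logs Insights caps results at 10,000 rows per query, so a larger time range would have to be split into smaller windows for this to stay exact.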
