Elasticsearch date histogram aggregation results do not match

Posted on 2025-01-21 15:44:56

I am fetching a unique user count from an Elasticsearch index. When I run the same query with different date histogram intervals (day, week, month, quarter, year), the counts do not match.

Note: I only have 4 months of data for this year.

This is the search query for the month interval:

{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2022-01-01",
        "lte": "2022-04-14"
      }
    }
  },
  "aggs": {
    "dt": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "month",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "events": {
          "nested": {
            "path": "events"
          },
          "aggs": {
            "unique_user_count": {
              "cardinality": {
                "field": "events.actor.id.keyword"
              }
            }
          }
        }
      }
    }
  }
}

I got the following monthly results (response):

{
  "aggregations": {
    "dt": {
      "buckets": [
        {
          "key_as_string": "2022-01-01",
          "key": 1640995200000,
          "doc_count": 2930,
          "events": {
            "doc_count": 13988,
            "unique_user_count": {
              "value": 37
            }
          }
        },
        {
          "key_as_string": "2022-02-01",
          "key": 1643673600000,
          "doc_count": 36910,
          "events": {
            "doc_count": 175151,
            "unique_user_count": {
              "value": 580
            }
          }
        },
        {
          "key_as_string": "2022-03-01",
          "key": 1646092800000,
          "doc_count": 24861,
          "events": {
            "doc_count": 133383,
            "unique_user_count": {
              "value": 555
            }
          }
        },
        {
          "key_as_string": "2022-04-01",
          "key": 1648771200000,
          "doc_count": 6005,
          "events": {
            "doc_count": 30730,
            "unique_user_count": {
              "value": 170
            }
          }
        }
      ]
    }
  }
}

I then ran the same query again with the interval changed to year:

{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2022-01-01",
        "lte": "2022-04-14"
      }
    }
  },
  "aggs": {
    "dt": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "year",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "events": {
          "nested": {
            "path": "events"
          },
          "aggs": {
            "unique_user_count": {
              "cardinality": {
                "field": "events.actor.id.keyword"
              }
            }
          }
        }
      }
    }
  }
}

I got the following yearly response:

{
  "aggregations": {
    "dt": {
      "buckets": [
        {
          "key_as_string": "2022-01-01",
          "key": 1640995200000,
          "doc_count": 70706,
          "events": {
            "doc_count": 353252,
            "unique_user_count": {
              "value": 1007
            }
          }
        }
      ]
    }
  }
}

I expected the yearly value to be the sum of the monthly values:
year = 37 + 580 + 555 + 170 = 1342
but I got 1007 instead.

How can I make the sum of the monthly values match the yearly value?


Comments (1)

你怎么这么可爱啊 2025-01-28 15:44:56

In your monthly buckets you're running a cardinality aggregation to get the unique user count per month.

If you run the same aggregation over a year, the unique user count cannot be the sum of the monthly unique user counts, because a given user might have had interactions during several months and would then be counted once in each of those months.
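To illustrate the point, here is a minimal Python sketch. The user IDs and month labels are hypothetical, invented only for this example, and plain sets stand in for what the cardinality aggregation computes inside each bucket.

```python
# Hypothetical user IDs per month, invented purely for illustration.
monthly_users = {
    "2022-01": {"alice", "bob"},
    "2022-02": {"alice", "bob", "carol"},
    "2022-03": {"bob", "carol", "dave"},
    "2022-04": {"alice", "dave"},
}

# Summing the per-month distinct counts counts a returning user
# once for every month in which they were active.
sum_of_monthly = sum(len(users) for users in monthly_users.values())

# The yearly distinct count is the size of the union of the monthly sets,
# so each user is counted exactly once.
yearly_unique = len(set().union(*monthly_users.values()))

print(sum_of_monthly)  # 10
print(yearly_unique)   # 4
```

The two numbers only agree when no user appears in more than one month.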

If you compare the total event counts, which are additive, they do match: 13988 + 175151 + 133383 + 30730 = 353252
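The arithmetic can be checked with a short Python sketch using the figures copied from the two responses in the question. The dictionary keys `events` and `unique_users` are shorthand chosen for this example, not field names from the Elasticsearch response.

```python
# Values copied from the monthly and yearly responses in the question.
monthly = [
    {"doc_count": 2930, "events": 13988, "unique_users": 37},
    {"doc_count": 36910, "events": 175151, "unique_users": 580},
    {"doc_count": 24861, "events": 133383, "unique_users": 555},
    {"doc_count": 6005, "events": 30730, "unique_users": 170},
]
yearly = {"doc_count": 70706, "events": 353252, "unique_users": 1007}


def total(field):
    """Sum one field across all monthly buckets."""
    return sum(bucket[field] for bucket in monthly)


# Plain counts are additive, so the monthly buckets sum to the yearly bucket.
print(total("doc_count") == yearly["doc_count"])  # True
print(total("events") == yearly["events"])        # True

# Distinct counts are not additive: the gap comes from users who were
# active in more than one month.
print(total("unique_users") - yearly["unique_users"])  # 335
```

Note that the cardinality aggregation is approximate, so the gap of 335 should be read as an estimate of the repeat-user overlap rather than an exact figure.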

So all is fine; you just need to compare apples to apples.
