Can "significant" aggregations be used with multi-fields?

Posted on 2025-02-09 22:31:01

I can't find any information about this in the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significanttext-aggregation.html

The issue is that I am trying to aggregate significant terms on a multi-field (name.shingles) that uses a shingle filter/analyzer:

  "aggregations": {
    "significant_words": {
      "sampler": {
        "shard_size": 100
      }, 
      "aggs": {
        "keywords": {
          "significant_text": {
            "field": "name.shingles"
          }
        }
      }
    }
  }

I'm getting empty buckets:

  "aggregations" : {
    "significant_words" : {
      "doc_count" : 5,
      "keywords" : {
        "doc_count" : 5,
        "bg_count" : 153313,
        "buckets" : [ ]
      }
    }
  }

Multi-field definition:

        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            },
            "shingles" : {
              "type" : "text",
              "analyzer" : "shingle_analyzer",
              "fielddata" : true
            }
          }
        }
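The shingle_analyzer definition isn't shown in the question. Assume it is a standard tokenizer, a lowercase filter and a shingle token filter left at its defaults (shingles of size 2, unigrams kept). Under that assumption, each value of name.shingles is indexed as single words plus two-word phrases. Below is a minimal Python sketch of that token stream. The shingles helper is hypothetical and ignores positions and offsets.

```python
def shingles(tokens, min_size=2, max_size=2, output_unigrams=True):
    """Approximate the tokens produced by a shingle token filter.

    Hypothetical helper: the defaults assume the filter's default settings
    (shingle size 2, unigrams kept). Only token strings are returned.
    """
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])  # the single word at this position
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):  # skip shingles that would run past the end
                out.append(" ".join(tokens[i:i + n]))
    return out

# Hypothetical field value, already lowercased and split on whitespace
print(shingles("quick brown fox".split()))
# → ['quick', 'quick brown', 'brown', 'brown fox', 'fox']
```

So a significant_text bucket on this field could be either a single word or a two-word phrase.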

Answer by ぺ禁宫浮华殁, 2025-02-16 22:31:01

You can use the Significant Terms or Significant Text aggregation on a multi-field as well.

But you need to understand how this aggregation works. Here is what the documentation says:

An aggregation that returns interesting or unusual occurrences of terms in a set.

In all these cases the terms being selected are not simply the most popular terms in a set. They are the terms that have undergone a significant change in popularity measured between a foreground and background set. If the term "H5N1" only exists in 5 documents in a 10 million document index and yet is found in 4 of the 100 documents that make up a user’s search results that is significant and probably very relevant to their search. 5/10,000,000 vs 4/100 is a big swing in frequency.
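To put a number on that "swing", here is a minimal Python sketch of the JLH score. As far as I know, JLH is the default significance heuristic for significant_terms and significant_text. The sketch follows the heuristic's published description (absolute change in frequency multiplied by relative change), not Elasticsearch's source. The function name jlh_score is hypothetical.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style score: (fg% - bg%) * (fg% / bg%).

    Hypothetical helper. Returns 0.0 when a count or total is zero, or when the
    term is not more frequent in the foreground than in the background.
    """
    if fg_count == 0 or fg_total == 0 or bg_count == 0 or bg_total == 0:
        return 0.0
    fg = fg_count / fg_total  # share of foreground docs containing the term
    bg = bg_count / bg_total  # share of background docs containing the term
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)

# The "H5N1" example above: 4 of 100 results vs 5 of 10,000,000 docs
print(round(jlh_score(4, 100, 5, 10_000_000), 2))  # 3199.96
```

A term that is equally common in the foreground and the background scores 0, no matter how popular it is. This is why a plain frequency ranking and a significance ranking can differ completely.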

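Also, look at "doc_count" : 5 in your aggregation result. Only 5 documents are in the foreground set (the documents your query matched), while the background set ("bg_count" : 153313) is much larger. With so few foreground documents, very few terms are likely to qualify as significant. Try it with a query that matches a larger set of documents.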
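One more thing worth checking is min_doc_count. My understanding is that significant_text uses the same default as significant_terms, which is 3. Under that default, a shingle must occur in at least 3 of the 5 foreground documents before it can be returned. Check the reference for your Elasticsearch version to confirm. The minimal Python sketch below rebuilds the request from the question with min_doc_count set explicitly. The function name is hypothetical. The question's query is not shown, so add your own "query" key before sending the body.

```python
import json

def significant_text_body(field, shard_size=100, min_doc_count=1):
    """Build the sampler + significant_text request body from the question.

    Hypothetical helper. Add your own "query" alongside "aggregations".
    """
    return {
        "size": 0,  # only aggregation results are needed, not hits
        "aggregations": {
            "significant_words": {
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": field,
                            # Assumed default is 3; lowering it lets shingles found
                            # in only 1 or 2 of the sampled docs be returned.
                            "min_doc_count": min_doc_count,
                        }
                    }
                },
            }
        },
    }

print(json.dumps(significant_text_body("name.shingles"), indent=2))
```

With a foreground of only 5 documents, lowering min_doc_count mostly makes results visible rather than meaningful. A larger foreground set is still the real fix.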

If you just want to aggregate on the shingle field, you can use a terms aggregation as well. This works on the text field here because fielddata is enabled on name.shingles:

{
  "size": 0,
  "aggs": {
    "sw": {
      "terms": {
        "field": "name.shingles",
        "size": 10
      }
    }
  }
}