Elasticsearch: query for an array that contains only the given values

Posted 2025-01-11 15:05:55

Let's say I have data in this format:

{
  "id": "doc0",
  "tags": ["a", "b", "c"]
}

{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"]
}

{
  "id": "doc2",
  "tags": ["a", "b"]
}

I need to form an ES query that fetches only the documents whose tags contain both "a" and "b" and nothing else.

If I write a terms query, it matches all the documents, since every document has both "a" and "b", but only one document has nothing apart from "a" and "b".

What is the best way to form this query? I don't have a list of the other values, so I can't add a "not_contains" clause for them.
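
The requirement amounts to set equality between a document's tags and the requested values, as opposed to a presence check. A minimal Python sketch of the two semantics, run in memory against the three sample documents (no Elasticsearch involved; the function names are made up for illustration):

```python
# The three sample documents from the question.
docs = [
    {"id": "doc0", "tags": ["a", "b", "c"]},
    {"id": "doc1", "tags": ["a", "b", "c", "d"]},
    {"id": "doc2", "tags": ["a", "b"]},
]


def contains_all(doc, wanted):
    """Presence check: every wanted tag appears in the document."""
    return set(wanted) <= set(doc["tags"])


def contains_only(doc, wanted):
    """What the question asks for: the tag set equals the wanted set."""
    return set(doc["tags"]) == set(wanted)


superset_ids = [d["id"] for d in docs if contains_all(d, ["a", "b"])]
exact_ids = [d["id"] for d in docs if contains_only(d, ["a", "b"])]

print(superset_ids)  # every document passes the presence check
print(exact_ids)     # only the document tagged exactly "a" and "b"
```

Any query that solves the problem has to reproduce the second function on the Elasticsearch side.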

Comments (2)

朕就是辣么酷 2025-01-18 15:05:55

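There are two ways to achieve this:

  1. You can combine a bool query (with must and filter clauses) with a script query to retrieve only those documents whose tags are exactly "a" and "b". The term clauses require both values to be present, and the script rejects any document that has a tag outside the input list.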

Index Data:

POST testidx/_doc/1
{
  "id": "doc0",
  "tags": ["a", "b", "c"]
}

POST testidx/_doc/2
{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"]
}

POST testidx/_doc/3
{
  "id": "doc2",
  "tags": ["a", "b"]
}

Search Query:

POST testidx/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "tags": "a"
              }
            },
            {
              "term": {
                "tags": "b"
              }
            },
            {
              "script": {
                "script": {
                  "source": "return params.input.containsAll(doc['tags.keyword']);",
                  "lang": "painless",
                  "params": {
                    "input": [
                      "a",
                      "b"
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

Search Result:

"hits" : [
      {
        "_index" : "testidx",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "id" : "doc2",
          "tags" : [
            "a",
            "b"
          ]
        }
      }
    ]

  2. You can use a terms set query with the minimum_should_match_script parameter, so that the number of terms that must match equals the number of tags in each document. Compared to a script query, a terms set query will be faster.

POST testidx/_search
{
  "query": {
    "bool": {
      "filter": {
        "terms_set": {
          "tags": {
            "terms": [
              "a",
              "b"
            ],
            "minimum_should_match_script": {
              "source": "doc['tags.keyword'].size()"
            }
          }
        }
      }
    }
  }
}
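
When the list of values is not fixed, the request body can be generated. The sketch below is one way to do that in Python; `build_exact_tags_query` and its parameters are hypothetical names, and the `tags.keyword` sub-field is assumed to exist (as it does with default dynamic mapping). One caveat worth checking against your data: with `doc['tags.keyword'].size()` alone, a document tagged only "a" needs a single matching term and would also be returned. The optional `strict` variant uses `params.num_terms`, which the terms set query exposes to the script, to require at least as many matches as there are requested terms. That variant is my adjustment, not part of the answer above.

```python
import json


def build_exact_tags_query(tags, field="tags", count_field="tags.keyword", strict=True):
    """Build a terms_set search body for documents tagged with exactly `tags`.

    strict=False reproduces the script from the answer above (matches any
    non-empty subset of `tags`); strict=True is an assumed refinement.
    """
    unique = sorted(set(tags))  # duplicates would inflate the term count
    size_expr = f"doc['{count_field}'].size()"
    source = f"Math.max(params.num_terms, {size_expr})" if strict else size_expr
    return {
        "query": {
            "bool": {
                "filter": {
                    "terms_set": {
                        field: {
                            "terms": unique,
                            "minimum_should_match_script": {"source": source},
                        }
                    }
                }
            }
        }
    }


body = build_exact_tags_query(["b", "a", "a"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `POST testidx/_search`.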

掩于岁月 2025-01-18 15:05:55

You can use a terms set query.

Before using the terms set query, you need to update each indexed document with a field that holds the number of elements in its tags array.

PUT sample1/_doc/1
{
  "id": "doc0",
  "tags": ["a", "b", "c"],
  "required_matches": 3
}

PUT sample1/_doc/2
{
  "id": "doc1",
  "tags": ["a", "b", "c", "d"],
  "required_matches": 4
}

PUT sample1/_doc/3
{
  "id": "doc2",
  "tags": ["a", "b"],
  "required_matches": 2
}
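
Maintaining the count by hand is error-prone, so it may be easier to compute it just before indexing. A minimal sketch, assuming documents are plain dicts on the client side; `with_required_matches` is a hypothetical helper, and counting distinct tags (not raw list length) is a design choice, since a repeated tag can only ever match one query term:

```python
def with_required_matches(doc, tags_field="tags", count_field="required_matches"):
    """Return a copy of `doc` with the number of distinct tags stored in `count_field`."""
    enriched = dict(doc)  # shallow copy so the caller's dict is left untouched
    enriched[count_field] = len(set(doc.get(tags_field, [])))
    return enriched


docs = [
    {"id": "doc0", "tags": ["a", "b", "c"]},
    {"id": "doc1", "tags": ["a", "b", "c", "d"]},
    {"id": "doc2", "tags": ["a", "b"]},
]

enriched_docs = [with_required_matches(d) for d in docs]
for d in enriched_docs:
    print(d)
```

Each enriched dict can then be sent as the body of the corresponding `PUT sample1/_doc/<id>` request.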

Query:

POST sample1/_search
{
  "query": {
    "terms_set": {
      "tags": {
        "terms": [ "a", "b"],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.17161848,
    "hits" : [
      {
        "_index" : "sample1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.17161848,
        "_source" : {
          "id" : "doc2",
          "tags" : [
            "a",
            "b"
          ],
          "required_matches" : 2
        }
      }
    ]
  }
}