Why does an Elasticsearch query need a certain number of characters to return results?

Posted 2025-01-22 23:43:33

It seems like there is a character minimum needed to get results from Elasticsearch for a specific property I am searching. The property is called 'guid' and has the following mapping:

    "guid": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
                "ignore_above": 256
            }
        }
    }

I have a document with the following GUID: 3e49996c-1dd8-4230-8f6f-abe4236a6fc4

The following query returns the document as expected:

{"match":{"query":"9996c-1dd8*","fields":["guid"]}}

However this query does not:

{"match":{"query":"9996c-1dd*","fields":["guid"]}}

I have the same result with multi_match and query_string queries. I haven't been able to find anything in the documentation about a character minimum, so what is happening here?

坦然微笑 2025-01-29 23:43:33

Elasticsearch does not require a minimum number of characters. What matters is the tokens that the analyzer generates.

A useful exercise is to call the _analyze API and see which tokens are produced for your field:

GET index_001/_analyze
{
  "field": "guid",
  "text": [
    "3e49996c-1dd8-4230-8f6f-abe4236a6fc4"
  ]
}

Your document contains the value 3e49996c-1dd8-4230-8f6f-abe4236a6fc4.
These are the tokens it is indexed as:

{
  "tokens" : [
    {
      "token" : "3e49996c",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1dd8",
      "start_offset" : 9,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "4230",
      "start_offset" : 14,
      "end_offset" : 18,
      "type" : "<NUM>",
      "position" : 2
    },
    {
      "token" : "8f6f",
      "start_offset" : 19,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "abe4236a6fc4",
      "start_offset" : 24,
      "end_offset" : 36,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}
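
As a rough illustration only, the splitting shown above can be approximated in Python. The function name `approx_standard_analyze` is hypothetical, and the regular expression is an assumption that holds for simple ASCII input such as this GUID (lowercase, then split on anything that is not a letter or digit). It is not the real standard analyzer, which follows Unicode text segmentation rules and behaves differently on other input.

```python
import re


def approx_standard_analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer on simple ASCII input:
    lowercase the text, then keep each run of letters and digits as a token.
    Hyphens and '*' are treated as separators and dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


print(approx_standard_analyze("3e49996c-1dd8-4230-8f6f-abe4236a6fc4"))
# ['3e49996c', '1dd8', '4230', '8f6f', 'abe4236a6fc4']
print(approx_standard_analyze("9996c-1dd8*"))
# ['9996c', '1dd8']
print(approx_standard_analyze("9996c-1dd*"))
# ['9996c', '1dd']
```

For anything beyond a quick mental model, the _analyze API remains the authoritative way to see the tokens for a given field.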

By default, the analyzer used at index time is also applied to the query text at search time.
Here is what it does to the query "9996c-1dd8*":

GET index_001/_analyze
{
  "field": "guid",
  "text": [
    "9996c-1dd8*"
  ]
}

The generated tokens are:

{
  "tokens" : [
    {
      "token" : "9996c",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1dd8",
      "start_offset" : 6,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

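Note that the "*" is not treated as a wildcard by a match query; the analyzer simply drops it. The inverted index contains the token "1dd8", and the query "9996c-1dd8*" also produces the token "1dd8", so the document matches. The other query token, "9996c", matches nothing (the indexed token is "3e49996c"), but a match query uses the OR operator by default, so one matching token is enough.

When you search for "9996c-1dd*", no query token equals an indexed token, so there are no results: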

GET index_001/_analyze
{
  "field": "guid",
  "text": [
    "9996c-1dd*"
  ]
}

Tokens:

{
  "tokens" : [
    {
      "token" : "9996c",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1dd",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

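The token "1dd" is not equal to the indexed token "1dd8", and "9996c" is not equal to "3e49996c". Tokens are compared as whole terms, not as substrings, so nothing matches.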
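
To make the comparison concrete, here is a minimal Python sketch of whole-token matching, using the token lists copied from the _analyze outputs above. The helper `match_or` is hypothetical and only mimics the default OR behaviour of a match query; it ignores scoring and everything else Elasticsearch actually does.

```python
# Tokens of the indexed document, copied from the first _analyze output.
indexed_tokens = {"3e49996c", "1dd8", "4230", "8f6f", "abe4236a6fc4"}


def match_or(query_tokens, indexed):
    """Mimic a match query with the default OR operator: the document
    matches if at least one query token is exactly equal to an indexed token."""
    return any(token in indexed for token in query_tokens)


# Query "9996c-1dd8*" is analyzed to ["9996c", "1dd8"].
print(match_or(["9996c", "1dd8"], indexed_tokens))  # True: "1dd8" is indexed

# Query "9996c-1dd*" is analyzed to ["9996c", "1dd"].
print(match_or(["9996c", "1dd"], indexed_tokens))   # False: no exact token match
```

The set lookup is an exact comparison, which is why "1dd" fails even though it is a prefix of "1dd8".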
