Why does an Elasticsearch query need a certain number of characters to return results?

Posted on 2025-01-22 23:43:33

It seems that Elasticsearch needs a minimum number of characters before it returns results for one specific property I am searching on. The property is called 'guid' and has the following mapping:

    "guid": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword",
                "ignore_above": 256
            }
        }
    }

I have a document with the following GUID: 3e49996c-1dd8-4230-8f6f-abe4236a6fc4

The following query returns the document as expected:

{"match":{"guid":{"query":"9996c-1dd8*"}}}

However, this query does not:

{"match":{"guid":{"query":"9996c-1dd*"}}}

I get the same result with multi_match and query_string queries. I haven't found anything in the documentation about a minimum character count, so what is happening here?

Comments (1)

坦然微笑 2025-01-29 23:43:33

Elasticsearch does not require a minimum number of characters. What matters is the tokens that the analyzer generates.

A helpful exercise is to call the _analyze API and look at the tokens that end up in your index.

GET index_001/_analyze
{
  "field": "guid",
  "text": [
    "3e49996c-1dd8-4230-8f6f-abe4236a6fc4"
  ]
}

For the value 3e49996c-1dd8-4230-8f6f-abe4236a6fc4, the indexed tokens are:

{
  "tokens" : [
    {
      "token" : "3e49996c",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1dd8",
      "start_offset" : 9,
      "end_offset" : 13,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "4230",
      "start_offset" : 14,
      "end_offset" : 18,
      "type" : "<NUM>",
      "position" : 2
    },
    {
      "token" : "8f6f",
      "start_offset" : 19,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "abe4236a6fc4",
      "start_offset" : 24,
      "end_offset" : 36,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

At search time, a match query analyzes the query text with the same analyzer that was used at index time, unless a separate search analyzer is configured. For the query text "9996c-1dd8*":

GET index_001/_analyze
{
  "field": "guid",
  "text": [
    "9996c-1dd8*"
  ]
}

The generated tokens are:

{
  "tokens" : [
    {
      "token" : "9996c",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1dd8",
      "start_offset" : 6,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

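Note that the inverted index contains the token 1dd8, and the query text "9996c-1dd8*" also produces the token "1dd8", so the document matches. The other query token, "9996c", matches nothing because the indexed token is "3e49996c", but a match query combines its terms with OR by default, so one matching token is enough. The trailing * is not treated as a wildcard here; the analyzer simply discards it.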

When you test with the query text "9996c-1dd*", none of the generated tokens match an indexed token, so there are no results.

GET index_001/_analyze
{
  "field": "guid",
  "text": [
    "9996c-1dd*"
  ]
}

Tokens:

{
  "tokens" : [
    {
      "token" : "9996c",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "1dd",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

The token "1dd" is not equal to the indexed token "1dd8", and "9996c" is not equal to "3e49996c", so neither query token matches.
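
The token comparison above can be reproduced outside Elasticsearch with a small Python sketch. It is only a rough approximation: the `analyze` helper is a hypothetical stand-in that lowercases the text and splits on anything that is not a letter or digit, which happens to give the same tokens as the _analyze output for these inputs but is not the real standard analyzer. `matching_tokens` is likewise a made-up name that mimics the default OR behaviour of a match query.

```python
import re

def analyze(text):
    """Rough stand-in for the analyzer: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matching_tokens(query_text, indexed_text):
    """Return the query tokens that are also indexed tokens (OR semantics)."""
    indexed = set(analyze(indexed_text))
    return [tok for tok in analyze(query_text) if tok in indexed]

GUID = "3e49996c-1dd8-4230-8f6f-abe4236a6fc4"

for query in ("9996c-1dd8*", "9996c-1dd*"):
    # A document is a hit when at least one query token equals an indexed token.
    print(query, "->", analyze(query), "matched:", matching_tokens(query, GUID))
```

The first query shares the whole token "1dd8" with the document and the second shares none, so what looks like a minimum length is really the point at which the query text completes a full indexed token.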
