How is the keyword type stored and analyzed in Elasticsearch?

Posted on 2025-02-11 07:07:53

As per my understanding, a keyword field is not analyzed and is stored as one exact term. For example, "shut down" is stored as "shut down" in Elasticsearch, whereas a text field is analyzed with the default analyzer (or a custom one, if specified), which splits "shut down" into the two terms [shut, down] and stores those. The same applies to searching: to match a keyword field we have to search for its exact term, while for a text field we can search for any one or more of the terms present in the original text.
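
The difference described above can be sketched with a toy model in Python. This is not Elasticsearch code: the function `index_terms` is hypothetical, and a simple lowercase-and-split-on-whitespace step stands in for the default analyzer, which is an assumption made for illustration only.

```python
# Toy model of how the two field types turn a value into index terms.
# NOT Elasticsearch code; lowercase + whitespace split stands in for
# the default analyzer (an assumption for illustration).

def index_terms(value, field_type):
    """Return the terms a toy inverted index would hold for one value."""
    if field_type == "keyword":
        return [value]                # kept as a single exact term
    if field_type == "text":
        return value.lower().split()  # analyzed into separate terms
    raise ValueError(f"unsupported field type: {field_type}")

keyword_terms = index_terms("shut down", "keyword")
text_terms = index_terms("shut down", "text")

print(keyword_terms)  # ['shut down']
print(text_terms)     # ['shut', 'down']

# A lookup for the single term "shut" only hits the text field.
print("shut" in keyword_terms)  # False
print("shut" in text_terms)     # True
```

In this model, a search for "shut" can only succeed against the text field, which is the behavior the question expects from the description field.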

I have an index named sample_index with two fields: description, of type keyword, and message, of type text.

This is the mapping of the index named sample_index:
[image: mapping of sample_index]

Query

POST sample_index/_search
{
  "query": {
    "query_string": {
      "query": "keyword"
    }
  }
}

This is the output of the above query:
[image: output of the query]

Here you can see that searching for the word "keyword", which is present in the description field (of type keyword), returns results. As per my understanding this should not be possible, because for the keyword type the whole text is indexed as it is, without being split. How can this be possible, or is something wrong with my understanding?

ES Version: 5.6.4

Comments (2)

暮凉 2025-02-18 07:07:53

TL;DR

In version 5.6, when you use a query_string query without specifying a default_field, it falls back to the _all field.

The _all field is a concatenation of all the fields in the document.

The _all field is a special catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved.

This is why you get these results.
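
The effect of the _all field can be sketched with a toy model in Python. This is not Elasticsearch code: the sample document values are invented for illustration (the real documents are only visible in the screenshots), and `analyze` is a hypothetical stand-in for the analyzer that lowercases and splits on whitespace.

```python
# Toy model of the _all field; NOT Elasticsearch code.
# The sample document and the whitespace "analyzer" are assumptions.

def analyze(text):
    """Stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

doc = {
    "description": "keyword type field",  # mapped as keyword
    "message": "text type field",         # mapped as text
}

# Per-field terms: keyword keeps the whole value, text is analyzed.
description_terms = [doc["description"]]
message_terms = analyze(doc["message"])

# _all: concatenate every field value with a space, then analyze.
all_terms = analyze(" ".join(doc.values()))

query = "keyword"
print(query in description_terms)  # False
print(query in message_terms)      # False
print(query in all_terms)          # True
```

In this model the word "keyword" is never a term of the description field itself. It only becomes searchable because the concatenated _all string is analyzed.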

月下凄凉 2025-02-18 07:07:53

Your message field is actually of type text.
Additionally, message has a sub-field of type keyword, but that is not relevant to your search query.

Because you're using query_string, which searches on all fields by default, your query matches the word "keyword" in your message field of type text. This is why you're able to search for the word "keyword": text fields get analyzed.

From the query_string documentation:

default_field: Defaults to the index.query.default_field index setting, which has a default value of *.

From the fields documentation:

It is often useful to index the same field in different ways for different purposes. This is the purpose of multi-fields. For instance, a string field could be mapped as a text field for full-text search, and as a keyword field for sorting or aggregations.
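
The multi-field idea can be sketched with a toy model in Python. This is not Elasticsearch code: `index_multi_field` is a hypothetical helper, the sub-field name `message.keyword` and the sample value are assumptions, and lowercase plus whitespace splitting stands in for the analyzer.

```python
# Toy model of a multi-field; NOT Elasticsearch code.
# "message.keyword" and the sample value are assumptions for illustration.

def index_multi_field(name, value):
    """Index one value twice: analyzed under `name`, exact under `name.keyword`."""
    return {
        name: value.lower().split(),  # text: used for full-text search
        f"{name}.keyword": [value],   # keyword: used for sorting, aggregations
    }

index = index_multi_field("message", "This is a keyword test")

print(index["message"])          # ['this', 'is', 'a', 'keyword', 'test']
print(index["message.keyword"])  # ['This is a keyword test']

# The single word matches the analyzed field only.
print("keyword" in index["message"])          # True
print("keyword" in index["message.keyword"])  # False
```

Under these assumptions, the same source value is searchable word by word through the text field, while the keyword sub-field only matches the complete original string.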
