How do synonyms work internally in Elasticsearch?

Posted 2025-01-22 10:32:28

I came across Elasticsearch some time ago and started exploring it. I learned about the synonyms feature, which is amazing! Can someone explain how this whole synonym process works internally? How do index-time synonym analysis and search-time synonym analysis differ in terms of internal structure?

Thanks :)

标点 2025-01-29 10:32:28

From the Elastic documentation:

Typically, the same analyzer should be applied at both index time and search time to ensure that the terms in the query are in the same format as the terms in the inverted index.

When you apply synonyms only through a search_analyzer, the synonym tokens are generated from the query terms at search time. The inverted index stores only the original terms.
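For comparison, a search-time setup can be sketched as follows. This is a hypothetical example: the index name synonym_search_time and the sample document are made up for illustration, and the synonyms_filter and synonym_analyzer definitions are assumed to be the same as in the index-time example. The field is indexed with the built-in standard analyzer, and synonym_analyzer is set only as its search_analyzer, so the expansion happens when the query text is analyzed rather than when the document is written.

```
# Hypothetical index: synonyms are applied only to query text
PUT synonym_search_time
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        },
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "laptop, notebook"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "synonym_analyzer"
      }
    }
  }
}

# Indexed with the standard analyzer: only "laptop" and "sale" reach the inverted index
PUT synonym_search_time/_doc/1?refresh
{
  "name": "Laptop sale"
}

# The match query text is analyzed with synonym_analyzer, so "notebook"
# is expanded to notebook OR laptop and document 1 should match
GET synonym_search_time/_search
{
  "query": {
    "match": {
      "name": "notebook"
    }
  }
}
```

A term query for notebook against this index would not match document 1, because term queries skip analysis and notebook was never written to the inverted index.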

When you apply synonyms at index time, each term is expanded into the other terms of its synonym group, so all of them are written to the inverted index. This increases index size, because more terms are indexed, and a change to the synonym list only takes effect for existing documents after they are reindexed.

Index-time example:

PUT synonym_index_time
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        },
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "laptop, notebook"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}

Test:

GET synonym_index_time/_analyze
{
  "field": "name",
  "text": ["laptop"]
}

Results:

{
  "tokens" : [
    {
      "token" : "laptop",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "notebook",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

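As the output shows, the analyzer emits both laptop and notebook at position 0 with the same offsets; notebook is the token of type SYNONYM that the filter stacked on top of laptop. Both terms are therefore written to the inverted index for this document.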
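One way to confirm that the expansion really lives in the inverted index is to index a document into synonym_index_time and look it up with a term query. A term query is not analyzed, so it only matches terms that were actually indexed. The document below is a hypothetical sample; the expectation in the comments follows from the _analyze output above rather than from a recorded run.

```
# Hypothetical sample document: the source text contains only "laptop"
PUT synonym_index_time/_doc/1?refresh
{
  "name": "Laptop sale"
}

# The term query is not analyzed; it looks up "notebook" directly in the
# inverted index, where the synonym filter stored it next to "laptop",
# so document 1 should be returned
GET synonym_index_time/_search
{
  "query": {
    "term": {
      "name": "notebook"
    }
  }
}
```

The same lookup on an index that applies synonyms only at search time would find nothing, which is the practical difference in what each approach stores.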
