How do synonyms work internally in Elasticsearch?

Posted 2025-01-22 10:32:28

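I came across Elasticsearch a while ago and started exploring it. I learned about the synonym feature, which is great! Can someone explain how the whole synonym process works internally? How do index-time synonym analysis and search-time synonym analysis differ in terms of internal structure?

Thanks :)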

标点 2025-01-29 10:32:28

From the Elastic documentation:

Usually, the same analyzer should be applied at both index time and
search time, to ensure that the query terms are in the same format as
the inverted index terms.

When you put synonyms in the search_analyzer, the synonym tokens are generated only for the search terms, at search time.

When you use synonyms at index time, each term is expanded into its synonyms as documents are indexed, so all the variants are stored in the inverted index. This can increase your index size, because you are indexing more terms.
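
For contrast, a search-time setup could look like the sketch below. The index name synonym_search_time and the analyzer name plain_analyzer are made up for illustration; the synonym_graph filter type and the search_analyzer mapping parameter are standard Elasticsearch features. In this variant documents are indexed without synonyms, and only the query text is expanded.

```
# Hypothetical index: synonyms applied only at search time
PUT synonym_search_time
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "plain_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase"]
          },
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "synonyms_filter"]
          }
        },
        "filter": {
          "synonyms_filter": {
            "type": "synonym_graph",
            "synonyms": ["laptop, notebook"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "plain_analyzer",
        "search_analyzer": "synonym_analyzer"
      }
    }
  }
}

# Inspect what the search-time analyzer produces for a query term
GET synonym_search_time/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": ["laptop"]
}
```

The trade-off in this sketch is that the inverted index stays smaller and the synonym list can be changed without reindexing documents, at the cost of expanding every query when it runs.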

Index-time example:

PUT synonym_index_time
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        },
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "laptop, notebook"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "synonym_analyzer"
      }
    }
  }
}

Test:

GET synonym_index_time/_analyze
{
  "field": "name",
  "text": ["laptop"]
}

Results:

{
  "tokens" : [
    {
      "token" : "laptop",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "notebook",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "SYNONYM",
      "position" : 0
    }
  ]
}

Note that both laptop and notebook are produced at the same position (0), so both are indexed; notebook is marked with type SYNONYM because it was added by the synonym filter.
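
To see the effect on matching, one could index a document and then search for the synonym. This is a sketch that assumes the synonym_index_time index from the example above already exists; the document ID 1 and the refresh=true parameter are choices made for illustration.

```
# Index a document whose text contains only "laptop"
PUT synonym_index_time/_doc/1?refresh=true
{
  "name": "laptop"
}

# Search for the synonym instead of the original word
GET synonym_index_time/_search
{
  "query": {
    "match": {
      "name": "notebook"
    }
  }
}
```

Because notebook was written to the inverted index alongside laptop, this search should return document 1.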
