Elasticsearch: Using fuzzy search to find abbreviations

Posted on 2025-02-10 07:01:50

I have indexed text articles that mention company names, such as apple and lemonade, and I am trying to search for these companies by their abbreviations, such as APPL and LMND. Fuzzy search returns other results instead: for example, searching for LMND returns land, which appears in the text, but it never returns lemonade, whichever parameters I try.

First question

Is fuzzy search a suitable solution for this kind of search?

Second question

What would be a good range of parameter values for my problem?

UPDATE

I have tried a synonym filter:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "synonyms": [
              "apple,APPL",
              "lemonade,LMND"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "transcript_data": {
        "properties": {
          "words": {
            "type": "nested",
            "properties": {
              "word": {
                "type": "text",
                "search_analyzer":"synonym_analyzer"
              }
            }
          }
        }
      }
    }
  }
}

and for the search I used:

{
  "_source": false,
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": "lmnd"
        }
      }
    }
  }
}

but it's not working.

Comments (1)

小矜持 2025-02-17 07:01:50

I believe the best option for you is to use synonyms; they do exactly what you need.
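For context on the first question: fuzzy matching in Elasticsearch is based on edit distance, and the `fuzziness` parameter allows at most 2 edits. LMND is 1 edit away from land (m replaced by a) but 4 edits away from lemonade (four missing letters), so no fuzziness value can reach it. The request below is a minimal sketch of such a query; the index name `my_articles` is hypothetical, and it assumes the same nested mapping as in the question without any synonym filter.

```
GET my_articles/_search
{
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": {
            "query": "lmnd",
            "fuzziness": 2
          }
        }
      }
    }
  }
}
```

Under these assumptions, a query like this can match land but not lemonade, which is consistent with the behaviour described in the question.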

I'll leave an example and the link to an article explaining some details.

PUT teste
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonyms_filter": {
            "type": "synonym",
            "synonyms": [
              "apple,APPL",
              "lemonade,LMND"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "synonyms_filter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "transcript_data": {
        "properties": {
          "words": {
            "type": "nested",
            "properties": {
              "word": {
                "type": "text",
                "analyzer":"synonym_analyzer"
              }
            }
          }
        }
      }
    }
  }
}

POST teste/_bulk
{"index":{}}
{"transcript_data": {"words":{"word":"apple"}}}


GET teste/_search
{
  "query": {
    "nested": {
      "path": "transcript_data.words",
      "query": {
        "match": {
          "transcript_data.words.word": "appl"
        }
      }
    }
  }
}
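
One way to check that the synonym rules are actually applied is the `_analyze` API, which returns the tokens an analyzer produces for a given text. The request below is a sketch that assumes the `teste` index was created with the settings above.

```
GET teste/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "LMND"
}
```

If the filter is working, the response should contain both `lmnd` and `lemonade` as tokens. Note that this example sets `analyzer` on the field, whereas the mapping in the question sets only `search_analyzer`. The `analyzer` of an existing field cannot be changed in place, so the index has to be re-created and the documents reindexed for this mapping to take effect.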