How to match related data in Elasticsearch when the keyword is typed incorrectly

Posted on 2025-01-21 04:02:23

I have a document whose title is "Hard work & Success", and I need to find it through search. If I type "Hardwork" (without a space), nothing is returned, but if I type "hard work", the document is returned.

This is the query I used:

const search = qObject.search;
const payload = {
  from: skip,
  size: limit,
  _source: [
    "id",
    "title",
    "thumbnailUrl",
    "youtubeUrl",
    "speaker",
    "standards",
    "topics",
    "schoolDetails",
    "uploadTime",
    "schoolName",
    "description",
    "studentDetails",
    "studentId"
  ],
  query: {
    bool: {
      must: {
        multi_match: {
          fields: [
            "title^2",
            "standards.standard^2",
            "speaker^2",
            "schoolDetails.schoolName^2",
            "hashtags^2",
            "topics.topic^2",
            "studentDetails.studentName^2",
          ],
          query: search,
          fuzziness: "AUTO",
        },
      },
    },
  },
};

If I search for the title "hard work" (with a space), it returns data like this:

"searchResults": [
        {
            "_id": "92",
            "_score": 19.04531,
            "_source": {
                "standards": {
                    "standard": "3",
                    "categoryType": "STANDARD",
                    "categoryId": "S3"
                },
                "schoolDetails": {
                    "categoryType": "SCHOOL",
                    "schoolId": "TPS123",
                    "schoolType": "PUBLIC",
                    "logo": "91748922mn8bo9krcx71.png",
                    "schoolName": "Carmel CMI Public School"
                },
                "studentDetails": {
                    "studentId": 270,
                    "studentDp": "164646972124244.jpg",
                    "studentName": "Nelvin",
                    "about": "good student"
                },
                "topics": {
                    "categoryType": "TOPIC",
                    "topic": "Motivation",
                    "categoryId": "MY"
                },
                "youtubeUrl": "https://www.youtube.com/watch?v=wermQ",
                "speaker": "Anna Maria Siby",
                "description": "How hardwork leads to success - motivational talk by Anna",
                "id": 92,
                "uploadTime": "2022-03-17T10:59:59.400Z",
                "title": "Hard work & Success",
            }
        },
]

But if I search for the keyword "Hardwork" (without a space), this document is not found. I need the search either to treat the keyword as if it contained a space, or to match related data for the keyword. Is there a way to do this?

Comments (1)

记忆之渊 2025-01-28 04:02:24

I made an example using a shingle analyzer.

Mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 4,
          "min_shingle_size": 2,
          "output_unigrams": "true",
          "token_separator": ""
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "shingle_analyzer"
      }
    }
  }
}
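
To make the effect of this filter easier to picture, here is a rough JavaScript simulation of what a shingle filter with an empty token_separator does to a title. It is only a sketch: the function name shingles and the regex-based tokenization are assumptions made for illustration, and this is not how Lucene implements the filter.

```javascript
// Rough simulation of the shingle_filter settings above.
// Hypothetical helper, not part of Elasticsearch or its client.
function shingles(text, minSize = 2, maxSize = 4, outputUnigrams = true) {
  // Approximates the standard tokenizer plus the lowercase filter:
  // keep runs of letters/digits, drop punctuation such as "&".
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const tokens = [];
  for (let start = 0; start < words.length; start++) {
    // output_unigrams: keep the single word as its own token.
    if (outputUnigrams) tokens.push(words[start]);
    // Join 2..maxSize adjacent words with an empty separator.
    for (let size = minSize; size <= maxSize && start + size <= words.length; size++) {
      tokens.push(words.slice(start, start + size).join(""));
    }
  }
  return tokens;
}

const titleTokens = shingles("Hard work & Success");
console.log(titleTokens);
console.log(titleTokens.includes("hardwork")); // true
```

Because adjacent words are joined without a separator, the title contributes a "hardwork" token to the index, which is what a search for "Hardwork" needs in order to match.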

Then I tested it with your title. Note that the token "hardwork" is generated, but other shingles such as "hardworksuccess" and "worksuccess" are generated as well, which may be a problem for you.

GET idx-separator-words/_analyze
{
  "analyzer": "shingle_analyzer",
  "text": ["Hard work & Success"]
}

Results:

{
  "tokens" : [
    {
      "token" : "hard",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "hardwork",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 2
    },
    {
      "token" : "hardworksuccess",
      "start_offset" : 0,
      "end_offset" : 19,
      "type" : "shingle",
      "position" : 0,
      "positionLength" : 3
    },
    {
      "token" : "work",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "worksuccess",
      "start_offset" : 5,
      "end_offset" : 19,
      "type" : "shingle",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "success",
      "start_offset" : 12,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
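
As a side note on why fuzziness: "AUTO" in the original query was not enough on its own: for terms longer than five characters AUTO allows at most two edits, and "hardwork" is four edits away from either "hard" or "work". The sketch below checks this with a plain Levenshtein distance. The helper names editDistance and autoFuzziness are made up for illustration, and Elasticsearch by default also counts a transposition as one edit, which does not change the result for these words.

```javascript
// Plain Levenshtein distance (insert, delete, substitute).
// Hypothetical helper used only to illustrate the point above.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a character from a
        curr[j - 1] + 1,    // insert a character into a
        prev[j - 1] + cost  // substitute, or keep if equal
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Edits allowed by fuzziness "AUTO" for a term of the given length.
function autoFuzziness(termLength) {
  if (termLength <= 2) return 0;
  if (termLength <= 5) return 1;
  return 2;
}

const allowed = autoFuzziness("hardwork".length);
for (const indexed of ["hard", "work", "hardwork"]) {
  const distance = editDistance("hardwork", indexed);
  console.log(indexed, distance, distance <= allowed ? "match" : "no match");
}
```

With the shingle analyzer on the title field, the indexed term "hardwork" is an exact match for the query term, so the existing multi_match query should find the document without structural changes, assuming the index is recreated with the new mapping and the documents are reindexed.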