Elasticsearch bool query and keyword search

Posted on 2025-01-22 12:15:25

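We are running queries similar to the one below against our Elasticsearch instance. The query targets an index (mapping below) that contains approximately 3.4 million documents. The field we query holds strings of encrypted words, each string at most 10,000 characters long. We encrypt the words we are searching for and use the result as the search term. The search takes a very long time (over a minute) to return results. Any help or suggestions on tuning the index or the query would be appreciated.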

The index mapping:

    {
        "messagewords": {
            "aliases": {},
            "mappings": {
                "properties": {
                    "MessageId": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "Words": {
                        "type": "text"
                    }
                }
            },
            "settings": {
                "index": {
                    "creation_date": "1649868562656",
                    "number_of_shards": "5",
                    "number_of_replicas": "1",
                    "uuid": "YFVbbow0R66dP3uR4hF9aQ",
                    "version": {
                        "created": "7060299"
                    },
                    "provided_name": "messagewords"
                }
            }
        }
    }

The query:

    {
        "from": 0,
        "_source": [
            "MessageId"
        ],
        "size": 10000,
        "track_total_hits": true,
        "query": {
            "bool": {
                "must": [
                    {
                        "bool": {
                            "should": [
                                {
                                    "query_string": {
                                        "query": " ((Words:\"*nsrFHeMTTBOeIUvkMrYDoA==sr8O8Rpnxn0hOZ88Mbtu4g==pUniFgw3thZ8lXlj68jHqw==XKin211F6GVXm/QzvB+iLQ==HYzhyEJpcldxo3h8Sea+yA==SwmUP1KNAG4YqGdg/KlLdw==nsrFHeMTTBOeIUvkMrYDoA==*\"))"
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }



Comments (1)

赴月观长安 2025-01-29 12:15:25

Try the whitespace analyzer:

"Words": {"type": "text", "analyzer": "whitespace"}
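Why this helps, roughly: with the default standard analyzer, characters such as `/`, `+` and `=` act as token boundaries and every term is lowercased, so a single encrypted word can be indexed as several fragments. The whitespace analyzer splits only on whitespace and keeps each encrypted word intact. The Python sketch below builds a create-index body with that mapping and compares the two tokenizations. Note that the analyzer of the existing `Words` field cannot be changed in place, so the data would have to be reindexed into a new index; the name `messagewords_ws` is hypothetical, and `approx_standard_tokens` is only a regex approximation of the standard analyzer for ASCII input, not Elasticsearch's implementation.

```python
import json
import re

# Hypothetical name for a new index; the analyzer of the existing Words field
# cannot be changed in place, so the data would have to be reindexed into it.
NEW_INDEX = "messagewords_ws"


def whitespace_mapping() -> dict:
    """Create-index body: same fields as messagewords, but Words uses the
    built-in whitespace analyzer instead of the default standard analyzer."""
    return {
        "mappings": {
            "properties": {
                "MessageId": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
                "Words": {"type": "text", "analyzer": "whitespace"},
            }
        }
    }


def approx_standard_tokens(text: str) -> list:
    """Rough approximation of the standard analyzer for ASCII input: break on
    every character that is not a letter or digit, then lowercase.
    This is NOT Elasticsearch's implementation, only an illustration."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def approx_whitespace_tokens(text: str) -> list:
    """The whitespace analyzer splits on whitespace only and does not lowercase."""
    return text.split()


if __name__ == "__main__":
    print(f"PUT /{NEW_INDEX}")
    print(json.dumps(whitespace_mapping(), indent=2))

    sample = "XKin211F6GVXm/QzvB+iLQ== HYzhyEJpcldxo3h8Sea+yA=="
    # ['xkin211f6gvxm', 'qzvb', 'ilq', 'hyzhyejpcldxo3h8sea', 'ya']
    print(approx_standard_tokens(sample))
    # ['XKin211F6GVXm/QzvB+iLQ==', 'HYzhyEJpcldxo3h8Sea+yA==']
    print(approx_whitespace_tokens(sample))
```

Because the whitespace analyzer does not lowercase, the indexed terms and the search terms keep their exact case, which matters for case-sensitive Base64 text.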

together with a match_phrase query:

"match_phrase": {"Words": "nsrFHeMTTBOeIUvkMrYDoA== ... SwmUP1KNAG4YqGdg/KlLdw== nsrFHeMTTBOeIUvkMrYDoA=="}

Please note that you'll have to separate the encoded tokens with spaces for this to work.
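The spacing requirement applies both to the `Words` values stored in the index and to the search phrase. A minimal Python sketch, assuming every encrypted word is Base64 text ending in `==` padding (true of every token in the question's sample); `space_separate` and `build_match_phrase_query` are hypothetical helper names, and the code only builds the request body rather than calling Elasticsearch. The other options (`from`, `size`, `_source`, `track_total_hits`) are copied from the question's query.

```python
import json
import re

# Assumption: every encrypted word is Base64 text that ends in "==" padding,
# as in the question's sample (22 Base64 characters + "==").
_TOKEN = re.compile(r"[A-Za-z0-9+/]+==")


def space_separate(concatenated: str) -> str:
    """Turn 'AAA==BBB==' into 'AAA== BBB==' so the whitespace analyzer
    indexes (or searches for) each encrypted word as its own token."""
    tokens = _TOKEN.findall(concatenated)
    if "".join(tokens) != concatenated:
        raise ValueError("input is not a run of '=='-padded Base64 tokens")
    return " ".join(tokens)


def build_match_phrase_query(concatenated: str, size: int = 10000) -> dict:
    """Search body mirroring the question's options, but using match_phrase
    on the space-separated tokens instead of a wildcard query_string."""
    return {
        "from": 0,
        "size": size,
        "_source": ["MessageId"],
        "track_total_hits": True,
        "query": {"match_phrase": {"Words": space_separate(concatenated)}},
    }


if __name__ == "__main__":
    phrase = "nsrFHeMTTBOeIUvkMrYDoA==sr8O8Rpnxn0hOZ88Mbtu4g==XKin211F6GVXm/QzvB+iLQ=="
    print(json.dumps(build_match_phrase_query(phrase), indent=2))
```

match_phrase requires the terms to appear adjacent and in the given order, so the space-separated phrase matches the same sequence of encrypted words as the original concatenated string.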

