Email not being searched correctly in Elasticsearch

Posted on 2025-01-19 03:10:19

Hello, I am new to Elasticsearch and email search is not working properly for me. I am using the boto3 SDK with the AWS OpenSearch Service, and I have tried this mapping:

{
  "dev_auth0_logs_new_mapping": {
    "mappings": {
      "properties": {
        "activity_date": { "type": "date" },
        "activity_type": { "type": "text" },
        "client_id": { "type": "text" },
        "description": { "type": "text" },
        "event_data": { "type": "object", "enabled": false },
        "user_email": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword" } }
        },
        "user_id": { "type": "text" }
      }
    }
  }
}

This is my query:

{
  "from": 0,
  "size": 10,
  "track_total_hits": true,
  "_source": [
    "user_email",
    "user_id",
    "activity_date",
    "activity_type",
    "description",
    "client_id",
    "id"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*[email protected]*",
            "default_field": "user_email",
            "default_operator": "OR"
          }
        }
      ]
    }
  },
  "sort": [{ "activity_date": "desc" }]
}

It does not work with an exact match. For example, searching for ashutosh.pandya returns results, but searching for [email protected] returns nothing.
I also followed a Medium blog post and created a new mapping with a custom email analyzer, but that did not work for me either. I don't know what I am doing wrong.
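A quick way to see why the full address misses is to ask the cluster how user_email is analyzed. Since the mapping above sets no analyzer, the text field should use the standard analyzer, which follows Unicode word-boundary rules: it keeps a dotted run like ashutosh.pandya together but splits at the @. A hypothetical address ashutosh.pandya@example.com would therefore likely be indexed as the two terms ashutosh.pandya and example.com. Wildcard patterns are compared against individual terms, so *ashutosh.pandya* can match one of them, while a pattern containing @ matches none. The sketch below only builds the _analyze request body; the address is a made-up placeholder, and the client call in the comment assumes an already configured opensearch-py client.

```python
import json

# Index name as it appears in the mapping above; the address is a hypothetical placeholder.
INDEX = "dev_auth0_logs_new_mapping"


def analyze_body(field: str, text: str) -> dict:
    """Build a body for the _analyze API that runs `text` through the
    analyzer configured for `field` in the target index."""
    return {"field": field, "text": text}


body = analyze_body("user_email", "ashutosh.pandya@example.com")
print(json.dumps(body))

# With an opensearch-py client configured elsewhere (assumption):
#   tokens = client.indices.analyze(index=INDEX, body=body)["tokens"]
#   print([t["token"] for t in tokens])
# Equivalent raw request:
#   GET dev_auth0_logs_new_mapping/_analyze
#   {"field": "user_email", "text": "ashutosh.pandya@example.com"}
```

If the returned tokens do not include the whole address, no wildcard pattern containing @ can match on this field.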

I tried this query to get all the logs for [email protected], but got no hits:

{
    "from": 0,
    "size": 10,
    "track_total_hits": True,
    "_source": [
        "user_email",
        "user_id",
        "activity_date",
        "activity_type",
        "description",
        "client_id",
        "id"
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "*[email protected]*",
                        "default_field": "user_email",
                        "default_operator": "OR"
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "activity_date": "desc"
        }
    ]
}

But when I run this query:

{
    "from": 0,
    "size": 10,
    "track_total_hits": True,
    "_source": [
        "user_email",
        "user_id",
        "activity_date",
        "activity_type",
        "description",
        "client_id",
        "id"
    ],
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "*ashutosh.pandya*",
                        "default_field": "user_email",
                        "default_operator": "OR"
                    }
                }
            ]
        }
    },
    "sort": [
        {
            "activity_date": "desc"
        }
    ]
}

I get all the hits where user_email contains ashutosh.pandya.

What I want:
- If I search for ashutosh, I get all hits where user_email contains ashutosh.
- If I search for ashu, I get all hits where user_email contains ashu.
- If I search for pandya, I get all hits where user_email contains pandya.
- If I search for [email protected], I get all hits where user_email equals [email protected].
- If I search for the domain, I get all hits where user_email contains the domain.
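A minimal sketch of these requirements that avoids a custom analyzer, assuming the existing user_email.keyword sub-field (which stores each address as one untokenized term) is populated. The hypothetical helper build_email_query treats input with text on both sides of an @ as a full address and sends it as an exact term query; anything else (ashu, pandya, a domain) becomes a *fragment* wildcard query. The case_insensitive flag assumes OpenSearch or Elasticsearch 7.10 and later, and leading wildcards have to scan many terms, so this may get slow on large indices.

```python
# Fields returned in the question's queries.
SOURCE_FIELDS = ["user_email", "user_id", "activity_date", "activity_type",
                 "description", "client_id", "id"]


def escape_wildcard(fragment: str) -> str:
    """Escape the characters that are special in wildcard patterns: *, ? and backslash."""
    return "".join("\\" + ch if ch in "*?\\" else ch for ch in fragment)


def build_email_query(search: str, size: int = 10) -> dict:
    """Hypothetical helper: exact match for a full address, substring match otherwise."""
    search = search.strip()
    local, at, domain = search.partition("@")
    if at and local and domain:
        # Looks like a full address: one exact term on the keyword sub-field.
        clause = {"term": {"user_email.keyword": {
            "value": search, "case_insensitive": True}}}
    else:
        # Name fragment, prefix such as "ashu", or a domain: substring match.
        clause = {"wildcard": {"user_email.keyword": {
            "value": f"*{escape_wildcard(search)}*", "case_insensitive": True}}}
    return {
        "from": 0,
        "size": size,  # an integer, not the string "10"
        "track_total_hits": True,
        "_source": SOURCE_FIELDS,
        "query": {"bool": {"must": [clause]}},
        "sort": [{"activity_date": "desc"}],
    }

# Possible usage with an opensearch-py client (assumed to exist):
#   client.search(index="dev_auth0_logs_new_mapping", body=build_email_query("ashu"))
```

The term branch gives the "equals" behaviour for a full address; the wildcard branch covers the "contains" cases.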

Comments (2)

Spring初心 2025-01-26 03:10:19


You don't need a custom analyzer for wildcard matches. You don't really need the email to be split into tokens at all, so either map the email field with the keyword type, or search the keyword sub-field (here user_email.keyword) instead.
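Applied to the query from the question, this suggestion comes down to pointing default_field at the user_email.keyword sub-field that the mapping already defines, so the pattern is compared with the whole stored address rather than its analyzed pieces. A sketch with a placeholder address; it assumes the fragment contains no query_string reserved characters, and because a keyword field without a normalizer is case-sensitive, the pattern has to match the stored casing.

```python
def keyword_query_string(fragment: str) -> dict:
    """The question's query_string clause, retargeted at the keyword sub-field."""
    return {
        "query_string": {
            "query": f"*{fragment}*",
            # The keyword sub-field holds the whole address as one term,
            # so a pattern containing "@" can match it.
            "default_field": "user_email.keyword",
            "default_operator": "OR",
        }
    }


# Hypothetical address standing in for the one elided in the question.
clause = keyword_query_string("ashutosh.pandya@example.com")
```

The clause drops into the existing "must" list unchanged; only default_field differs from the original query.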

逆流 2025-01-26 03:10:19

I solved this issue by creating a pattern capture token filter.
Here is the documentation link: Elasticsearch pattern capture token filter
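For reference, a pattern_capture setup might look like the following. This is not the configuration from the answer, which is not shown; it is modeled on the email example in the Elasticsearch pattern_capture token filter documentation, and the index, filter, and analyzer names are hypothetical. The uax_url_email tokenizer keeps an address as one token, and the filter then also emits the local part, runs of letters, runs of digits, and the domain, so a match query for pandya or for the domain should hit. A prefix such as ashu would still need a prefix or wildcard query, and because the analyzer of an existing field cannot be changed in place, the data would have to be reindexed into a new index.

```python
import json

# Hypothetical names; none of these come from the original post.
NEW_INDEX = "dev_auth0_logs_email_v2"

EMAIL_INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "email_parts": {
                    "type": "pattern_capture",
                    "preserve_original": True,  # also keep the full address as a token
                    "patterns": [
                        "([^@]+)",    # each run of non-@ characters: local part and domain
                        "(\\p{L}+)",  # runs of letters, e.g. "ashutosh", "pandya"
                        "(\\d+)",     # runs of digits
                        "@(.+)",      # everything after the @, i.e. the domain
                    ],
                }
            },
            "analyzer": {
                "email_analyzer": {
                    # uax_url_email emits an email address as a single token
                    "tokenizer": "uax_url_email",
                    "filter": ["email_parts", "lowercase", "unique"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "activity_date": {"type": "date"},
            "activity_type": {"type": "text"},
            "client_id": {"type": "text"},
            "description": {"type": "text"},
            "event_data": {"type": "object", "enabled": False},
            "user_email": {
                "type": "text",
                "analyzer": "email_analyzer",
                "fields": {"keyword": {"type": "keyword"}},  # exact matches
            },
            "user_id": {"type": "text"},
        }
    },
}

print(json.dumps(EMAIL_INDEX_BODY, indent=2))
# With an opensearch-py client (assumption), create the index, then copy the
# existing documents into it with the _reindex API:
#   client.indices.create(index=NEW_INDEX, body=EMAIL_INDEX_BODY)
```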
