将mySQL聚合查询转换为alasticsearch

发布于 2025-02-11 16:37:21 字数 936 浏览 2 评论 0原文

我有一个评论表,在过去的一年中,我将其移至Elasticsearch。

问题在于,我需要调整我当前在MySQL中的查询,该查询可以返回过去7天内每天的评论总数。

这是我现在拥有的MySQL查询:

SELECT count(*) AS number, DATE(created_at) AS date 
FROM `comments` 
WHERE `post_id` = ? 
GROUP BY `date` 
ORDER BY `date` DESC 
LIMIT 7

我的索引看起来像:

{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "post_id": {
          "type": "integer"
        },
        "subject": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "body": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "created_at": {
          "format": "yyyy-MM-dd HH:mm:ss",
          "type": "date"
        }
      }
    }
  }
}

是否可以重现Elasticsearch的查询?如果是这样,那会怎样?

我的Elasticsearch知识有限,我知道它提供了聚合,但我真的不知道如何将所有内容放在一起。

I have a comments table that over the past year has grown considerably and I'm moving it to ElasticSearch.

The problem is that I need to adapt a query that I currently have in MySQL which returns the total number of comments for each day in the last 7 days for a given post.

Here's the MySQL query that I have now:

SELECT count(*) AS number, DATE(created_at) AS date 
FROM `comments` 
WHERE `post_id` = ? 
GROUP BY `date` 
ORDER BY `date` DESC 
LIMIT 7

My index looks like this:

{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "post_id": {
          "type": "integer"
        },
        "subject": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "body": {
          "analyzer": "custom_html_strip",
          "type": "text"
        },
        "created_at": {
          "format": "yyyy-MM-dd HH:mm:ss",
          "type": "date"
        }
      }
    }
  }
}

Is it possible to reproduce that query for ElasticSearch? If so, how would that look like?

My ElasticSearch knowledge is kinda limited, I know that it offers aggregation, but I don't really know how to put it all together.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

香橙ぽ 2025-02-18 16:37:22

Elasticsearch也支持SQL查询(尽管有限)
因此,有了一点更改,您可以使用此类内容

GET _sql
{
  "query": """
  SELECT count(*) AS number, created_at AS date 
FROM comments
WHERE post_id = 123 
GROUP BY date
ORDER BY date DESC 
LIMIT 7
  """
}

,可以使用_SQL/Translate看到相应的查询,这将返回

{
  "size" : 0,
  "query" : {
    "term" : {
      "post_id" : {
        "value" : 123,
        "boost" : 1.0
      }
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 7,
        "sources" : [
          {
            "31239" : {
              "terms" : {
                "field" : "created_at",
                "missing_bucket" : true,
                "order" : "desc"
              }
            }
          }
        ]
      }
    }
  }
}

说,不需要翻译查询中使用的某些内容,因此这将是一个更好的本地查询

{
  "query": {
    "term": {
      "post_id": {
        "value": 123
      }
    }
  },
  "aggs": {
    "unq_dates": {
      "terms": {
        "field": "created_at",
        "size": 7,
        "order": {
          "_term": "desc"
        }
      }
    }
  }
}

Elasticsearch supports sql queries as well (though they are limited)
so with a little change you can use something like this

GET _sql
{
  "query": """
  SELECT count(*) AS number, created_at AS date 
FROM comments
WHERE post_id = 123 
GROUP BY date
ORDER BY date DESC 
LIMIT 7
  """
}

You can see the corresponding query using _sql/translate, which will return

{
  "size" : 0,
  "query" : {
    "term" : {
      "post_id" : {
        "value" : 123,
        "boost" : 1.0
      }
    }
  },
  "_source" : false,
  "stored_fields" : "_none_",
  "aggregations" : {
    "groupby" : {
      "composite" : {
        "size" : 7,
        "sources" : [
          {
            "31239" : {
              "terms" : {
                "field" : "created_at",
                "missing_bucket" : true,
                "order" : "desc"
              }
            }
          }
        ]
      }
    }
  }
}

That being said, some of the stuff used in the translated query is not needed, so this will be a better native query

{
  "query": {
    "term": {
      "post_id": {
        "value": 123
      }
    }
  },
  "aggs": {
    "unq_dates": {
      "terms": {
        "field": "created_at",
        "size": 7,
        "order": {
          "_term": "desc"
        }
      }
    }
  }
}
梦幻的心爱 2025-02-18 16:37:22

首先,建议定期汇总您的时间字段,因此使用直接写入“ Yyyy-MM-DD”格式的字段将提高您的性能并减少资源的消费

{
    "query":
    {
        "filter":
        [
            {
                "term":
                {
                    "post_id":
                    {
                        "value": 1
                    }
                }
            }
        ]
    },
    "_source": false,
    "aggs":
    {
        "created_at":
        {
            "auto_date_histogram":
            {
                "field": "created_at",
                "buckets": 7,
                "format": "yyyy-MM-dd"
            }
        }
    }
}

First, it is recommended that your time fields be regularly aggregated, so using fields written directly to a 'YYYY-MM-DD' format will improve your performance and reduce your consumption of resources

{
    "query":
    {
        "filter":
        [
            {
                "term":
                {
                    "post_id":
                    {
                        "value": 1
                    }
                }
            }
        ]
    },
    "_source": false,
    "aggs":
    {
        "created_at":
        {
            "auto_date_histogram":
            {
                "field": "created_at",
                "buckets": 7,
                "format": "yyyy-MM-dd"
            }
        }
    }
}
春花秋月 2025-02-18 16:37:21

在过去的7天中,使用以下查询在给定的“ post_id”上获取所有评论。

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": {
              "value": "the_post_id"
            }
          }
        },
        /*** only include this clause if you want the recent most 7 days ***/
        {
          "range": {
            "created_at": {
              "gte": "now-7d/d",
              "lt": "now/d"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "posts_per_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "order" : {"_key" : "desc"}
      }
    }
  }
}

从聚合中,在客户端应用程序中获取前7个存储桶。

Use the following query to get all the comments on a given "post_id" for the last 7 days.

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": {
              "value": "the_post_id"
            }
          }
        },
        /*** only include this clause if you want the recent most 7 days ***/
        {
          "range": {
            "created_at": {
              "gte": "now-7d/d",
              "lt": "now/d"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "posts_per_day": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "day",
        "order" : {"_key" : "desc"}
      }
    }
  }
}

From the aggregation, pick up the first 7 buckets in your client application.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文