Grouping by date in MongoDB

Posted 2024-10-20 08:25:42 · 841 words · 4 views · 0 comments

I am working on a project in which I track the number of clicks on a topic.

I am using MongoDB, and I need to group the number of clicks by date (I want to group the data for 15 days).

My data is stored in MongoDB in the following format:

{ 
   "_id" : ObjectId("4d663451d1e7242c4b68e000"), 
  "date" : "Mon Dec 27 2010 18:51:22 GMT+0000 (UTC)", 
  "topic" : "abc", 
  "time" : "18:51:22"
}
{ 
    "_id" : ObjectId("4d6634514cb5cb2c4b69e000"), 
    "date" : "Mon Dec 27 2010 18:51:23 GMT+0000 (UTC)", 
    "topic" : "bce", 
    "time" : "18:51:23"
}

I want to group the number of clicks on topic abc by day (for 15 days). I know how to group, but how can I group by the dates stored in my database?

I am looking for a result in the following format:

[
  {
    "date" : "date in log",
    "click" : 9 
  },  
  {
    "date" : "date in log",
    "click" : 19
  },  
]

I have written code, but it only works if the dates are strings (the code is here: http://pastebin.com/2wm1n1ix).
Please guide me on how to group by date.

Comments (10)

过期以后 2024-10-27 08:25:43

There are already many answers to this question, but I wasn't happy with any of them. MongoDB has improved over the years, and there are now easier ways to do this. The answer by Jonas Tomanga gets it right, but it is a bit too complex.

If you are using MongoDB 3.0 or later, here's how you can group by date. I start with a $match stage because the author also asked how to limit the results.

db.yourCollection.aggregate([
  { $match: { date: { $gte: ISODate("2019-05-01") } } },
  { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$date"} }, count: { $sum: 1 } } },
  { $sort: { _id: 1} }
])
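
To tie this back to the question (one topic, a 15-day window, output fields named date and click), the pipeline can be built by a small helper. This is only a sketch: buildDailyClickPipeline is a hypothetical name, and it assumes the date field is stored as a BSON Date rather than as a string like in the question's sample documents, since $dateToString works on dates. It uses plain Date objects so it also runs outside the mongo shell.

```javascript
// Hypothetical helper: builds a pipeline that counts clicks per day for one topic.
// Assumes `date` is stored as a BSON Date, not as a string.
function buildDailyClickPipeline(topic, since) {
  return [
    // Keep only the requested topic and dates on or after `since`.
    { $match: { topic: topic, date: { $gte: since } } },
    // Group by the calendar day (UTC) of each click.
    { $group: {
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
        click: { $sum: 1 }
    } },
    // Rename _id to date to match the shape asked for in the question.
    { $project: { _id: 0, date: "$_id", click: 1 } },
    { $sort: { date: 1 } }
  ];
}

// A window starting 15 days before a fixed reference time.
const DAY_MS = 24 * 60 * 60 * 1000;
const reference = new Date("2019-05-16T00:00:00Z");
const since = new Date(reference.getTime() - 15 * DAY_MS);
const pipeline = buildDailyClickPipeline("abc", since);
console.log(JSON.stringify(pipeline[0]));
```

In the shell the result would be passed along as db.yourCollection.aggregate(pipeline).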
似梦非梦 2024-10-27 08:25:43

To fetch data grouped by date in MongoDB:

db.getCollection('supportIssuesChat').aggregate([
    {
        $group: {
            _id: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
            // $$ROOT (two dollar signs) is the system variable for the whole document
            list: { $push: "$$ROOT" },
            count: { $sum: 1 }
        }
    }
])
独自唱情﹋歌 2024-10-27 08:25:43

Late answer, but for the record (for anyone else that comes to this page): You'll need to use the 'keyf' argument instead of 'key', since your key is actually going to be a function of the date on the event (i.e. the "day" extracted from the date) and not the date itself. This should do what you're looking for:

db.coll.group(
{
    keyf: function(doc) {
        var date = new Date(doc.date);
        var dateKey = (date.getMonth()+1)+"/"+date.getDate()+"/"+date.getFullYear()+'';
        return {'day':dateKey};
    },
    cond: {topic:"abc"},
    initial: {count:0},
    reduce: function(obj, prev) {prev.count++;}
});

For more information, take a look at MongoDB's doc page on aggregation and group: http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Group
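
For readers on newer servers: db.collection.group() was deprecated in MongoDB 3.4 and removed in 4.2, so the snippet above only runs on older versions. The idea behind keyf (derive the grouping key from each document, then count per key) can still be illustrated in plain JavaScript. The sketch below is an in-memory stand-in, not MongoDB API: clicks, dayKey and countByDay are hypothetical names, and it uses the UTC getters so the day boundaries do not depend on the machine's timezone, which differs slightly from the answer's local-time getters.

```javascript
// In-memory stand-in for the question's documents (dates stored as strings).
const clicks = [
  { date: "Mon Dec 27 2010 18:51:22 GMT+0000 (UTC)", topic: "abc" },
  { date: "Mon Dec 27 2010 18:51:23 GMT+0000 (UTC)", topic: "bce" },
  { date: "Mon Dec 27 2010 23:10:00 GMT+0000 (UTC)", topic: "abc" },
  { date: "Tue Dec 28 2010 01:00:00 GMT+0000 (UTC)", topic: "abc" }
];

// Same idea as keyf: derive the grouping key (the day) from the document.
function dayKey(doc) {
  const d = new Date(doc.date);
  return (d.getUTCMonth() + 1) + "/" + d.getUTCDate() + "/" + d.getUTCFullYear();
}

// Hypothetical helper playing the role of cond + initial + reduce.
function countByDay(docs, topic) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.topic !== topic) continue;           // cond: {topic: ...}
    const key = dayKey(doc);
    counts.set(key, (counts.get(key) || 0) + 1); // initial 0, reduce: count++
  }
  return Array.from(counts, ([day, count]) => ({ day, count }));
}

const perDay = countByDay(clicks, "abc");
console.log(perDay);
```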

养猫人 2024-10-27 08:25:43

This may help:

return new Promise(function(resolve, reject) {
    db.doc.aggregate([
        { $match: {} },
        { $group: { _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } }, count: { $sum: 1 } } },
        { $sort: { _id: 1 } }
    ]).then(doc => {
        /* if you need a Date object instead of the "%Y-%m-%d" string */
        doc.forEach(function(value, index) {
            doc[index]._id = new Date(value._id);
        });
        resolve(doc);
    }).catch(reject);
});
旧梦荧光笔 2024-10-27 08:25:43

I haven't worked with MongoDB that much yet, so I am not completely sure, but can't you use full JavaScript?
So you could parse your date with the JavaScript Date class, build a key for the day from it, and use that as a key in an "out" property. Add one if the key already exists; otherwise create it with a value of 1 (the first click). Below is your code with an adapted reduce function (untested code!):

db.coll.group(
{
   key:{'date':true},
   initial: {retVal: {}},
   reduce: function(doc, prev){
              var date = new Date(doc.date);
              // separators and a 1-based month keep keys unambiguous (e.g. Feb 11 vs Dec 1)
              var dateKey = date.getFullYear()+'-'+(date.getMonth()+1)+'-'+date.getDate();
              (typeof prev.retVal[dateKey] != 'undefined') ? prev.retVal[dateKey] += 1 : prev.retVal[dateKey] = 1;
            }, 
   cond: {topic:"abc"}
}
)
雅心素梦 2024-10-27 08:25:43

Thanks to @mindthief; your answer helped solve my problem today. The function below makes grouping by day a little easier; I hope it helps others.

/**
 * group by day
 * @param query document {key1:123,key2:456}
 */
var count_by_day = function(query){
    return db.action.group(
    {
        keyf: function(doc) {
            var date = new Date(doc.time);
            var dateKey = (date.getMonth()+1)+"/"+date.getDate()+"/"+date.getFullYear();
            return {'date': dateKey};
        },
        cond:query,
        initial: {count:0},
        reduce: function(obj, prev) {
          prev.count++;
        }
    });
}

count_by_day({this:'is',the:'query'})
给不了的爱 2024-10-27 08:25:43

Another late answer, but still: if you want to do it in a single iteration and get the number of clicks grouped by date and topic, you can use the following code:

db.coll.group(
{
   $keyf : function(doc) {
       // getMonth() is zero-based, so add 1 to get the calendar month
       return { "date" : doc.date.getDate()+"/"+(doc.date.getMonth()+1)+"/"+doc.date.getFullYear(),
                "topic": doc.topic };
    },
    initial: {count:0},
    reduce: function(obj, prev) { prev.count++; }
 })

Also, if you would like to optimize the query as suggested, you can use an integer value for the date key instead of the string (hint: use valueOf()), though in my tests the speed was the same.

Furthermore, it's always wise to check the MongoDB docs regularly, because new features are added all the time. For example, with the new Aggregation framework, which will be released in version 2.2, you can achieve the same results much more easily: http://docs.mongodb.org/manual/applications/aggregation/

记忆之渊 2024-10-27 08:25:43

If you want a Date object returned directly

Then, instead of applying the Date Aggregation Operators, apply "date math" to round the date object. This is often desirable because every driver represents a BSON Date in the form its language normally uses for date manipulation:

db.datetest.aggregate([
    { "$group": {
        "_id": {
            "$add": [
                { "$subtract": [
                    { "$subtract": [ "$date", new Date(0) ] },
                    { "$mod": [
                        { "$subtract": [ "$date", new Date(0) ] },
                        1000 * 60 * 60 * 24
                    ]}
                ]},
                new Date(0)
            ]
        },
        "click": { "$sum": 1 }
    }}
])

Or, if the grouping interval required is "buckets" of 15 days, as the question implies, simply apply that to the numeric value in $mod:

db.datetest.aggregate([
    { "$group": {
        "_id": {
            "$add": [
                { "$subtract": [
                    { "$subtract": [ "$date", new Date(0) ] },
                    { "$mod": [
                        { "$subtract": [ "$date", new Date(0) ] },
                        1000 * 60 * 60 * 24 * 15
                    ]}
                ]},
                new Date(0)
            ]
        },
        "click": { "$sum": 1 }
    }}
])

The basic math is that when you $subtract two Date objects, the result is the numeric difference in milliseconds. The epoch is represented by Date(0), which serves as the base for conversion in whatever language constructor you have.

With a numeric value, the "modulo" ($mod) is applied to round the date down (subtracting the remainder of the division) to the required interval, which is either:

1000 milliseconds * 60 seconds * 60 minutes * 24 hours = 1 day

Or

1000 milliseconds * 60 seconds * 60 minutes * 24 hours * 15 days = 15 days

So it's flexible to whatever interval you require.

By the same token, an $add operation between a numeric value and a Date object returns a Date object equivalent to the millisecond values of both combined (the epoch is 0, so 0 plus the difference is the converted date).

This is easy to reproduce with the following listing:

var now = new Date();
var bulk = db.datetest.initializeOrderedBulkOp();

for ( var x = 0; x < 60; x++ ) {
    bulk.insert({ "date": new Date( now.valueOf() + ( 1000 * 60 * 60 * 24 * x ))});
}

bulk.execute();

Running the second example with 15-day intervals gives:

{ "_id" : ISODate("2016-04-14T00:00:00Z"), "click" : 12 }
{ "_id" : ISODate("2016-03-30T00:00:00Z"), "click" : 15 }
{ "_id" : ISODate("2016-03-15T00:00:00Z"), "click" : 15 }
{ "_id" : ISODate("2016-02-29T00:00:00Z"), "click" : 15 }
{ "_id" : ISODate("2016-02-14T00:00:00Z"), "click" : 3 }

The distribution will vary with the current date when the listing is run, but the 15-day intervals are always aligned consistently from the epoch date.

The "math" method is also a bit easier to tune, especially if you want to adjust periods for different timezones in the aggregation output: you can adjust numerically by adding or subtracting the offset from UTC.
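
The same rounding can be checked outside the database with a few lines of plain JavaScript. This is a sketch of the arithmetic only, not part of the aggregation: floorToInterval and its offsetMs parameter are hypothetical names, and it assumes dates on or after the epoch (the JavaScript % operator returns negative remainders for negative timestamps).

```javascript
const DAY_MS = 1000 * 60 * 60 * 24;

// Hypothetical helper: round a date down to a multiple of intervalMs since the epoch.
// offsetMs shifts the bucket boundaries, e.g. +2 hours for a UTC+2 local midnight.
function floorToInterval(date, intervalMs, offsetMs = 0) {
  const shifted = date.getTime() + offsetMs;            // $subtract: [date, epoch] (+ offset)
  const floored = shifted - (shifted % intervalMs);     // $subtract the $mod remainder
  return new Date(floored - offsetMs);                  // $add back onto the epoch
}

const sample = new Date("2016-03-20T13:45:00Z");
const startOfDay = floorToInterval(sample, DAY_MS);
const startOfBucket = floorToInterval(sample, 15 * DAY_MS);

// Local midnight for UTC+2: 23:30 UTC on the 20th is already the 21st locally.
const localMidnight = floorToInterval(new Date("2016-03-20T23:30:00Z"), DAY_MS, 2 * 60 * 60 * 1000);

console.log(startOfDay.toISOString(), startOfBucket.toISOString(), localMidnight.toISOString());
```

The 15-day result lands on 2016-03-15, one of the bucket boundaries shown in the sample output above, because the buckets are counted from the epoch rather than from the first document.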

緦唸λ蓇 2024-10-27 08:25:43

Of course, that is a good solution. Aside from that, you can group dates by day as strings (as that answer proposes), or you can get the start of each day by projecting the date field (in aggregation) like this:

{'$project': {
    'start_of_day': {'$subtract': [
        '$date',
        {'$add': [
            {'$multiply': [{'$hour': '$date'}, 3600000]},
            {'$multiply': [{'$minute': '$date'}, 60000]},
            {'$multiply': [{'$second': '$date'}, 1000]},
            {'$millisecond': '$date'}
        ]}
    ]},
}}

It gives you this:

{
    "start_of_day" : ISODate("2015-12-03T00:00:00.000Z")
},
{
    "start_of_day" : ISODate("2015-12-04T00:00:00.000Z")
}

It has some advantages: you can work with your days as a date type (not a number or string), you can use all of the date aggregation operators in subsequent aggregation stages, and you get a date type in the output.

深府石板幽径 2024-10-27 08:25:42

New answer using Mongo aggregation framework

After this question was asked and answered, 10gen released MongoDB version 2.2 with an aggregation framework, which is now the better way to do this sort of query. This query is a little challenging because you want to group by date while the stored values are timestamps, so you have to convert the timestamps to dates that match. For the purposes of the example, I will just write a query that gets the right counts.

db.col.aggregate([
   { $group: { _id: { $dayOfYear: "$date" },
               click: { $sum: 1 } } }
])

This will return something like:

[
    {
        "_id" : 144,
        "click" : 165
    },
    {
        "_id" : 275,
        "click" : 12
    }
]

You need to use $match to limit the query to the date range you are interested in, and $project to rename _id to date. How you convert the day of year back to a date is left as an exercise for the reader. :-)
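
One possible take on that exercise, done on the client side in plain JavaScript. It is a sketch under an assumption the query itself does not guarantee: that you know which year each result belongs to ($dayOfYear alone yields the same number for the same day in different years, so a $match on a single year would be needed). dayOfYearToDate is a hypothetical name.

```javascript
// Hypothetical helper: convert a year and a 1-based day of year to a UTC date.
function dayOfYearToDate(year, dayOfYear) {
  // Date.UTC normalizes day overflow, so "January 144th" rolls forward into May.
  return new Date(Date.UTC(year, 0, dayOfYear));
}

// The _id values from the sample output, assuming they belong to 2013.
const first = dayOfYearToDate(2013, 144);
const second = dayOfYearToDate(2013, 275);
console.log(first.toISOString().slice(0, 10), second.toISOString().slice(0, 10));
```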

10gen has a handy SQL to Mongo Aggregation conversion chart worth bookmarking. There is also a specific article on date aggregation operators.

Getting a little fancier, you can use:

db.col.aggregate([
  { $group: {
      _id: {
        $add: [
         { $dayOfYear: "$date"}, 
         { $multiply: 
           [400, {$year: "$date"}]
         }
      ]},   
      click: { $sum: 1 },
      first: {$min: "$date"}
    }
  },
  { $sort: {_id: -1} },
  { $limit: 15 },
  { $project: { date: "$first", click: 1, _id: 0} }
])

This gets you the latest 15 days that have data and returns the earliest datetime of each day (via $min) in the date field. For example:

[
    {
        "click" : 431,
        "date" : ISODate("2013-05-11T02:33:45.526Z")
    },
    {
        "click" : 702,
        "date" : ISODate("2013-05-08T02:11:00.503Z")
    },
            ...
    {
        "click" : 814,
        "date" : ISODate("2013-04-25T00:41:45.046Z")
    }
]
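
The grouping key in that pipeline is dayOfYear + 400 * year. Because a day of year never exceeds 366, any factor larger than that keeps the two parts from colliding, and sorting by the key sorts chronologically. The plain JavaScript below sketches the encoding and its inverse; encodeDayKey and decodeDayKey are hypothetical names, and the day of year is computed in UTC, which is an assumption about how the server evaluates $dayOfYear when no timezone is given.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const YEAR_FACTOR = 400; // same constant as in the $multiply stage

// Hypothetical helper: reproduce the pipeline's key for a given date (UTC).
function encodeDayKey(date) {
  const year = date.getUTCFullYear();
  // Day 0 of January is December 31 of the previous year, so the difference is 1-based.
  const dayOfYear = (Date.UTC(year, date.getUTCMonth(), date.getUTCDate()) - Date.UTC(year, 0, 0)) / DAY_MS;
  return dayOfYear + YEAR_FACTOR * year;
}

// Hypothetical helper: turn a key back into the UTC midnight of that day.
function decodeDayKey(key) {
  const year = Math.floor(key / YEAR_FACTOR);
  const dayOfYear = key % YEAR_FACTOR;
  return new Date(Date.UTC(year, 0, dayOfYear));
}

const key = encodeDayKey(new Date("2013-05-11T02:33:45.526Z"));
console.log(key, decodeDayKey(key).toISOString());
```

With a decoder like this the first: {$min: "$date"} field would not be needed to recover the day, though keeping it avoids any client-side work.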