MongoDB: Limiting the results of a $gt query (from pymongo)

Posted on 2024-08-21 16:38:06

I'm gathering some statistics from a web service and storing them in a collection. The data looks similar to this (but with more fields):

{"downloads": 30, "dt": "2010-02-17T16:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T17:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T18:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T19:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T20:56:34.163000"}
{…}
{"downloads": 30, "dt": "2010-02-18T17:56:34.163000"}
{"downloads": 30, "dt": "2010-02-18T18:56:34.163000"}
{"downloads": 30, "dt": "2010-02-18T19:56:34.163000"}
{"downloads": 30, "dt": "2010-02-18T20:56:34.163000"}

If someone requests the daily numbers for the last thirty days, that means the maximum 'downloads' value (in this example) per day, which is the last record of each day.

By using collection.find({"dt": {"$gt": datetime_obj_30_days_ago}}), I of course get all the rows, which is not what I want. So I'm looking for a way to return only the last record of each day in the given period.

I was told that group() might be the way to go, but I can't quite understand how to get it working in this instance.

Any tips or pointers would be much appreciated!

孤云独去闲 2024-08-28 16:38:06

You can do this using group. In your example you'd need to supply a JavaScript function to compute the key (as well as the reduce function), because you want only the date component of the datetime field. This should work:

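# Legacy API: Collection.group() was removed in PyMongo 4.0.
db.coll.group(
    # Group by calendar day: keep only the date part of the datetime field.
    key='function(doc) { return {"dt": doc.dt.toDateString()} }',
    # Only consider documents newer than 30 days.
    condition={'dt': {'$gt': datetime_obj_30_days_ago}},
    # Start each day's maximum at 0.
    initial={'downloads': 0},
    # Keep the largest "downloads" value seen for the day.
    reduce='function(curr, prev) { prev.downloads = Math.max(curr.downloads, prev.downloads) }'
)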

Keep in mind that this still does a linear scan of the past month's documents, just on the server instead of the client. It's possible that simply selecting the max value of each day individually is faster.
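On current versions the group command is no longer available (it was removed in MongoDB 4.2, and Collection.group() in PyMongo 4.0), so the same per-day maximum could probably be expressed as an aggregation pipeline instead. The following is only a sketch under some assumptions: "dt" is stored as a real BSON date rather than a string, days are bucketed in UTC, and the helper name daily_max_pipeline and the collection handle coll are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def daily_max_pipeline(since):
    """Build a pipeline returning the highest "downloads" per calendar day.

    Hypothetical helper; assumes "dt" is stored as a BSON date.
    """
    return [
        # Same filter as the find() call in the question.
        {"$match": {"dt": {"$gt": since}}},
        # Group by the date part of "dt" (UTC by default), keep the daily maximum.
        {"$group": {
            "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$dt"}},
            "downloads": {"$max": "$downloads"},
        }},
        # Return the days in chronological order.
        {"$sort": {"_id": 1}},
    ]


since = datetime.now(timezone.utc) - timedelta(days=30)
pipeline = daily_max_pipeline(since)

# With a live connection (hypothetical collection handle `coll`):
# for row in coll.aggregate(pipeline):
#     print(row["_id"], row["downloads"])
```

Like group, this still has to read every document matched by the $match stage, so an index on "dt" is likely to matter. If the other fields of the day's last record are needed rather than just the maximum, sorting by "dt" before the $group stage and using the $last accumulator may be an option.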
