从每组中选择前 N 行

发布于 2024-11-17 13:18:09 字数 206 浏览 1 评论 0原文

我使用 mongodb 作为我的博客平台,用户可以在其中创建自己的博客。所有博客中的所有条目都位于条目集合中。条目的文档如下所示:

{
  'blog_id':xxx,
  'timestamp':xxx,
  'title':xxx,
  'content':xxx
}

正如问题所述,有没有办法为每个博客选择最后 3 个条目?

I use mongodb for my blog platform, where users can create their own blogs. All entries from all blogs are in an entries collection. The document of an entry looks like:

{
  'blog_id':xxx,
  'timestamp':xxx,
  'title':xxx,
  'content':xxx
}

As the question says, is there any way to select, say, last 3 entries for each blog?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

挽心 2024-11-24 13:18:09

您需要首先按 blog_idtimestamp 字段对集合中的文档进行排序,然后进行初始分组,按降序创建原始文档的数组。之后,您可以使用文档对数组进行切片以返回前 3 个元素。

在这个例子中可以遵循直觉:

db.entries.aggregate([
    { '$sort': { 'blog_id': 1, 'timestamp': -1 } }, 
    {       
        '$group': {
            '_id': '$blog_id',
            'docs': { '$push': '$ROOT' },
        }
    },
    {
        '$project': {
            'top_three': { 
                '$slice': ['$docs', 3]
            }
        }
    }
])

You need to first sort the documents in the collection by the blog_id and timestamp fields, then do an initial group which creates an array of the original documents in descending order. After that you can slice the array with the documents to return the first 3 elements.

The intuition can be followed in this example:

db.entries.aggregate([
    { '$sort': { 'blog_id': 1, 'timestamp': -1 } }, 
    {       
        '$group': {
            '_id': '$blog_id',
            'docs': { '$push': '$ROOT' },
        }
    },
    {
        '$project': {
            'top_three': { 
                '$slice': ['$docs', 3]
            }
        }
    }
])
野心澎湃 2024-11-24 13:18:09

Mongo 5.2 开始,它是新的 $topN 聚合累加器:

// { blog_id: "a", title: "plop",  content: "smthg" }
// { blog_id: "b", title: "hum",   content: "meh"   }
// { blog_id: "a", title: "hello", content: "world" }
// { blog_id: "a", title: "what",  content: "ever"  }
db.collection.aggregate([
  { $group: {
    _id: "$blog_id",
    messages: { $topN: { n: 2, sortBy: { _id: -1 }, output: "$ROOT" } }
  }}
])
// {
//   _id: "a",
//   messages: [
//     { blog_id: "a", title: "what",  content: "ever" },
//     { blog_id: "a", title: "hello", content: "world" }
//   ]
// }
// {
//   _id: "b",
//   messages: [
//     { blog_id: "b", title: "hum", content: "meh" }
//   ]
// }

这会应用一个 $topN 组累加:

  • 为每个组获取top 2 (n: 2) 元素
  • top 2,由 sortBy: { _id: -1 } 定义,在本例中意味着通过相反的插入顺序
  • ,并且对于每个元素记录将整个记录推送到组列表中(输出:“$$ROOT”),因为 $$ROOT 代表正在处理的整个文档。

Starting in Mongo 5.2, it's a perfect use case for the new $topN aggregation accumulator:

// { blog_id: "a", title: "plop",  content: "smthg" }
// { blog_id: "b", title: "hum",   content: "meh"   }
// { blog_id: "a", title: "hello", content: "world" }
// { blog_id: "a", title: "what",  content: "ever"  }
db.collection.aggregate([
  { $group: {
    _id: "$blog_id",
    messages: { $topN: { n: 2, sortBy: { _id: -1 }, output: "$ROOT" } }
  }}
])
// {
//   _id: "a",
//   messages: [
//     { blog_id: "a", title: "what",  content: "ever" },
//     { blog_id: "a", title: "hello", content: "world" }
//   ]
// }
// {
//   _id: "b",
//   messages: [
//     { blog_id: "b", title: "hum", content: "meh" }
//   ]
// }

This applies a $topN group accumulation that:

  • takes for each group the top 2 (n: 2) elements
  • top 2, as defined by sortBy: { _id: -1 }, which in this case means by reversed order of insertion
  • and for each record pushes the whole record in the group's list (output: "$$ROOT") since $$ROOT represents the whole document being processed.
会发光的星星闪亮亮i 2024-11-24 13:18:09

如果您可以接受两件事,那么在基本 mongo 中执行此操作的唯一方法是:

  • 条目文档中的附加字段,我们称之为“年龄”
  • 进行附加更新的新博客条目

如果是这样,请执行以下操作:

  1. 创建新的介绍后,进行正常插入,然后执行此更新以增加所有帖子的年龄(包括您刚刚为此博客插入的帖子):

    db.entries.update({blog_id: BLOG_ID}, {age:{$inc:1}}, false, true)

  2. 查询时,使用以下查询将返回每个博客的最新 3 个条目:

    db.entries.find({age:{$lte:3}, 时间戳:{$gte:STARTOFMONTH, $lt:ENDOFMONTH}}).sort({blog_id:1,age:1})

请注意,此解决方案实际上是并发安全的(没有具有重复年龄的条目)。

The only way to do this in basic mongo if you can live with two things :

  • An additional field in your entry document, let's call it "age"
  • A new blog entry taking an additional update

If so, here's how you do it :

  1. Upon creating a new intro do your normal insert and then execute this update to increase the age of all posts (including the one you just inserted for this blog) :

    db.entries.update({blog_id: BLOG_ID}, {age:{$inc:1}}, false, true)

  2. When querying, use the following query which will return the most recent 3 entries for each blog :

    db.entries.find({age:{$lte:3}, timestamp:{$gte:STARTOFMONTH, $lt:ENDOFMONTH}}).sort({blog_id:1, age:1})

Note that this solution is actually concurrency safe (no entries with duplicate ages).

时光与爱终年不遇 2024-11-24 13:18:09

可以使用组(聚合),但这将创建全表扫描。

您真的需要 3 个帖子吗?还是可以设置一个限制...例如:上周/月最多发布 3 个帖子?

It's possible with group (aggregation), but this will create a full-table scan.

Do you really need exactly 3 or can you set a limit...e.g.: max 3 posts from the last week/month?

说好的呢 2024-11-24 13:18:09

这个答案使用来自另一个问题的 drcosta 的地图减少做到了这一点

在 mongo 中,如何使用 Map Reduce 来获取按最近排序的组

mapper = function () {
  emit(this.category, {top:[this.score]});
}

reducer = function (key, values) {
  var scores = [];
  values.forEach(
    function (obj) {
      obj.top.forEach(
        function (score) {
          scores[scores.length] = score;
      });
  });
  scores.sort();
  scores.reverse();
  return {top:scores.slice(0, 3)};
}

function find_top_scores(categories) {
  var query = [];
  db.top_foos.find({_id:{$in:categories}}).forEach(
    function (topscores) {
      query[query.length] = {
        category:topscores._id,
        score:{$in:topscores.value.top}
      };
  });
  return db.foo.find({$or:query});

This answer using map reduce by drcosta from another question did the trick

In mongo, how do I use map reduce to get a group by ordered by most recent

mapper = function () {
  emit(this.category, {top:[this.score]});
}

reducer = function (key, values) {
  var scores = [];
  values.forEach(
    function (obj) {
      obj.top.forEach(
        function (score) {
          scores[scores.length] = score;
      });
  });
  scores.sort();
  scores.reverse();
  return {top:scores.slice(0, 3)};
}

function find_top_scores(categories) {
  var query = [];
  db.top_foos.find({_id:{$in:categories}}).forEach(
    function (topscores) {
      query[query.length] = {
        category:topscores._id,
        score:{$in:topscores.value.top}
      };
  });
  return db.foo.find({$or:query});
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文