How to access the overall document count during an arithmetic aggregation expression

Posted 2025-01-18 22:24:39

I have a collection of documents in this format:

{
    _id: ObjectId,
    items: [
        {
            defindex: number,
            ...
        },
        ...
    ]
}

Irrelevant parts of the schema are omitted. Each defindex is guaranteed to be unique within its items array: the same defindex can occur in the items fields of different documents, but at most once in each array.

I currently call $unwind on the items field, followed by $sortByCount on items.defindex, to get a list of items sorted by count, highest first.
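
For reference, the two stages described here could be written as the following minimal sketch. The variable name `currentPipeline` and the collection name `collection` are placeholders, not names from the original post.

```javascript
// Sketch of the current pipeline; `currentPipeline` is a placeholder name.
const currentPipeline = [
  // Emit one document per element of the items array.
  { $unwind: "$items" },
  // Group by defindex, count each group, and sort by count descending.
  // Each output document has the shape { _id: <defindex>, count: <number> }.
  { $sortByCount: "$items.defindex" },
];

// In mongosh this would be run as:
//   db.collection.aggregate(currentPipeline)
```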

I now want to use $set to add a new field called usage to this final sorted list, showing the item's count as a fraction of the total number of documents in the collection (i.e. if the item's count is 1300 and the overall document count pre-$unwind was 2600, the usage value will be 0.5).
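
To make the intended result concrete, here is a small in-memory sketch of the calculation in plain JavaScript. It does not touch MongoDB at all; the sample data and the function name `usageByDefindex` are made up for illustration.

```javascript
// Hypothetical sample data, not taken from a real collection.
const sampleDocs = [
  { items: [{ defindex: 1 }, { defindex: 2 }] },
  { items: [{ defindex: 1 }] },
  { items: [{ defindex: 3 }] },
  { items: [] },
];

// Counts how many documents contain each defindex, then divides by the
// total number of documents (counted before any unwinding).
function usageByDefindex(documents) {
  const total = documents.length;
  const counts = new Map();
  for (const doc of documents) {
    for (const item of doc.items) {
      counts.set(item.defindex, (counts.get(item.defindex) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([defindex, count]) => ({ _id: defindex, count, usage: count / total }))
    .sort((a, b) => b.count - a.count);
}

const usageResult = usageByDefindex(sampleDocs);
// defindex 1 appears in 2 of the 4 documents, so its usage is 0.5.
```

Note that the document with an empty items array still counts toward the total, which matches the "document count pre-$unwind" requirement.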

My initial plan was to use $facet on the initial collection, creating a document like so:

{
    total: number (achieved using $count),
    documents: [{...}] (achieved using an empty $set)
}

I would then call $unwind on the documents field to add the total document count to each document. Calculating the usage value is then trivial using $set, since the total count is a field in the document itself.

This approach ran into memory issues, though: $facet returns its result as a single document, and my collection is far larger than the 16MB document size limit.

How would I solve this?


生来就爱笑 2025-01-25 22:24:39

One way to do it is to use $setWindowFields:

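db.collection.aggregate([
  // Attach the collection-wide document count to every document.
  // With no partitionBy, the whole collection is a single window.
  {
    $setWindowFields: {
      output: {
        totalCount: { $count: {} }
      }
    }
  },
  // Emit one document per element of the items array.
  { $unwind: "$items" },
  // Count documents per defindex. totalCount is identical on every
  // document, so $first is enough to carry it through the group.
  {
    $group: {
      _id: "$items.defindex",
      count: { $sum: 1 },
      totalCount: { $first: "$totalCount" }
    }
  },
  // usage = count / total number of documents before $unwind.
  {
    $project: {
      count: 1,
      usage: { $divide: ["$count", "$totalCount"] }
    }
  },
  // Highest count first.
  { $sort: { count: -1 } }
])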

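Because $setWindowFields runs before $unwind, totalCount is the number of documents in the collection, not the number of unwound items. Note that $setWindowFields requires MongoDB 5.0 or later.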
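
If $setWindowFields is not an option, one possible alternative (not part of the answer above, only a sketch) is to fetch the total with a separate count and inject it into the pipeline as a constant. The helper name `buildUsagePipeline` is hypothetical, and the approach assumes two round trips to the server are acceptable.

```javascript
// Hypothetical helper: builds the pipeline with the total document count
// injected as a literal, so no stage has to carry it per document.
function buildUsagePipeline(totalDocs) {
  if (!Number.isInteger(totalDocs) || totalDocs <= 0) {
    throw new RangeError("totalDocs must be a positive integer");
  }
  return [
    { $unwind: "$items" },
    // Produces { _id: <defindex>, count: <number> }, sorted by count descending.
    { $sortByCount: "$items.defindex" },
    // Divide each count by the precomputed total.
    { $set: { usage: { $divide: ["$count", totalDocs] } } },
  ];
}

// In mongosh, assuming the collection does not change between the two calls:
//   const total = db.collection.countDocuments({});
//   db.collection.aggregate(buildUsagePipeline(total));
const examplePipeline = buildUsagePipeline(2600);
```

The trade-off is that the count and the aggregation are separate operations, so the total can be slightly stale if documents are inserted or deleted in between.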
