How to do in-place updates with MongoDB MapReduce

Posted on 2024-11-29 14:34:05

*Basically I'm trying to order objects by their score over the last hour.

I'm trying to generate an hourly votes sum for objects in my database. Votes are embedded into each object. The object schema looks like this:

{
    _id: ObjectId
    score: int
    hourly-score: int <- need to update this value so I can order by it
    recently-voted: boolean
    votes: {
        "4e4634821dff6f103c040000": { <- Key is __toString of voter ObjectId
            "_id": ObjectId("4e4634821dff6f103c040000"), <- Voter ObjectId
            "a": 1, <- Vote amount
            "ca": ISODate("2011-08-16T00:01:34.975Z"), <- Created at MongoDate
            "ts": 1313452894 <- Created at timestamp
        },
        ... repeat ...
    }
}

This question is actually related to a question I asked a couple of days ago: "Best way to model a voting system in MongoDB".

How would I (can I?) run a MapReduce command to do the following:

  1. Only run on objects with recently-voted = true OR hourly-score > 0.
  2. Calculate the sum of the votes created in the last hour.
  3. Update hourly-score = the sum calculated above, and recently-voted = false.

I also read here that I can perform a MapReduce on the slave DB by running db.getMongo().setSlaveOk() before the M/R command. Could I run the reduce on a slave and update the master DB?

Are in-place updates even possible with Mongo MapReduce?

Comments (1)

浅笑依然 2024-12-06 14:34:05

You can definitely do this. I'll address your questions one at a time:

1.
You can specify a query along with your map-reduce, which filters the set of objects which will be passed into the map phase. In the mongo shell, this would look like (assuming m and r are the names of your mapper and reducer functions, respectively):

> db.coll.mapReduce(m, r, {query: {$or: [{"recently-voted": true}, {"hourly-score": {$gt: 0}}]}})

2.
The query in step 1 passes every matching document to your mapper (those with recently-voted set to true or a positive hourly-score), but not all of their votes will have been cast in the last hour. So you'll need to filter the votes in your mapper and emit only those you wish to count:

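function m() {
  // "ts" is stored in seconds, so the cutoff must be in seconds as well.
  var hourAgo = Math.floor(new Date().getTime() / 1000) - 3600;
  // "votes" is an object keyed by voter id, not an array, so iterate its keys.
  for (var voterId in this.votes) {
    var vote = this.votes[voterId];
    if (vote.ts > hourAgo) {
      // Keyed by the document _id (an assumption: one sum per object).
      emit(this._id, vote.a);
    }
  }
}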

And to reduce:

function r(key, values) {
  var sum = 0;
  values.forEach(function(value) { sum += value; });
  return sum;
}
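
To see what the mapper and reducer produce without a database, here is a minimal sketch in plain JavaScript. The runMapReduce helper, the makeMapper factory, and the sample documents are all hypothetical stand-ins written for this illustration; they are not MongoDB APIs. The cutoff time is passed in explicitly so the result is deterministic, whereas in a real job it would come from new Date() or from the scope option. The sketch keys each emit by the document _id, which is an assumption based on the question wanting one sum per object.

```javascript
// Stand-in for the map-reduce engine (illustration only, not MongoDB code).
function runMapReduce(docs, map, reduce) {
  var groups = new Map();
  // MongoDB exposes emit() as a global to the mapper; this stub mimics that.
  globalThis.emit = function (key, value) {
    var k = String(key);
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); }); // "this" is the document
  delete globalThis.emit;

  var out = {};
  groups.forEach(function (values, key) {
    // Like MongoDB, skip the reducer when a key has only one value.
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  });
  return out;
}

// Hypothetical factory: builds a mapper with a fixed "now" (in seconds).
function makeMapper(nowSeconds) {
  var hourAgo = nowSeconds - 3600;
  return function () {
    for (var voterId in this.votes) {
      var vote = this.votes[voterId];
      if (vote.ts > hourAgo) emit(this._id, vote.a);
    }
  };
}

// Same reducer as in the answer: sum the emitted vote amounts.
function r(key, values) {
  var sum = 0;
  values.forEach(function (value) { sum += value; });
  return sum;
}

var now = 1313452894 + 600; // ten minutes after the sample vote
var docs = [
  { _id: "obj1", votes: {
      v1: { a: 1, ts: now - 600 },   // 10 minutes old: counted
      v2: { a: 1, ts: now - 1800 },  // 30 minutes old: counted
      v3: { a: -1, ts: now - 7200 }  // 2 hours old: ignored
  } },
  { _id: "obj2", votes: { v4: { a: 1, ts: now - 5000 } } } // too old: no emit
];

var hourly = runMapReduce(docs, makeMapper(now), r);
console.log(hourly); // { obj1: 2 }
```

Note that obj2 does not appear in the result at all: a document whose votes are all older than an hour emits nothing, so it produces no output row.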

3.
To update the hourly scores collection, you can use the reduce output mode of map-reduce (out: {reduce: ...}), which calls your reducer with both the newly emitted values and the value previously saved in the output collection, if any. The result of that pass is saved into the output collection. This looks like:

> db.coll.mapReduce(m, r, {query: ..., out: {reduce: "output_coll"}})

In addition to re-reducing output, you can use merge, which overwrites documents in the output collection with newly created ones (but leaves behind any documents whose _id is not among the _ids created by your m-r job); replace, which is effectively a drop-and-create operation and is the default; or {inline: 1}, which returns the results directly to the shell or to your driver. Note that when using {inline: 1}, your results must fit in the size allowed for a single document (16MB in recent MongoDB releases). Also note that because reduce mode folds the previously saved value into the new one, a summing reducer accumulates across runs; if the hourly score should be recomputed from scratch on each run, merge or replace is the closer fit.
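
The difference between the three collection output modes can be sketched in plain JavaScript. The applyOutMode function below is a hypothetical model written for this illustration, not a MongoDB API: it treats the output collection and the new job results as simple key-to-value objects and shows how each mode combines them.

```javascript
// Hypothetical model of the "out" modes; existing and fresh map _id -> value.
function applyOutMode(mode, existing, fresh, reduce) {
  var result = {};
  var key;
  if (mode !== "replace") {
    // merge and reduce keep documents already in the output collection.
    for (key in existing) result[key] = existing[key];
  }
  for (key in fresh) {
    if (mode === "reduce" && key in existing) {
      // reduce: fold the old value and the new value together.
      result[key] = reduce(key, [existing[key], fresh[key]]);
    } else {
      // replace and merge: the new value wins.
      result[key] = fresh[key];
    }
  }
  return result;
}

function sumReducer(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var existing = { obj1: 5, obj2: 3 }; // output collection before the job
var fresh = { obj1: 2, obj3: 1 };    // results of the current job

var replaced = applyOutMode("replace", existing, fresh, sumReducer);
var merged = applyOutMode("merge", existing, fresh, sumReducer);
var reduced = applyOutMode("reduce", existing, fresh, sumReducer);

console.log(replaced); // { obj1: 2, obj3: 1 }
console.log(merged);   // { obj1: 2, obj2: 3, obj3: 1 }
console.log(reduced);  // { obj1: 7, obj2: 3, obj3: 1 }
```

In this model, only reduce mode carries the old value of obj1 into the new one, which is why the choice of mode matters when the reducer is a sum.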

(4.)
You can run map-reduce jobs on secondaries ("slaves"), but since secondaries cannot accept writes (that's what makes them secondary), you can only do this when using inline output.
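
One way to combine the two, sketched under assumptions: run the job on a secondary with inline output, then turn the returned results into update operations and send those to the primary. The buildScoreUpdates function below is a hypothetical helper, not part of MongoDB; it assumes the inline results array has the usual { _id, value } shape and that the shell or driver in use provides bulkWrite with updateOne operations. The field names come from the schema in the question.

```javascript
// Hypothetical helper: convert inline map-reduce results into bulkWrite ops.
// results: array of { _id, value } rows as returned under "results".
function buildScoreUpdates(results) {
  return results.map(function (row) {
    return {
      updateOne: {
        filter: { _id: row._id },
        update: {
          $set: { "hourly-score": row.value, "recently-voted": false }
        }
      }
    };
  });
}

// Sample rows standing in for the output of an inline job.
var inlineResults = [
  { _id: "obj1", value: 2 },
  { _id: "obj3", value: 1 }
];

var ops = buildScoreUpdates(inlineResults);
console.log(JSON.stringify(ops[0]));

// Against a connection to the primary, the operations could then be sent
// with something like: db.coll.bulkWrite(ops)
```

Documents whose votes have all aged out produce no result row, so resetting their hourly-score to 0 would need a separate update.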
