How to update in place with MongoDB MapReduce
Basically, I'm trying to order objects by their score over the last hour.
I'm trying to generate an hourly votes sum for objects in my database. Votes are embedded into each object. The object schema looks like this:
{
  _id: ObjectId
  score: int
  hourly-score: int <- need to update this value so I can order by it
  recently-voted: boolean
  votes: {
    "4e4634821dff6f103c040000": { <- Key is __toString of voter ObjectId
      "_id": ObjectId("4e4634821dff6f103c040000"), <- Voter ObjectId
      "a": 1, <- Vote amount
      "ca": ISODate("2011-08-16T00:01:34.975Z"), <- Created at MongoDate
      "ts": 1313452894 <- Created at timestamp
    },
    ... repeat ...
  }
}
This question is related to one I asked a couple of days ago: "Best way to model a voting system in MongoDB".
How would I (can I?) run a MapReduce command to do the following:
- Only run on objects with recently-voted = true OR hourly-score > 0.
- Calculate the sum of the votes created in the last hour.
- Update hourly-score = the sum calculated above, and recently-voted = false.
I also read that I can perform a MapReduce on the slave DB by running db.getMongo().setSlaveOk() before the M/R command. Could I run the reduce on a slave and update the master DB?
Are in-place updates even possible with Mongo MapReduce?
Comments (1)
You can definitely do this. I'll address your questions one at a time:
1. You can specify a query along with your map-reduce, which filters the set of objects that will be passed into the map phase. In the mongo shell, the query goes in the options document passed to mapReduce, together with your mapper and reducer functions (called m and r here).

2. Step #1 will let you run your mapper on every document with at least one vote in the last hour (or with recently-voted set to true), but not all of those documents' votes will have been cast in the last hour. So you'll need to filter the votes in your mapper and emit only those you wish to count, then sum the emitted values in your reducer.

3. To update the hourly scores table, you can use the reduce output mode of map-reduce (out: {reduce: <collection>} in the mongo shell), which will call your reducer with both the newly emitted values and the value previously saved in the output collection, if any. The result of that pass is saved into the output collection.

In addition to re-reducing output, you can use merge, which overwrites documents in the output collection with newly created ones (but leaves behind any documents whose _id differs from the _ids created by your map-reduce job); replace, which is effectively a drop-and-create operation and is the default; or {inline: 1}, which returns the results directly to the shell or to your driver. Note that when using {inline: 1}, your results must fit in the size allowed for a single document (16MB in recent MongoDB releases).

4. You can run map-reduce jobs on secondaries ("slaves"), but since secondaries cannot accept writes (that's what makes them secondary), you can only do this when using inline output.
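
As a rough sketch of how the four steps could fit together in the mongo shell: the field names come from the question, but the collection names items and hourly_scores are made up for illustration, and passing the cutoff to the mapper through the scope option is my assumption about how you would get the timestamp onto the server. It also assumes a MongoDB version that still supports mapReduce. Treat it as a starting point, not a tested job.

```javascript
// Sketch for the mongo shell. `items` and `hourly_scores` are hypothetical
// collection names; the field names are taken from the question's schema.

// Votes with a `ts` older than this Unix timestamp (seconds) are ignored.
var cutoff = Math.floor(Date.now() / 1000) - 3600;

// Step 1: only documents voted on recently, or with a non-zero hourly score.
// The hyphenated field names have to be quoted.
var query = {
  $or: [{ "recently-voted": true }, { "hourly-score": { $gt: 0 } }]
};

// Step 2 (map): emit one value per vote cast since the cutoff.
// `emit` is supplied by MongoDB; `cutoff` reaches the server via `scope`.
var m = function () {
  for (var voterId in this.votes) {
    var vote = this.votes[voterId];
    if (vote.ts >= cutoff) {
      emit(this._id, vote.a);
    }
  }
};

// Step 2 (reduce): sum the vote amounts emitted for one document.
var r = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Step 3: the output modes described above.
var outModes = {
  reduce: { reduce: "hourly_scores" },   // re-reduce with the saved value
  merge: { merge: "hourly_scores" },     // overwrite documents with the same _id
  replace: { replace: "hourly_scores" }, // drop and recreate the collection
  inline: { inline: 1 }                  // return results to the caller
};

// Step 4: on a secondary, only outModes.inline is usable.
// The call runs only where a `db` handle exists, i.e. inside the shell.
if (typeof db !== "undefined") {
  db.items.mapReduce(m, r, {
    query: query,
    scope: { cutoff: cutoff },
    out: outModes.reduce
  });
}
```

Two things to keep in mind. The reduce mode passes the previously saved value through r together with the new one, so with a summing reducer the totals accumulate across runs; if each run should overwrite the hourly total instead, merge may be the better fit. Also, the job writes to the output collection, not to hourly-score on the source documents, and setting recently-voted back to false would be a separate update.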