Group By (aggregate map-reduce functions) in MongoDB using Scala (Casbah/Rogue)
Here's a specific query I'm having trouble with. I'm using Lift-mongo-records so that I can use Rogue. I'm happy to use Rogue-specific syntax, or whatever works.
While there are good examples of using JavaScript strings via Java (noted below), I'd like to know what the best practice is.
Imagine there is a collection (table) like this:
comments {
_id
topic
title
text
created
}
The desired output is a list of topics and their count, for example
- cats (24)
- dogs (12)
- mice (5)
So a user can see a list of distinct topics, grouped by topic and ordered by count.
Here's some pseudo-SQL:
SELECT [DISTINCT] topic, count(topic) as topic_count
FROM comments
GROUP BY topic
ORDER BY topic_count DESC
LIMIT 10
OFFSET 10
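To pin down the semantics the pseudo-SQL asks for, here is a minimal sketch using plain Scala collections only. It does not touch MongoDB, Casbah, or Rogue; the `Comment` case class, the `topicCounts` helper, and the sample data are hypothetical names introduced for illustration.

```scala
// Plain Scala collections only; no MongoDB involved.
// `Comment`, `topicCounts`, and the sample data are hypothetical.
final case class Comment(topic: String, title: String)

object TopicCounts {
  // Roughly: SELECT topic, count(topic) ... GROUP BY topic
  //          ORDER BY topic_count DESC LIMIT limit OFFSET offset
  def topicCounts(comments: Seq[Comment], offset: Int, limit: Int): Seq[(String, Int)] =
    comments
      .groupBy(_.topic)                                  // GROUP BY topic
      .map { case (topic, cs) => (topic, cs.size) }      // count(topic)
      .toSeq
      .sortBy { case (topic, count) => (-count, topic) } // count DESC, ties by topic name
      .drop(offset)                                      // OFFSET
      .take(limit)                                       // LIMIT

  def main(args: Array[String]): Unit = {
    val sample =
      Seq.fill(3)(Comment("cats", "a")) ++
      Seq.fill(2)(Comment("dogs", "b")) ++
      Seq(Comment("mice", "c"))
    topicCounts(sample, offset = 0, limit = 10).foreach {
      case (topic, count) => println(s"- $topic ($count)")
    }
  }
}
```

This only describes the shape of the desired result, a sequence of (topic, count) pairs. Doing the grouping in memory like this would not scale to a large collection, which is why a server-side group or map-reduce is being asked about.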
One approach is using some DBObject DSL like
val cursor = coll.group( MongoDBObject(
"key" -> MongoDBObject( "topic" -> true ) ,
//
"initial" -> MongoDBObject( "count" -> 0 ) ,
"reduce" -> "function( obj , prev) { prev.count += obj.c; }"
"out" -> "topic_list_result"
))
[...].sort( MongoDBObject( "created" -> -1 )).skip( offset ).limit( limit );
Variations of the above do not compile.
I could just ask "what am I doing wrong?", but I thought I could make my confusion more specific:
- Can I chain the results directly, or do I need "out"?
- What kind of output can I expect? I mean, do I iterate over a cursor, or over the "out" param?
- Is "cond" required?
- Should I be using count() or distinct()?
- Some examples contain a "map" param...
A recent post I found, which covers the Java driver, implies I should use strings instead of a DSL: http://blog.evilmonkeylabs.com/2011/02/28/MongoDB-1_8-MR-Java/
Would this be the preferred method in either Casbah or Rogue?
Update: 9/23
This fails in Scala/Casbah (it compiles, but produces the error {MapReduceError 'None'}):
val map = "function (){ emit({ this.topic }, { count: 1 }); }"
val reduce = "function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }"
val out = coll.mapReduce( map , reduce , MapReduceInlineOutput )
ConfiggyObject.log.debug( out.toString() )
I settled on the above after seeing
https://github.com/mongodb/casbah/blob/master/casbah-core/src/test/scala/MapReduceSpec.scala
Guesses:
- I am misunderstanding the toString method and what the out.object is?
- missing finalize?
- missing output specification?
- https://jira.mongodb.org/browse/SCALA-43 ?
This works as desired from the command line:
map = function (){
emit({ this.topic }, { count: 1 });
}
reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; };
db.tweets.mapReduce( map, reduce, { out: "results" } ); //
db.results.ensureIndex( {count : 1});
db.results.find().sort( {count : 1});
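For reference, the way the map and reduce functions above are meant to combine can be modelled in plain Scala. This is a sketch only, not Casbah or driver code: `Doc`, `Count`, `mapFn`, `reduceFn`, and `mapReduce` are hypothetical names, and it assumes the emitted key is simply the topic string.

```scala
// A plain-Scala model of the map/reduce pair above; no MongoDB involved.
// All names here are hypothetical and exist only for illustration.
object MapReduceModel {
  final case class Doc(topic: String)
  final case class Count(count: Int)

  // Map step: one (key, value) pair per document,
  // modelling an emit of the topic with { count: 1 }.
  def mapFn(doc: Doc): (String, Count) = (doc.topic, Count(1))

  // Reduce step: sum the counts for one key. The result has the same
  // shape as the emitted values, so it can be fed back into reduce.
  def reduceFn(key: String, values: Seq[Count]): Count =
    Count(values.map(_.count).sum)

  // Group the emitted pairs by key, then reduce each group.
  def mapReduce(docs: Seq[Doc]): Map[String, Count] =
    docs
      .map(mapFn)
      .groupBy { case (key, _) => key }
      .map { case (key, pairs) => key -> reduceFn(key, pairs.map(_._2)) }
}
```

The reduce function returns a value of the same shape as the values it receives. That matters because MongoDB may invoke reduce more than once for the same key, passing earlier reduce results back in as values.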
Update
The issue has now been filed as a bug at Mongo: https://jira.mongodb.org/browse/SCALA-55
Comments (2)
The following worked for me:
res was an ArrayBuffer full of coll.T, which can be handled in the usual ways.
This appears to be a bug somewhere.
For now, I have a less-than-ideal workaround using eval() (slower, less safe) ...
Then I query the output collection normally via Casbah.