Group By (Aggregate Map Reduce Functions) in MongoDB using Scala (Casbah/Rogue)

Posted 2024-12-03 01:56:20

Here's a specific query I'm having trouble with. I'm using Lift-mongo-records so that I can use Rogue. I'm happy to use Rogue-specific syntax, or whatever works.

While there are good examples of using JavaScript strings via Java (noted below), I'd like to know what the best practices might be.

Imagine there is a collection like

comments {
 _id
 topic
 title
 text
 created
}

The desired output is a list of topics and their count, for example

  • cats (24)
  • dogs (12)
  • mice (5)

So a user can see a list of distinct topics (a group by), ordered by count.

Here's some pseudo SQL:

SELECT [DISTINCT] topic, count(topic) as topic_count
FROM comments
GROUP BY topic
ORDER BY topic_count DESC
LIMIT 10
OFFSET 10
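
To pin down what that pseudo SQL is meant to compute, here is a minimal in-memory sketch in plain JavaScript. It does not touch MongoDB; the sample data and the `topicCounts` name are made up for illustration, and it only shows the four steps (group, count, sort descending, page) that any driver-side solution would need to reproduce.

```javascript
// Hypothetical sample data standing in for the comments collection.
const comments = [
  { topic: "cats" }, { topic: "dogs" }, { topic: "cats" },
  { topic: "mice" }, { topic: "cats" }, { topic: "dogs" },
];

// GROUP BY topic, COUNT, ORDER BY count DESC, then OFFSET/LIMIT.
function topicCounts(docs, offset, limit) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.topic, (counts.get(doc.topic) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([topic, topic_count]) => ({ topic, topic_count }))
    .sort((a, b) => b.topic_count - a.topic_count)
    .slice(offset, offset + limit);
}

console.log(topicCounts(comments, 0, 10));
```

Note that the ordering here is by the computed count, not by a document field such as `created`.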

One approach is using some DBObject DSL like

val cursor  = coll.group( MongoDBObject(
"key" -> MongoDBObject( "topic" -> true ) ,
//
"initial" -> MongoDBObject( "count" ->  0 ) ,
"reduce" -> "function( obj , prev) { prev.count += obj.c; }"
 "out" -> "topic_list_result"
))

 [...].sort( MongoDBObject( "created" ->
-1 )).skip( offset ).limit( limit );

Variations of the above do not compile.

I could just ask "what am I doing wrong" but I thought I could make my
confusion more acute:

  • Can I chain the results directly, or do I need "out"?
  • What kind of output can I expect? I mean, do I iterate over a
    cursor, or over the "out" param?
  • Is "cond" required?
  • Should I be using count() or distinct()?
  • Some examples contain a "map" param...

A recent post I found, which covers the Java driver, implies I should
use strings instead of a DSL:
http://blog.evilmonkeylabs.com/2011/02/28/MongoDB-1_8-MR-Java/

Would this be the preferred method in either Casbah or Rogue?

Update: 9/23

This fails in Scala/Casbah (compiles but produces error {MapReduceError 'None'} )

val map = "function (){ emit({ this.topic }, { count: 1 }); }"
val reduce = "function(key, values) {  var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }"
val out  = coll.mapReduce(  map ,  reduce  , MapReduceInlineOutput  )
ConfiggyObject.log.debug( out.toString() )

I settled on the above after seeing
https://github.com/mongodb/casbah/blob/master/casbah-core/src/test/scala/MapReduceSpec.scala

Guesses:

This works as desired from the command line (mongo shell):

   map = function (){
        emit({ this.topic }, { count: 1 });
    }

    reduce = function(key, values) {  var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; };

    db.tweets.mapReduce( map, reduce,  { out: "results" } ); //
    db.results.ensureIndex( {count : 1});
    db.results.find().sort( {count : 1});
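
One detail worth checking in the functions above: `{ this.topic }` is not a valid JavaScript object literal, because a property name is required, as in `{ topic: this.topic }`. Whether that is what triggers the Casbah error is only an assumption. The sketch below runs the same map and reduce logic with a named key, entirely in memory; `runMapReduce` is a hypothetical stand-in written for this example, not a MongoDB or Casbah API, and the `tweets` array is made-up sample data.

```javascript
// Hypothetical in-memory stand-in for the mapReduce command.
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // serialized key -> { key, values }
  // The server exposes emit() to the map function; mimic that with a global.
  globalThis.emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach((doc) => map.call(doc)); // map sees the document as `this`
  delete globalThis.emit;
  // Unlike the server, this sketch calls reduce even for single-value keys.
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

const map = function () {
  emit({ topic: this.topic }, { count: 1 }); // named key instead of { this.topic }
};
const reduce = function (key, values) {
  let count = 0;
  values.forEach((v) => { count += v.count; });
  return { count };
};

const tweets = [{ topic: "cats" }, { topic: "dogs" }, { topic: "cats" }];
const results = runMapReduce(tweets, map, reduce)
  .sort((a, b) => b.value.count - a.value.count); // descending by count
console.log(JSON.stringify(results));
```

The result documents have the shape `{ _id: <key>, value: <reduced object> }`, which is also how mapReduce writes its output collection. The count therefore lives under `value.count`, so an index or a descending sort on the output would target `"value.count"` rather than `count`.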

Update
The issue has now been filed as a bug at Mongo:
https://jira.mongodb.org/browse/SCALA-55

Comments (2)

深爱不及久伴 2024-12-10 01:56:20

The following worked for me:

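import com.mongodb.casbah.Imports._
// "test" is a placeholder database name; the original snippet did not name one.
val coll = MongoConnection()("test")("comments")
// reduce runs once per document; prev is the running aggregate for that key.
val reduce = """function(obj,prev) { prev.csum += 1; }"""
// Arguments: key fields, condition (empty = all documents), initial value, reduce.
val res = coll.group( MongoDBObject("topic"->true),
                       MongoDBObject(), MongoDBObject( "csum" -> 0 ), reduce)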

res was an ArrayBuffer full of coll.T which can be handled in the usual ways.
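
To see why this reduce function counts correctly, here is a small in-memory sketch of the group semantics in plain JavaScript. `runGroup` is a hypothetical helper written for this example (not a MongoDB or Casbah API) and the `docs` array is made-up sample data. It also shows what the question's `prev.count += obj.c` does when documents have no `c` field: adding `undefined` to a number yields `NaN`.

```javascript
// Hypothetical in-memory stand-in for group(key, cond, initial, reduce).
function runGroup(docs, keyField, initial, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    const k = doc[keyField];
    // Each new key starts from a fresh copy of the initial aggregate.
    if (!groups.has(k)) groups.set(k, { [keyField]: k, ...initial });
    reduce(doc, groups.get(k)); // reduce mutates prev in place
  }
  return [...groups.values()];
}

const docs = [{ topic: "cats" }, { topic: "dogs" }, { topic: "cats" }];

// The reduce from this answer: add 1 per document.
const counted = runGroup(docs, "topic", { csum: 0 },
  (obj, prev) => { prev.csum += 1; });

// The reduce from the question: obj.c is undefined, so the sum becomes NaN.
const broken = runGroup(docs, "topic", { count: 0 },
  (obj, prev) => { prev.count += obj.c; });

console.log(counted, broken);
```

Sorting by the count and paging would still have to happen on the returned collection, since group returns the aggregates directly rather than a cursor.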

星光不落少年眉 2024-12-10 01:56:20

Appears to be a bug - somewhere.

For now, I have a less-than-ideal workaround using eval() (slower, less safe):

db.eval( "map = function (){ emit( { topic: this.topic } , { count: 1 }); } ; ");
db.eval( "reduce = function(key, values) { var count = 0; values.forEach(function(v) { count += v['count']; }); return {count: count}; }; ");
db.eval( " db.tweets.mapReduce( map, reduce, { out: \"tweetresults\" } ); ");
db.eval( " db.tweetresults.ensureIndex( {count : 1}); ");

Then I query the output collection normally via Casbah.
