How do I pass parameters to MongoDB map/reduce in Java?

Posted on 2024-11-07 18:26:49

I have some data like this:

{id: 1, text: "This is a sentence about dogs", indices: ["sentence", "dogs"]}
{id: 2, text: "This sentence is about cats and dogs", indices: ["sentence", "cats", "dogs"]}

I have manually extracted key terms from the text and stored them as indices. I want to search and order the results by the number of matching indices. For this example, I would like to pass "cats" and "dogs" and get both objects back, with id=2 first because its score is 2.
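To make the intended scoring concrete, here is a minimal plain-Java sketch of the result I am after. It runs entirely on the client and makes no MongoDB calls; the class name `TermScore` and the method `score` are hypothetical names used only for illustration, and it assumes a document's indices contain no duplicate terms.

```java
import java.util.Arrays;
import java.util.List;

public class TermScore {
    // Counts how many of the target terms appear in a document's indices.
    static int score(List<String> indices, List<String> targetTerms) {
        int score = 0;
        for (String term : targetTerms) {
            if (indices.contains(term)) {
                score++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> targetTerms = Arrays.asList("cats", "dogs");
        List<String> doc1 = Arrays.asList("sentence", "dogs");
        List<String> doc2 = Arrays.asList("sentence", "cats", "dogs");
        System.out.println("id=1 score=" + score(doc1, targetTerms)); // id=1 score=1
        System.out.println("id=2 score=" + score(doc2, targetTerms)); // id=2 score=2
    }
}
```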

I first tried to use the DBCollection.group function

public DBObject group(DBObject key,
                      DBObject cond,
                      DBObject initial,
                      String reduce,
                      String finalize)

But I don't see a way to send parameters. I tried:

key: {id: true},
cond: {indices: {$in: ['cats', 'dogs']}},
initial: {score: 0},
reduce: function(doc, out){ out.score++; }

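But this just returns a count of 1 for each of the two objects: grouping by id puts each document in its own group, so reduce runs once per document no matter how many terms match.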

I then realised that I could send the keyword parameters as part of the initial value of the aggregation object that is passed to reduce.

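import java.util.Arrays;
import java.util.List;
import com.mongodb.BasicDBList;
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;

final List<String> targetTerms = Arrays.asList("dogs", "cats");
final Datastore ds = ….
final DBCollection coll = ds.getCollection(Example.class);
// Group by _id, so each document forms its own group.
BasicDBObject key = new BasicDBObject("_id", true);
// Only consider documents that contain at least one target term.
BasicDBObject cond = new BasicDBObject();
cond.append("indices", new BasicDBObject("$in", targetTerms));
// Pass the target terms to reduce through the initial aggregation object.
BasicDBObject initial = new BasicDBObject();
initial.append("score", 0);
initial.append("targetTerms", targetTerms);
// Increment the score for every index that equals a target term.
String reduce = "function (obj, prev) { " +
        "  for (var i in prev.targetTerms) {" +
        "    var targetTerm = prev.targetTerms[i];" +
        "    for (var j in obj.indices) {" +
        "      var index = obj.indices[j];" +
        "      if (targetTerm === index) prev.score++;" +
        "    }" +
        "  }" +
        "}";
String fn = null; // no finalize function
final BasicDBList group = (BasicDBList) coll.group(key, cond, initial, reduce, fn);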

I get results like this:

{ "_id" : { "$oid" : "4dcfe16c05a063bb07ccbb7b"} , "score" : 1.0 , "targetTerms" : [ "virtual" , "library"]}
{ "_id" : { "$oid" : "4dcfe17d05a063bb07ccbb83"} , "score" : 2.0 , "targetTerms" : [ "virtual" , "library"]}

This got me the score values that I wanted, and I am able to narrow down the entries to be processed with more specific conditional rules.

So I have a few questions:

  1. Is this a good way to send "parameters" to the group action's reduce function?
  2. Is there a way to sort (and perhaps limit) the output inside MongoDB before returning it to the client?
  3. Will this break on sharded MongoDB instances?
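For comparison with question 2, this is roughly what sorting and limiting would look like if done on the client after the group call returns. It is only a sketch under assumptions: each result row is modelled as a plain `Map<String, Object>` with a numeric "score" field, standing in for the entries of the group output, and `TopScores` and `top` are hypothetical names. It does not show a server-side sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopScores {
    // Returns at most `limit` rows, highest "score" first.
    // The input list is copied, not modified; `limit` is assumed non-negative.
    static List<Map<String, Object>> top(List<Map<String, Object>> rows, int limit) {
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble(
                (Map<String, Object> row) -> ((Number) row.get("score")).doubleValue())
                .reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Builds one stand-in result row.
    static Map<String, Object> row(String id, double score) {
        Map<String, Object> r = new HashMap<>();
        r.put("_id", id);
        r.put("score", score);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                row("4dcfe16c05a063bb07ccbb7b", 1.0),
                row("4dcfe17d05a063bb07ccbb83", 2.0));
        // Keep only the best-scoring row.
        System.out.println(top(rows, 1).get(0).get("_id")); // 4dcfe17d05a063bb07ccbb83
    }
}
```

The drawback is that every matching document still travels to the client before being cut down, which is why a server-side sort and limit would be preferable.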
