CouchDB, MapReduce: querying a time slice

Posted on 2024-11-25 08:17:49

To monitor an application with CouchDB, I need to sum up one field of my data (for example, the time needed to execute a method that has been logged).

That's no problem with map-reduce, but I need to sum only the data recorded within a specific time slice.

Example records:

{_id: 1, methodID:1, recorded: 100, timeneeded: 10}, 
{_id: 2, methodID:1, recorded: 200, timeneeded: 11}, 
{_id: 3, methodID:2, recorded: 200, timeneeded: 2}, 
{_id: 4, methodID:1, recorded: 300, timeneeded: 6}, 
{_id: 5, methodID:2, recorded: 310, timeneeded: 3}, 
{_id: 6, methodID:1, recorded: 400, timeneeded: 9}

Now I would like to get just the sum of timeneeded of all records that have been recorded in the range of 200 to 350 and grouped by methodID. (That would be 17 for methodID:1 and 5 for methodID:2.)

How can I do that?


I have now tried it with a list function, following WickedGrey's idea. Here are my functions:

map function:

function(doc) {  
  emit([ doc.recorded], {methodID:doc.methodID, timeneeded:doc.timeneeded}); 
}

list function:

function(head, req) {
  var combined_values = {};
  var row;
  // getRow() returns one view row at a time; the emitted object is in row.value
  while (row = getRow()) {
    if (row.value.methodID in combined_values) {
      combined_values[row.value.methodID] += row.value.timeneeded;
    }
    else {
      combined_values[row.value.methodID] = row.value.timeneeded;
    }
  }

  // send one JSON object per methodID
  for (var methodID in combined_values) {
    send(toJSON({method: methodID, timeneeded: combined_values[methodID]}));
  }
}

Now I have two problems:
1. I always get the result as a file, and Firefox asks me whether I want to download it, instead of showing it in the browser as it does when I query a classic view.
2. As I understand it, the results are now calculated on the fly in the list function. I don't expect this to be fast with hundreds of millions of records... Any ideas how to make it faster?

Thank you for your help!
andy

Comments (2)

心欲静而疯不止 2024-12-02 08:17:49

In CouchDB you can't use the map key to filter by one set of criteria and group by another. However, you can filter the keys by time range and do the grouping in a reduce function. Try something like this:

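function map(doc) {
    // build a one-entry object {methodID: timeneeded};
    // an object literal cannot use doc.methodID directly as a key
    var totals = {};
    totals[doc.methodID] = doc.timeneeded;
    emit(doc.recorded, totals);
}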

function reduce(key, values, rereduce) {
    var combined_values = {};
    for (var i in values) {
        var totals = values[i];
        for (var methodID in totals) {
            if (methodID in combined_values) {
                combined_values[methodID] += totals[methodID];
            }
            else {
                combined_values[methodID] = totals[methodID];
            }
        }
    }
    return combined_values;
}

That should let you specify a startkey/endkey range, and with group_level=0 the single reduced value should be the dictionary you're looking for.
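To check the idea without a running CouchDB, here is a minimal sketch that replays the map and reduce steps in plain JavaScript on the sample records from the question. `mapDoc` and `queryRange` are hypothetical helpers that only imitate what the view and a startkey/endkey query would do; they are not part of CouchDB.

```javascript
// Sample documents from the question.
var docs = [
  {_id: 1, methodID: 1, recorded: 100, timeneeded: 10},
  {_id: 2, methodID: 1, recorded: 200, timeneeded: 11},
  {_id: 3, methodID: 2, recorded: 200, timeneeded: 2},
  {_id: 4, methodID: 1, recorded: 300, timeneeded: 6},
  {_id: 5, methodID: 2, recorded: 310, timeneeded: 3},
  {_id: 6, methodID: 1, recorded: 400, timeneeded: 9}
];

// Map step (hypothetical local version): key = timestamp,
// value = one-entry object {methodID: timeneeded}.
function mapDoc(doc) {
  var totals = {};
  totals[doc.methodID] = doc.timeneeded;
  return {key: doc.recorded, value: totals};
}

// Reduce step: merge per-method totals. Because input and output have the
// same shape, the same code also handles the rereduce case.
function reduce(keys, values, rereduce) {
  var combined = {};
  values.forEach(function (totals) {
    Object.keys(totals).forEach(function (methodID) {
      combined[methodID] = (combined[methodID] || 0) + totals[methodID];
    });
  });
  return combined;
}

// Hypothetical stand-in for querying the view with startkey/endkey
// (both bounds inclusive) and group_level=0.
function queryRange(startkey, endkey) {
  var values = docs.map(mapDoc)
    .filter(function (row) { return row.key >= startkey && row.key <= endkey; })
    .map(function (row) { return row.value; });
  return reduce(null, values, false);
}

var result = queryRange(200, 350);
console.log(JSON.stringify(result)); // {"1":17,"2":5}
```

The reduced value grows with the number of distinct methodIDs, which is the situation the "reduce must shrink" check is meant to catch, so this only stays practical while the set of methods is small.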

Edit: Also, this thread might be of interest:

http://couchdb-development.1959287.n2.nabble.com/reduce-limit-error-td2789734.html

It discusses an option to turn off the "reduce must shrink" message, and further down the thread offers another way to achieve the same goal: using a list function. That might be a better approach than what I've outlined here. :(
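For the list-function route, the following is a rough sketch rather than a tested design document. It assumes the view emits `{methodID, timeneeded}` objects as in the question's map function, reads them from `row.value`, and sends one JSON object at the end. The `start`, `getRow`, `send` and `toJSON` functions at the top are hypothetical local stand-ins so the sketch can run outside CouchDB; inside CouchDB the query server provides them. Setting an explicit `Content-Type` of `text/plain` is an assumption about what may stop the browser from offering a download.

```javascript
// --- Hypothetical local stand-ins for the CouchDB list API (testing only) ---
// Rows as the view might return them for startkey=[200]&endkey=[350].
var rows = [
  {key: [200], value: {methodID: 1, timeneeded: 11}},
  {key: [200], value: {methodID: 2, timeneeded: 2}},
  {key: [300], value: {methodID: 1, timeneeded: 6}},
  {key: [310], value: {methodID: 2, timeneeded: 3}}
];
var output = [];
var responseInit = null;
function start(init) { responseInit = init; }
function getRow() { return rows.shift(); } // undefined once rows run out
function send(chunk) { output.push(chunk); }
function toJSON(obj) { return JSON.stringify(obj); }

// --- The list function itself (named here only so it can be called) ---
function list(head, req) {
  // Assumption: a text/plain response is displayed inline by the browser.
  start({headers: {"Content-Type": "text/plain; charset=utf-8"}});

  var combined = {};
  var row;
  while ((row = getRow())) {
    var v = row.value; // the object emitted by the map function
    combined[v.methodID] = (combined[v.methodID] || 0) + v.timeneeded;
  }

  // One send() with a single object keeps the response valid JSON.
  send(toJSON(combined));
}

list({}, {});
console.log(output.join("")); // {"1":17,"2":5}
```

This still walks every row in the requested range on each request, so it does not address the speed concern for very large ranges.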

长伴 2024-12-02 08:17:49
function map(doc) {
  if(doc.methodID && doc.recorded && doc.timeneeded) {
    emit([doc.methodID,doc.recorded], doc.timeneeded);
  }
}

//reduce
_sum
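
With the compound key `[methodID, recorded]` and the `_sum` reduce, the time slice has to be queried once per methodID, for example with `startkey=[1,200]&endkey=[1,350]`, because a single range spanning several methodIDs would also include rows such as `[1,400]`. The sketch below imitates that locally on the question's sample records; `mapDoc`, `compareKeys` and `sumForMethod` are hypothetical helpers for illustration, not CouchDB functions, and `compareKeys` only handles numeric array elements.

```javascript
// Sample documents from the question.
var docs = [
  {_id: 1, methodID: 1, recorded: 100, timeneeded: 10},
  {_id: 2, methodID: 1, recorded: 200, timeneeded: 11},
  {_id: 3, methodID: 2, recorded: 200, timeneeded: 2},
  {_id: 4, methodID: 1, recorded: 300, timeneeded: 6},
  {_id: 5, methodID: 2, recorded: 310, timeneeded: 3},
  {_id: 6, methodID: 1, recorded: 400, timeneeded: 9}
];

// Local version of the map function: key [methodID, recorded], value timeneeded.
function mapDoc(doc) {
  if (doc.methodID && doc.recorded && doc.timeneeded) {
    return {key: [doc.methodID, doc.recorded], value: doc.timeneeded};
  }
  return null;
}

// Element-by-element comparison of numeric array keys,
// a simplified imitation of how array keys are ordered in a view.
function compareKeys(a, b) {
  for (var i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

// Imitates ?startkey=[methodID,from]&endkey=[methodID,to] with the _sum reduce.
function sumForMethod(methodID, from, to) {
  var startkey = [methodID, from];
  var endkey = [methodID, to];
  return docs.map(mapDoc)
    .filter(function (row) {
      return row !== null &&
        compareKeys(row.key, startkey) >= 0 &&
        compareKeys(row.key, endkey) <= 0;
    })
    .reduce(function (sum, row) { return sum + row.value; }, 0);
}

console.log(sumForMethod(1, 200, 350)); // 17
console.log(sumForMethod(2, 200, 350)); // 5
```

Since `_sum` results are stored in the view index, each per-method query reads precomputed partial sums instead of adding up rows at request time.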