To monitor an application with CouchDB, I need to sum up a field of my data (for example, the time needed to execute a method that has been logged).
That's no problem for me with map-reduce, but I need to sum up only the data recorded in a specific time slice.
Example records:
{_id: 1, methodID:1, recorded: 100, timeneeded: 10},
{_id: 2, methodID:1, recorded: 200, timeneeded: 11},
{_id: 3, methodID:2, recorded: 200, timeneeded: 2},
{_id: 4, methodID:1, recorded: 300, timeneeded: 6},
{_id: 5, methodID:2, recorded: 310, timeneeded: 3},
{_id: 6, methodID:1, recorded: 400, timeneeded: 9}
Now I would like to get just the sum of timeneeded of all records that have been recorded in the range of 200 to 350 and grouped by methodID. (That would be 17 for methodID:1 and 5 for methodID:2.)
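The expected totals can be reproduced directly from the example records. This plain-JavaScript sketch runs outside CouchDB (sumByMethod is a hypothetical helper, not part of any CouchDB API). It treats the range as inclusive at both ends, which matches CouchDB's default startkey/endkey behavior, and sums timeneeded per methodID:

```javascript
// Example records from the question.
const records = [
  { _id: 1, methodID: 1, recorded: 100, timeneeded: 10 },
  { _id: 2, methodID: 1, recorded: 200, timeneeded: 11 },
  { _id: 3, methodID: 2, recorded: 200, timeneeded: 2 },
  { _id: 4, methodID: 1, recorded: 300, timeneeded: 6 },
  { _id: 5, methodID: 2, recorded: 310, timeneeded: 3 },
  { _id: 6, methodID: 1, recorded: 400, timeneeded: 9 },
];

// Sum timeneeded per methodID for records with from <= recorded <= to.
function sumByMethod(docs, from, to) {
  const totals = {};
  for (const doc of docs) {
    if (doc.recorded < from || doc.recorded > to) continue; // outside the time slice
    totals[doc.methodID] = (totals[doc.methodID] || 0) + doc.timeneeded;
  }
  return totals;
}

// totals: 17 for methodID 1, 5 for methodID 2
console.log(sumByMethod(records, 200, 350));
```

Records 2 and 4 contribute 11 + 6 = 17 for methodID 1, and records 3 and 5 contribute 2 + 3 = 5 for methodID 2.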
How can I do that?
I have now tried a list function based on WickedGrey's idea. Here are my functions:
map function:
function(doc) {
  // Key on the recording time so the view can be range-queried with startkey/endkey
  emit([doc.recorded], {methodID: doc.methodID, timeneeded: doc.timeneeded});
}
list function:
function(head, req) {
  var combined_values = {};
  var row;
  // getRow() returns each view row as {id, key, value}; the emitted object is row.value
  while (row = getRow()) {
    var id = row.value.methodID;
    if (id in combined_values) {
      combined_values[id] += row.value.timeneeded;
    } else {
      combined_values[id] = row.value.timeneeded;
    }
  }
  // Send one JSON object per methodID, each on its own line
  for (var methodID in combined_values) {
    send(toJSON({method: methodID, timeneeded: combined_values[methodID]}) + "\n");
  }
}
Now I have two problems:
1. I always get the results as a file: Firefox asks me whether I want to download it instead of displaying it in the browser the way it displays the results of a classic view.
2. As I understand it, the results are now calculated on the fly in the list function. I don't expect that to be fast with hundreds of millions of records... Any ideas how to make it faster?
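For problem 1, a CouchDB list function can set its own response headers by calling start() before the first getRow(). The sketch below assumes that a text/plain Content-Type is enough for the browser to show the output inline, on the assumption that this is what classic views send to browsers. The start/getRow/send/toJSON stubs, the viewRows data and the listSums name are hypothetical stand-ins so the block runs under Node. Inside CouchDB, the query server provides the real functions.

```javascript
// Hypothetical stand-ins for the query-server globals, so this sketch runs in Node.
// Inside CouchDB, start(), getRow(), send() and toJSON() are provided for you.
const viewRows = [
  // Rows a query with startkey=[200]&endkey=[350] would return for the example data.
  { id: "2", key: [200], value: { methodID: 1, timeneeded: 11 } },
  { id: "3", key: [200], value: { methodID: 2, timeneeded: 2 } },
  { id: "4", key: [300], value: { methodID: 1, timeneeded: 6 } },
  { id: "5", key: [310], value: { methodID: 2, timeneeded: 3 } },
];
let responseHeaders = null;
const chunks = [];
let next = 0;
function start(response) { responseHeaders = response.headers; }
function getRow() { return next < viewRows.length ? viewRows[next++] : null; }
function send(chunk) { chunks.push(chunk); }
function toJSON(obj) { return JSON.stringify(obj); }

// The list function: set the Content-Type first, then aggregate the rows.
function listSums(head, req) {
  start({ headers: { "Content-Type": "text/plain; charset=utf-8" } });
  var totals = {};
  var row;
  while ((row = getRow())) {
    var id = row.value.methodID;
    totals[id] = (totals[id] || 0) + row.value.timeneeded;
  }
  // Send a single JSON document rather than several concatenated objects.
  send(toJSON(totals));
}

listSums({}, {});
console.log(responseHeaders["Content-Type"]); // text/plain; charset=utf-8
console.log(chunks.join(""));                 // {"1":17,"2":5}
```

In a design document, only the function itself (as an anonymous function(head, req) under lists) would be stored. You would request it as /<db>/_design/<ddoc>/_list/<list>/<view>?startkey=[200]&endkey=[350], where the angle-bracket names are placeholders for your own database, design document, list and view.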
Thank you for your help!
andy
In CouchDB you can't use a map key to filter by one set of criteria and group by another. However, you can filter the keys by time range and group with a reduce function. Try something like this:
That should allow you to specify a start/end key, and with group_level=0 you should get a single value containing the dictionary you're looking for.
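Here is one possible shape for such a reduce function. This sketch assumes the map function from the question (key [doc.recorded], value {methodID, timeneeded}). It builds a methodID-to-sum dictionary and has to handle rereduce, where CouchDB passes in dictionaries returned by earlier reduce calls instead of emitted values. The name sumByMethodReduce and the local calls at the bottom are hypothetical. The calls only simulate a reduce followed by a rereduce, and keys is passed as null because the function never reads it.

```javascript
// Reduce: methodID -> total timeneeded. On the first pass `values` are the emitted
// {methodID, timeneeded} objects; on rereduce they are partial totals objects.
function sumByMethodReduce(keys, values, rereduce) {
  var totals = {};
  values.forEach(function (v) {
    if (rereduce) {
      // Merge a partial dictionary from an earlier reduce call.
      for (var id in v) totals[id] = (totals[id] || 0) + v[id];
    } else {
      totals[v.methodID] = (totals[v.methodID] || 0) + v.timeneeded;
    }
  });
  return totals;
}

// Local simulation: two leaf reductions over the rows in [200, 350], then a rereduce.
var left = sumByMethodReduce(null, [
  { methodID: 1, timeneeded: 11 },
  { methodID: 2, timeneeded: 2 },
], false);
var right = sumByMethodReduce(null, [
  { methodID: 1, timeneeded: 6 },
  { methodID: 2, timeneeded: 3 },
], false);
var combined = sumByMethodReduce(null, [left, right], true);
console.log(JSON.stringify(combined)); // {"1":17,"2":5}
```

If you query it with startkey=[200]&endkey=[350] and no grouping, the view should return one row whose value is this dictionary. The value grows with the number of distinct methodIDs, so with many methods it may trip the "reduce must shrink" check discussed in the thread below.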
Edit: Also, this thread might be of interest:
http://couchdb-development.1959287.n2.nabble.com/reduce-limit-error-td2789734.html
It discusses an option to turn off the "reduce must shrink" check, and further down the thread suggests another way of achieving the same goal: using a list function. That might be a better approach than what I've outlined here. :(