CouchDB function to sample records at a given time interval

Posted 2024-11-08 15:06:29

I have records with a time value and need to be able to query them for a span of time and return only records at a given interval.

For example, I may need all the records from 12:00 to 1:00 at 10-minute intervals, giving me 12:00, 12:10, 12:20, 12:30, ... 12:50, 01:00. The interval needs to be a parameter, and it may be any time value: 15 minutes, 47 seconds, 1.4 hours.

I tried to do this with some kind of reduce, but that is apparently the wrong place to do it.

Here is what I have come up with. Comments are welcome.

I created a view that includes the time field in its key so I can query a range of times. The view outputs the id and the time.

function(doc) { 
  emit([doc.rec_id, doc.time], [doc._id, doc.time]) 
}

Then I created a list function that accepts a parameter called interval. In the list function I work through the rows and compare the current row's time to the last accepted time. If the span is greater than or equal to the interval, I add the row to the output and JSON-ify it.

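function(head, req) {

  // Default to 30000 ms (30 seconds).
  var interval = 30000;

  // Get the interval from the request. Query parameters arrive as
  // strings, so convert to a number explicitly.
  if (req.query.interval) {
    interval = Number(req.query.interval);
  }

  // Setup. lastTime stays null until the first row has been accepted.
  var row;
  var rows = [];
  var lastTime = null;

  // Go through the results...
  while ((row = getRow())) {
      // Always keep the first row; after that, keep a row only if at
      // least `interval` ms have passed since the last accepted row.
      // Assumes row.value[1] is a numeric timestamp in milliseconds.
      if (lastTime === null || row.value[1] - lastTime >= interval) {
          lastTime = row.value[1];
          rows.push(row);
      }
  }
  // JSON-ify!
  send(JSON.stringify({'rows' : rows}));
}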

So far this is working well. I will test it against some large data sets to see how it performs. Any comments on how this could be done better, or on whether this is the correct way to do it with Couch?

Comments (1)

最冷一天 2024-11-15 15:06:29

CouchDB is relaxed. If this is working for you, then I'd say stick with it and focus on your next top priority.

One quick optimization is to try not to build up a final answer in the _list function, but rather send() little pieces of the answer as you know them. That way, your function can run on an unlimited result size.
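As a rough sketch of that idea (not a drop-in replacement), the rows can be streamed one at a time instead of collected in an array. The wrapper name makeSampler is hypothetical and exists only so the logic can be run outside CouchDB; inside a design document the inner function would be the list function itself, using CouchDB's own getRow() and send(). It also assumes the time in row.value[1] is a number of milliseconds.

```javascript
// Hypothetical wrapper: getRow and send are injected so the sketch can
// be exercised outside CouchDB. In a real _list function they are globals.
function makeSampler(getRow, send) {
  return function (head, req) {
    // Query parameters are strings; fall back to 30 s if missing or invalid.
    var interval = Number(req.query.interval) || 30000;
    var lastTime = null; // null until the first row is accepted
    var first = true;    // tracks whether a comma separator is needed
    var row;

    send('{"rows":[');
    while ((row = getRow())) {
      var t = row.value[1]; // assumed numeric timestamp in ms
      if (lastTime === null || t - lastTime >= interval) {
        lastTime = t;
        // Emit each accepted row immediately instead of buffering it.
        send((first ? '' : ',') + JSON.stringify(row));
        first = false;
      }
    }
    send(']}');
  };
}
```

Because nothing accumulates between rows, memory use stays flat no matter how many rows the view returns.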

However, as you suspected, you are using a _list function basically to do an ad-hoc query which could be problematic as your database size grows.

I'm not 100% sure what you need, but if you are looking for documents within a time frame, there's a good chance that emit() keys should primarily sort by time. (In your example, the primary (leftmost) sort value is doc.rec_id.)

For a map function:

function(doc) {
  var key = doc.time; // Just sort everything by timestamp.
  emit(key, [doc._id, doc.time]);
}

That will build a map of all documents, ordered by the time field. (I will assume the time value is like JSON.stringify(new Date), i.e. "2011-05-20T00:34:20.847Z".)

To find all documents within a 1-hour interval, just query the map view with ?startkey="2011-05-20T00:00:00.000Z"&endkey="2011-05-20T01:00:00.000Z".
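Since the keys are JSON strings, the quotes have to be part of the parameter values and should be URL-encoded. A small helper along these lines could build the query; the function name viewRangeUrl and the database, design document, and view names in the usage line are made up for illustration.

```javascript
// Build a view URL for a time range. startkey/endkey must be JSON,
// so the ISO strings are JSON-encoded first and then URL-encoded.
function viewRangeUrl(viewPath, startIso, endIso) {
  var params = [
    'startkey=' + encodeURIComponent(JSON.stringify(startIso)),
    'endkey=' + encodeURIComponent(JSON.stringify(endIso))
  ];
  return viewPath + '?' + params.join('&');
}

// Hypothetical database, design document, and view names.
var url = viewRangeUrl(
  '/mydb/_design/records/_view/by_time',
  '2011-05-20T00:00:00.000Z',
  '2011-05-20T01:00:00.000Z'
);
```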

If I understand your "interval" criterion correctly, then with a 10-minute interval and records at 00:00, 00:15, 00:30, 00:45, and 00:50, only 00:00, 00:30, and 00:50 (the ones that fall on a 10-minute boundary) should be in the final result. So you are filtering the normal couch output to cut out unwanted rows. That is a perfect job for a _list function: read req.query.interval and send() only the rows that match the interval.
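One way to express that "matches the interval" test, as a sketch only: treat the start of the queried range as the origin and keep a row when its offset from the origin is a whole multiple of the interval. Note that this is a boundary test, which is different from checking the time elapsed since the last accepted row. The helper name isOnInterval is hypothetical, and it assumes the JavaScript engine's Date.parse accepts ISO 8601 strings, which may not hold on older CouchDB query servers.

```javascript
// Returns true when timeIso lies exactly on an interval boundary
// counted from startIso. intervalMs is the interval in milliseconds.
function isOnInterval(timeIso, startIso, intervalMs) {
  var offset = Date.parse(timeIso) - Date.parse(startIso);
  return offset >= 0 && offset % intervalMs === 0;
}

// The example from the text: a 10-minute interval starting at 00:00.
var times = [
  '2011-05-20T00:00:00.000Z',
  '2011-05-20T00:15:00.000Z',
  '2011-05-20T00:30:00.000Z',
  '2011-05-20T00:45:00.000Z',
  '2011-05-20T00:50:00.000Z'
];
var kept = times.filter(function (t) {
  return isOnInterval(t, times[0], 10 * 60 * 1000);
});
// kept holds the 00:00, 00:30, and 00:50 timestamps.
```

In a _list function the same check would be applied to each row's key before calling send(), with the start taken from the request's startkey.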
