CouchDB map/reduce: return the first event in each stream, then order by time

Posted on 2024-11-03 14:15:51

I have a CouchDB database that holds a series of events. Each event has an owner, an id, the time it occurred, and a message (plus a bunch of other stuff which doesn't matter for this exercise). I'd like a list of events that occurred recently, ordered by time. I looked through the question "CouchDB - filter latest log per logged instance from a list" and tried using it with the comparison in the reducer flipped to keep the first message (using the form where I have a complex key).

Unfortunately it doesn't quite do what I want.

Here's my map function

function(doc) {
  // Only index documents that carry every field the view needs.
  if (doc.owner
      && doc.stream_id
      && doc.message
      && doc.receipt_time)
    {
      // Key: owner, then stream, then time; rows sort in that order.
      emit([doc.owner,doc.stream_id,doc.receipt_time],
           { owner: doc.owner,
             stream_id: doc.stream_id,
             timestamp: doc.receipt_time,
             message: doc.message
           });
    }
}

and my reduce function

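function(keys, values) {
  // Keep the value with the smallest timestamp (the first message).
  var challenger, winner = null;
  for (var a = 0; a < values.length; a++) {
      challenger = values[a];
      if (! winner) {
        winner = challenger;
      } else {
        // Note: the map function emits stream_id, not trace_id, so both
        // trace_id values are undefined and this branch never returns null.
        if (winner.owner !== challenger.owner
            && winner.trace_id !== challenger.trace_id ) {
          return null;
        } else if (challenger.timestamp < winner.timestamp) {
          winner = challenger;
        }
      }
    }
  return winner;
}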

Then I query the view with ?descending=true&group=true&group_level=2 to get the first message from each stream. However, the result is ordered by owner and stream_id rather than by time, like this:

{"rows":[
  {"key":["sam","a"],
   "value":
     {"owner":"sam","stream_id":"a","timestamp":1303754236482,"message":"foo"}
  },
  {"key":["sam","b"],
   "value":
     {"owner":"sam","stream_id":"b","timestamp":1303752578476,"message":"bar"}
  },
  {"key":["jim","j1"],
   "value":
     {"owner":"jim","stream_id":"j1","timestamp":1303625378839,"message":"stuff"}
  },
  {"key":["bob","loblaw"],
   "value":
     {"owner":"bob","stream_id":"loblaw","timestamp":1303328396532,"message":"more stuff"}
  },
  {"key":["anthony","foo"],
   "value":
     {"owner":"anthony","stream_id":"foo","timestamp":1303769699444,"message":"even more"}
  }
]}

(Notice that the final entry is actually the most recent one.)

So what I'd like is for the final view to be what it is now but ordered by time. Is there a way to do this?

雾里花 2024-11-10 14:15:58

If I understand you correctly, you're not looking to filter the collection of events, but just order them. Assuming that's correct, the solution is actually pretty simple and you don't even need a reduce function. The keys that are emitted in your map function are used to sort the view, first by what's first in the key, then working the rest of the way through it. In other words, if you want to sort by stream_id then receipt_time, your call to emit would look like this:

emit([doc.stream_id,doc.receipt_time,doc.owner], doc.message);

Naturally, if you instead want to sort by receipt_time then stream_id, the key would instead be [doc.receipt_time,doc.stream_id,doc.owner]. I don't think there's any need to include anything in the value that's already present in the key, which is why I trimmed the value down to just the message.
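
To make the ordering rule concrete, here is a minimal sketch that imitates how view rows are sorted by their emitted key. It runs outside CouchDB, so `compareKeys` and `buildView` are hypothetical helpers written for this example, `emit` is passed in as an argument rather than being the global the view server provides, and the comparison is a simplification of CouchDB's real collation. The sample documents reuse the values from the output in the question.

```javascript
// Simplified stand-in for CouchDB's key collation: compare array keys
// element by element. Real collation also defines an order across types
// and uses Unicode rules for strings, which this sketch ignores.
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Hypothetical helper: run a map function over documents and return the
// emitted rows sorted by key. `emit` is an argument here; inside CouchDB
// it is a global provided by the view server.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ key, value }));
  }
  return rows.sort((x, y) => compareKeys(x.key, y.key));
}

// Sample documents built from the output shown in the question.
const docs = [
  { owner: "sam", stream_id: "a", receipt_time: 1303754236482, message: "foo" },
  { owner: "sam", stream_id: "b", receipt_time: 1303752578476, message: "bar" },
  { owner: "jim", stream_id: "j1", receipt_time: 1303625378839, message: "stuff" },
  { owner: "bob", stream_id: "loblaw", receipt_time: 1303328396532, message: "more stuff" },
  { owner: "anthony", stream_id: "foo", receipt_time: 1303769699444, message: "even more" }
];

// Time comes first in the key, so the rows are ordered by receipt_time.
const byTime = (doc, emit) =>
  emit([doc.receipt_time, doc.stream_id, doc.owner], doc.message);

// Reversing the sorted rows plays the role of ?descending=true.
const newestFirst = buildView(docs, byTime).reverse();
console.log(newestFirst.map((row) => row.value));
```

With the time first in the key, the message from anthony's stream, which has the largest timestamp, comes out at the top of the descending list.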

丧 2024-11-10 14:15:58

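Well, I think the simplest thing is actually to sidestep the problem.

Since I control the software that sends the events, I've just added a "start": true field to the first document in each stream, and the view function only emits events that have that value.

This means I can't get the list for historical data, but that's okay, since this is mostly used to check on recent streams.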
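
A minimal sketch of what such a view could look like, under a few assumptions: the row is keyed on receipt_time alone so that rows sort by time, the emitted value mirrors the fields from the question, and the `emit` stub and sample documents below exist only so the snippet runs outside CouchDB. In a design document the map function would be stored as an anonymous `function(doc) { ... }`.

```javascript
// Stand-in for CouchDB's global emit, so the sketch runs on its own.
const startRows = [];
function emit(key, value) {
  startRows.push({ key: key, value: value });
}

// Map function sketch: index only the first event of each stream, which
// is marked with "start": true. Keying on receipt_time is an assumption
// made here so that rows sort by time.
function mapStartEvents(doc) {
  if (doc.start === true && doc.owner && doc.stream_id && doc.receipt_time) {
    emit(doc.receipt_time, {
      owner: doc.owner,
      stream_id: doc.stream_id,
      message: doc.message
    });
  }
}

// Hypothetical sample events: two streams, one of which has a follow-up.
const events = [
  { owner: "sam", stream_id: "a", receipt_time: 100, message: "first a", start: true },
  { owner: "sam", stream_id: "a", receipt_time: 150, message: "second a" },
  { owner: "jim", stream_id: "j1", receipt_time: 120, message: "first j1", start: true }
];
events.forEach(mapStartEvents);

// Sorting by key, newest first, imitates querying with ?descending=true.
const recentStarts = startRows.slice().sort((x, y) => y.key - x.key);
console.log(recentStarts.map((row) => row.value.message));
```

Because only one row per stream is emitted, no reduce function is needed and a limit on the view applies directly to the list of streams.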

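Another option I tried was adding a list function over a view keyed on [timestamp, owner, stream_id] that sends only the first instance of each owner/stream_id pair. The problem is that when you apply a limit, it limits the underlying view rather than the final rendered list, so the extra field has worked best so far.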
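
The deduplication step can be sketched on the client side, where the limit can be applied after the duplicates are dropped. This is not a CouchDB list function: `firstPerStream` is a hypothetical helper, the rows are assumed to arrive sorted ascending by a [timestamp, owner, stream_id] key, and the sample rows are made up. It also has to read every row of the view, so it only suits small result sets.

```javascript
// Keep the first row seen for each owner/stream_id pair, then return the
// most recently started streams, applying the limit after deduplication.
function firstPerStream(rows, limit) {
  const seen = new Set();
  const firsts = [];
  for (const row of rows) {
    const [, owner, streamId] = row.key; // key is [timestamp, owner, stream_id]
    const id = JSON.stringify([owner, streamId]);
    if (!seen.has(id)) {
      seen.add(id);
      firsts.push(row);
    }
  }
  // Rows came in oldest first, so reverse to list the newest streams first.
  return firsts.reverse().slice(0, limit);
}

// Hypothetical view rows, already sorted ascending by key.
const viewRows = [
  { key: [100, "sam", "a"], value: "a1" },
  { key: [120, "jim", "j1"], value: "j1-1" },
  { key: [150, "sam", "a"], value: "a2" },
  { key: [170, "jim", "j1"], value: "j1-2" },
  { key: [200, "bob", "loblaw"], value: "b1" }
];

const latestStreams = firstPerStream(viewRows, 2);
console.log(latestStreams.map((row) => row.value));
```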

I'd still like to know whether there's some way to do this with the raw data.

羁〃客ぐ 2024-11-10 14:15:57

Store a stream_created_at timestamp in every message. For the first message of a stream, use the current time. For every later message in the stream, copy it from the previous one (create a view such as stream_created_at_by_stream_id to look it up).

Then create a view that emits:

[doc.owner,doc.stream_created_at, doc.stream_id, doc.receipt_time]

That will group messages from the same stream together while preserving time ordering. stream_id ensures that messages from different streams don't get mixed up when two streams are created at the same time, and receipt_time orders the messages within a stream by time.

So in the end you get Facebook-like conversations, and you don't need a reduce function at all.
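
A rough sketch of both halves of this idea, assuming the previous message of the stream has already been looked up. `stampMessage` and `compareStreamKeys` are hypothetical helpers, `emit` is passed in so the map function can run outside CouchDB, and the sample messages and times are made up.

```javascript
// Copy stream_created_at from the previous message of the same stream,
// or use the current time when the stream is new. `previous` is null for
// the first message; the lookup itself is left out of this sketch.
function stampMessage(message, previous, now) {
  return Object.assign({}, message, {
    stream_created_at: previous ? previous.stream_created_at : now
  });
}

// Map function sketch using the key from the answer above.
function mapByStream(doc, emit) {
  if (doc.owner && doc.stream_id && doc.stream_created_at && doc.receipt_time) {
    emit([doc.owner, doc.stream_created_at, doc.stream_id, doc.receipt_time],
         doc.message);
  }
}

// Simplified element-wise comparison of equal-length array keys; real
// CouchDB collation handles more cases than this.
function compareStreamKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Two streams for one owner; stream "a" gets a second message later on.
const a1 = stampMessage({ owner: "sam", stream_id: "a", receipt_time: 100, message: "a1" }, null, 100);
const b1 = stampMessage({ owner: "sam", stream_id: "b", receipt_time: 110, message: "b1" }, null, 110);
const a2 = stampMessage({ owner: "sam", stream_id: "a", receipt_time: 120, message: "a2" }, a1, 120);

const streamRows = [];
[a1, b1, a2].forEach((doc) =>
  mapByStream(doc, (key, value) => streamRows.push({ key, value })));
streamRows.sort((x, y) => compareStreamKeys(x.key, y.key));
console.log(streamRows.map((row) => row.value));
```

Even though "a2" arrives after "b1", it sorts next to "a1" because it carries the creation time of its stream.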
