Re-sorting the results of a reduced view
Consider the following document structures:
Thread:
- doc_type 1
- _id
- subject (string)
Posts:
- doc_type 2
- _id
- thread_id (_id of Thread)
- time (milliseconds since 1970)
- comment (string)
I need the threads sorted by the last post on each thread, together with the latest 5 posts.
I'd like to avoid updating the thread document every time a new post is made, to eliminate the possibility of conflicts in a distributed environment across DB nodes. Besides, that would be working for the DB, when the DB should be working for you.
For simplicity, let's just start with finding the latest post. The 5 posts can be gathered the same way.
Now, I'm not sure I'm heading in the right direction; however, looking here I found how to find the last post in a thread using a reduce function that uses a group-level to return the thread subject taken from doc-type 1, and the last post document taken from doc-type 2.
BTW, as opposed to the sample in the link, in my case a thread is always created with a first post (so, for example, the creation date of a thread will be the date of its first post).
map:
function(doc){
  switch(doc.doc_type){
    case 1: emit([doc._id], doc); return;
    case 2: emit([doc.thread_id], doc); return;
  }
}
reduce:
In the real world the keys are more compound, so it must be used with an appropriate group-level.
I also ignore the re-reduce case here, just for simplicity's sake.
You can find the full picture here:
function(keys, vals, rr){
  var result = { subject: null, lastPost: null, count: 0 };
  // I'll ignore the re-reduce case for simplicity
  vals.forEach(function(doc){
    switch(doc.doc_type){
      case 1:
        result.subject = doc.subject;
        return;
      case 2:
        // lastPost starts as null, so guard before comparing times
        if (!result.lastPost || result.lastPost.time < doc.time) result.lastPost = doc;
        result.count++;
        return;
    }
  });
  return result;
}
But how do I page the results afterwards, sorted by the latest-post date?
Is there a way to feed doc-ids from the result of one query as the filter criteria of another (preferably using one round-trip)?
There is no limit to the number of posts in a thread, so I'm a little reluctant to rely on a list function here: since the page size can also vary, this could result in the last post not showing at all.
Comments (1)
If you're only after the last post or the last five posts, there's a much simpler method. You can completely avoid the reducer, in fact.
If you add the time as the second portion of the key, you can use a combination of endkey, descending, and limit to get the last N posts based on the thread_id.
Here's the MapReduce I wrote with some test data based on your schemas:
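A minimal sketch of what such a map function could look like, assuming the schema from the question. Only `map` is meant for CouchDB; the `emit` stub, the `compareKeys` helper (a simplified stand-in for CouchDB's view collation, where numbers sort before strings) and the sample documents are illustrative assumptions added so the sketch runs on its own.

```javascript
// Map function in the style CouchDB expects. `emit` is supplied by CouchDB
// at index time; it is stubbed below only so this sketch runs standalone.
function map(doc) {
  if (doc.doc_type === 1) {
    // Thread: 'Z' is a string, and strings collate after numbers, so this
    // row sorts after every [thread_id, time] row of the same thread.
    emit([doc._id, 'Z'], doc.subject);
  } else if (doc.doc_type === 2) {
    // Post: keyed by thread and time; only the _id is emitted to keep the
    // index small.
    emit([doc.thread_id, doc.time], { _id: doc._id });
  }
}

// --- Local stand-ins (assumptions, not CouchDB code) ---
const rows = [];
function emit(key, value) { rows.push({ key, value }); }

// Simplified collation for these two-part keys: thread id first, then
// numbers before strings, then natural order.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  const aIsNum = typeof a[1] === 'number';
  const bIsNum = typeof b[1] === 'number';
  if (aIsNum !== bIsNum) return aIsNum ? -1 : 1;
  if (a[1] === b[1]) return 0;
  return a[1] < b[1] ? -1 : 1;
}

// Hypothetical test data following the question's schema.
const sampleDocs = [
  { _id: 't1', doc_type: 1, subject: 'First thread' },
  { _id: 'p1', doc_type: 2, thread_id: 't1', time: 1000, comment: 'a' },
  { _id: 'p2', doc_type: 2, thread_id: 't1', time: 3000, comment: 'b' },
  { _id: 'p3', doc_type: 2, thread_id: 't1', time: 2000, comment: 'c' },
];
sampleDocs.forEach(map);
rows.sort((x, y) => compareKeys(x.key, y.key));
```

In ascending order the posts of a thread come first, oldest to newest, and the subject row comes last, so a descending read of the same range starts with the subject and continues with the newest posts.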
The strange output of the 'Z' key is to allow you to get the subject from the "bottom" of the list of items.
The query parameters would look something like:
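As a rough sketch, the query string could be built as below. The database, design document and view names (`forum`, `threads`, `by_thread`) are hypothetical, and including a `startkey` so that the descending scan begins at this thread's subject row is my own assumption; the text here mentions only endkey, descending, and limit.

```javascript
// Builds the view URL for the last `n` posts of one thread, plus the
// subject row. All path names are hypothetical placeholders.
function lastPostsQuery(threadId, n) {
  const params = new URLSearchParams({
    // With descending=true the scan starts at the high end of the range,
    // i.e. at the [thread_id, 'Z'] subject row (assumed key layout).
    startkey: JSON.stringify([threadId, 'Z']),
    // ...and stops at the low end, just before the thread's first post.
    endkey: JSON.stringify([threadId]),
    descending: 'true',
    // N posts plus one row for the thread subject.
    limit: String(n + 1),
  });
  return '/forum/_design/threads/_view/by_thread?' + params.toString();
}

const url = lastPostsQuery('t1', 5);
```

Key values must be JSON-encoded, which is why the arrays go through `JSON.stringify` before being URL-encoded.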
The limit should be N+1 where N is the number of posts you'd like back. In the results you'll have the thread subject and _id objects (or whatever you'd like) from the post documents.
The _id objects are output in this example so you can use them with include_docs=true if you want the full post. Emit only the other fields you need from the post document (title, etc.) to keep the overall index size low, and use include_docs in those places where you need the full contents of the document. However, if you always need the full post document, output it in the emit, as that will give you a faster response (though a larger index size on disk).
Also, if you need a list of all threads sorted by last post as well as 5 posts per thread, you'd need to output keys like [time, thread_id, 'thread'] and [time, thread_id, 'post'] and use a _list function to collect the posts "under" each thread document, as the time sorting will cause threads and posts to be farther apart in the results. A _list function can then be used to combine/find them again. However, doing two requests may still be easier/lighter.
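The regrouping step could look roughly like the sketch below. It is written as a plain function over rows that have already been read in descending key order, not as an actual _list function; `groupLatestPosts`, the row values and the sample data are hypothetical, and the thread row's time is assumed to be the time of the thread's first post, as the question describes.

```javascript
// Groups rows keyed [time, thread_id, 'thread' | 'post'], already sorted in
// descending key order, into threads ordered by their most recent post.
function groupLatestPosts(rows, perThread) {
  // A Map keeps insertion order, so threads stay in the order in which
  // their newest row was first seen.
  const threads = new Map();
  for (const row of rows) {
    const [time, threadId, kind] = row.key;
    if (!threads.has(threadId)) {
      threads.set(threadId, { thread_id: threadId, subject: null, posts: [] });
    }
    const entry = threads.get(threadId);
    if (kind === 'thread') {
      entry.subject = row.value;
    } else if (entry.posts.length < perThread) {
      // Keep only the newest `perThread` posts of each thread.
      entry.posts.push({ time, value: row.value });
    }
  }
  return Array.from(threads.values());
}

// Hypothetical rows, newest first.
const sampleRows = [
  { key: [5000, 't2', 'post'], value: { _id: 'p5' } },
  { key: [4000, 't1', 'post'], value: { _id: 'p4' } },
  { key: [3000, 't2', 'thread'], value: 'Newer thread' },
  { key: [3000, 't2', 'post'], value: { _id: 'p3' } },
  { key: [1000, 't1', 'thread'], value: 'Older thread' },
  { key: [1000, 't1', 'post'], value: { _id: 'p1' } },
];
const grouped = groupLatestPosts(sampleRows, 2);
```

Because a thread's subject row may sit far below its newest post, this grouping has to walk the whole result set, which is part of why two separate requests may be lighter.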