Basic set-based operations with a document database (NoSQL)
Like most, I come from an RDBMS world and am trying to get my head around NoSQL databases, specifically document stores (as I find them the most interesting).
I am trying to understand how to perform some set-based operations using a document database (I'm playing with RavenDB).
So, as I understand it:
- Union (as in SQL UNION) is a very straightforward append. Additionally, combining different sets (as in an SQL JOIN) can be achieved with map/reduce. The example given in the RavenDB Mythology book, with comment counts on blog entries, is a good start.
- Intersection can be performed using a number of techniques, from de-normalization right through to creating a “mapping” or “link” document as described here (and in the aggregator example below). In an RDBMS this would be performed using a simple "INNER JOIN" or "WHERE x IN".
- Subtract (relative complement) is where I am getting stuck. In an RDBMS this operation is simply a "WHERE x NOT IN" or a "LEFT JOIN" where the joined set is NULL.
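For reference, the three operations above are ordinary set algebra once both sides are available as sets of document ids. A minimal Python sketch with made-up ids (`feed_a`, `feed_b` and `tagged` are hypothetical stand-ins for query results):

```python
# Hypothetical id sets, standing in for the results of queries against the store.
feed_a = {"entries/1", "entries/2"}
feed_b = {"entries/3", "entries/4"}
tagged = {"entries/2", "entries/4"}  # entry ids that have at least one tag document

# Union (SQL UNION): append both sets, duplicates removed.
all_entries = feed_a | feed_b

# Intersection (SQL INNER JOIN / WHERE x IN): entries that also appear in tagged.
tagged_entries = all_entries & tagged

# Relative complement (SQL WHERE x NOT IN / LEFT JOIN ... IS NULL).
untagged_entries = all_entries - tagged

print(sorted(untagged_entries))  # ['entries/1', 'entries/3']
```

The algebra itself is trivial; the difficulty in a document store is producing the `tagged` side without reading every tag document.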
Using a real-world example, let’s say we have an RSS aggregator (such as Google Reader) with millions, if not billions, of RSS entries and thousands of users, each tagging entries as favourite, etc.
In this example we focus on entry, user and tag, where tag acts as the link between user and entry.
user {string id, string name /*etc.*/}
entry {string id, string title, string url /*etc.*/}
tag {string userId, string entryId, string[] tags} /* (favourite, read, etc.)*/
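The link-document model above can be sketched in Python with dataclasses (snake_case field names and the sample data are assumptions for illustration). It shows why intersection is the easy case: it only has to look at tag documents that actually exist.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    id: str
    name: str

@dataclass
class Entry:
    id: str
    title: str
    url: str

@dataclass
class Tag:
    # Link document: one per (user, entry) pair, holding that user's tags.
    user_id: str
    entry_id: str
    tags: list[str] = field(default_factory=list)

def entries_tagged_by(user: User, entries: list[Entry], tag_docs: list[Tag]) -> list[Entry]:
    # Intersection of entries with this user's tag documents, matched on entry id.
    linked = {t.entry_id for t in tag_docs if t.user_id == user.id}
    return [e for e in entries if e.id in linked]

ann = User("users/1", "Ann")
entries = [Entry("entries/1", "First", "http://example.com/1"),
           Entry("entries/2", "Second", "http://example.com/2")]
tag_docs = [Tag("users/1", "entries/2", ["favourite"])]
print([e.title for e in entries_tagged_by(ann, entries, tag_docs)])  # ['Second']
```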
With the above approach it is easy to perform the intersection between entry and user using tag. But I cannot get my head around how one would perform a subtraction. For instance, “return all items that do not have any tags”, or, even more daunting, “return the latest 1000 items without any tag”.
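One common document-store approach (not something the question or answers spell out) is to turn the anti-join into an aggregate: a map/reduce index that emits one row per entry with a tag count of 0 and one row per tag document with its tag count, reduced by entry id. "Entries without tags" then becomes "index rows whose count is 0". The sketch below only simulates such an index in plain Python; it is not RavenDB's API (in RavenDB this would roughly correspond to a multi-map/reduce index), and the `published` field used for "latest" is a hypothetical addition, since the entry document in the question has no date.

```python
import heapq
from collections import defaultdict

# Hypothetical sample documents; "published" is an assumed field used for "latest".
entries = [
    {"id": "entries/1", "title": "A", "published": 100},
    {"id": "entries/2", "title": "B", "published": 200},
    {"id": "entries/3", "title": "C", "published": 300},
]
tag_docs = [
    {"userId": "users/1", "entryId": "entries/2", "tags": ["favourite"]},
    {"userId": "users/2", "entryId": "entries/2", "tags": ["read"]},
]

def map_entries(docs):
    # Every entry contributes a row with zero tags and its publish date.
    for e in docs:
        yield e["id"], {"tag_count": 0, "published": e["published"]}

def map_tags(docs):
    # Every tag document contributes its tag count; it carries no date.
    for t in docs:
        yield t["entryId"], {"tag_count": len(t["tags"]), "published": None}

def reduce_rows(rows):
    # Group by entry id, sum the counts, keep the date from the entry row.
    index = defaultdict(lambda: {"tag_count": 0, "published": None})
    for entry_id, row in rows:
        acc = index[entry_id]
        acc["tag_count"] += row["tag_count"]
        if row["published"] is not None:
            acc["published"] = row["published"]
    return index

def latest_untagged(index, n):
    # The subtraction is now a filter on the reduced rows: tag_count == 0.
    # Rows without a date come from tags pointing at missing entries; skip them.
    untagged = ((eid, r["published"]) for eid, r in index.items()
                if r["tag_count"] == 0 and r["published"] is not None)
    return [eid for eid, _ in heapq.nlargest(n, untagged, key=lambda p: p[1])]

index = reduce_rows(list(map_entries(entries)) + list(map_tags(tag_docs)))
print(latest_untagged(index, 1000))  # ['entries/3', 'entries/1']
```

Because each reduced row has the same shape as a mapped row, an aggregate like this can in principle be maintained incrementally as tag documents change, instead of evaluating NOT IN at query time.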
So my questions:
- Can you point me to some reading material on the matter?
- Can you share some ideas on how one can accomplish the task efficiently?
Note: I know that you lose query flexibility with document databases, but surely there must be a way to do this?
Comments (2)
Amok,
What you want cannot really be done easily in non-relational databases.
This is mostly because they don't think in sets and have strong ties to distributed computing.
For example, you can't really do efficient set operations without having access to all the data, and that pretty much means any set-based operation needs access to all of it.
Since NoSQL databases are usually used in distributed scenarios, they can't really support that.
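A toy Python sketch of that point, with a hypothetical shard layout: entries and tag documents are spread across shards independently, so proving that an entry *has* a tag can stop at the first matching shard, while proving that it has *none* requires gathering tag ids from every shard.

```python
# Hypothetical layout: a tag document may live on a different shard than its entry.
shards = [
    {"entries": ["entries/1", "entries/2"], "tag_entry_ids": ["entries/3"]},
    {"entries": ["entries/3"], "tag_entry_ids": []},
    {"entries": ["entries/4"], "tag_entry_ids": ["entries/1"]},
]

def is_tagged(entry_id, shards):
    # Existence check: any() stops at the first shard holding a matching tag.
    return any(entry_id in s["tag_entry_ids"] for s in shards)

def untagged_entries(shards):
    # Non-existence check: every shard's tag ids must be collected first,
    # because a missing tag on one shard says nothing about the others.
    all_tagged = set()
    for s in shards:
        all_tagged.update(s["tag_entry_ids"])
    return [e for s in shards for e in s["entries"] if e not in all_tagged]

print(untagged_entries(shards))  # ['entries/2', 'entries/4']
```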
RavenDB, specifically, allows some operations on a specified set, but it is built strongly on the assumption of independent documents: documents that neither have strong relations to other documents nor need to be manipulated all together in the same fashion.
The transition from an RDBMS to a document database isn't completely smooth, and some refactoring of your model may be necessary to make it optimal. This is due to the different natures of those technologies.
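One refactoring in that spirit, offered here as an assumption rather than something the answer prescribes, is to denormalize a tag counter onto the entry document. "Entries without tags" then becomes a predicate on independent documents, at the cost of updating the entry whenever a tag document changes. The `tag_count` and `published` fields and the helper names below are hypothetical.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Entry:
    id: str
    title: str
    published: int      # hypothetical timestamp field
    tag_count: int = 0  # hypothetical denormalized counter

def apply_tags(entry: Entry, old_tags: list[str], new_tags: list[str]) -> None:
    # Call whenever a user's tag document for this entry is saved,
    # so the counter stays in sync with the entry's tag documents.
    entry.tag_count += len(new_tags) - len(old_tags)

def latest_untagged(entries: list[Entry], n: int) -> list[str]:
    # Each entry is checked on its own: no join against tag documents.
    candidates = (e for e in entries if e.tag_count == 0)
    return [e.id for e in heapq.nlargest(n, candidates, key=lambda e: e.published)]

entries = [Entry("entries/1", "A", 100), Entry("entries/2", "B", 200),
           Entry("entries/3", "C", 300)]
apply_tags(entries[2], [], ["favourite"])
print(latest_untagged(entries, 1000))  # ['entries/2', 'entries/1']
```

In a real store the filter and ordering would be served by an ordinary index over `tag_count` and `published`; the trade-off is keeping the counter correct under concurrent tag updates.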
Re. set-based operations in RavenDB, see:
http://ayende.com/blog/4535/set-based-operations-with-ravendb
http://ravendb.net/documentation/set-based