JOIN operations with NoSQL
I have been reading some articles about Bigtable and NoSQL. Interestingly, they avoid JOIN operations.
As a basic example, take Employee and Department tables, and assume the data is spread across multiple tables or servers.
If the data is spread across multiple servers, how do we perform JOIN or UNION operations?
Comments (4)
When you have extremely large data, you probably want to avoid joins. This is because the overhead of an individual key lookup is relatively large: the service needs to figure out which node(s) to query, query them in parallel, and wait for the responses. By overhead, I mean latency, not a throughput limitation.
This makes joins perform very badly, because you would need to do a lot of foreign key lookups, which in many cases would end up going to many different nodes. So you would want to avoid this as a pattern.
If it doesn't happen very often, you could probably take the hit, but if you are going to do a lot of them, it may be worth "denormalising" the data.
The kind of data that gets stored in NoSQL stores is typically pretty "abnormal" in the first place. It is not uncommon to duplicate the same data in all sorts of different places to make lookups easier.
Additionally, most NoSQL stores don't (really) support secondary indexes either, which means you have to duplicate data if you want to query by any other criterion.
If you are storing data such as employees and departments, you are really better off with a conventional database.
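To make the "denormalising" idea above concrete, here is a minimal sketch in Python. It uses plain dictionaries as stand-ins for key-value tables; the record layout and the helper names `denormalise` and `index_by_department` are assumptions made for illustration, not the API of any particular NoSQL store.

```python
# Hypothetical in-memory stand-ins for two key-value "tables".
departments = {
    "d1": {"name": "Engineering"},
    "d2": {"name": "Sales"},
}

# Normalised form: each employee holds only a foreign key to a department.
employees = {
    "e1": {"name": "Ada", "dept_id": "d1"},
    "e2": {"name": "Bob", "dept_id": "d2"},
    "e3": {"name": "Cy", "dept_id": "d1"},
}


def denormalise(employees, departments):
    """Copy the department name into every employee record.

    After this, reading an employee needs a single key lookup
    instead of a second lookup for the department.
    """
    return {
        emp_id: {**emp, "dept_name": departments[emp["dept_id"]]["name"]}
        for emp_id, emp in employees.items()
    }


def index_by_department(employees):
    """Build a second copy of the data keyed by department.

    This stands in for a secondary index: the employee ids are
    duplicated under each department key.
    """
    by_dept = {}
    for emp_id, emp in employees.items():
        by_dept.setdefault(emp["dept_id"], []).append(emp_id)
    return {dept_id: sorted(ids) for dept_id, ids in by_dept.items()}


employees_denormalised = denormalise(employees, departments)
employees_by_dept = index_by_department(employees)
```

The trade-off is the one described above: reads become single lookups, but renaming a department now means rewriting every employee record that carries a copy of the name.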
You would have to do multiple selects and join the data manually in your application. See this SO post for more information. From that post:
Unfortunately, it is not possible to perform a join natively in a NoSQL database. This is actually one of the biggest differences between SQL and NoSQL databases.
As @kaleb said, you would have to do multiple selects and then join the needed information "manually".
Luckily, there are ORM frameworks such as Prisma that allow you to "fake" the native SQL join feature.
Note: you are still performing multiple database calls under the hood, which increases the number of read operations and everything that goes with them.
"A key feature of Prisma Client is the ability to query relations between two or more models." -> https://www.prisma.io/
Example:
In this case, the posts are stored in a different table, but Prisma is able to fetch them and join them into the User object.
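Independently of any ORM, "multiple selects joined manually" can be sketched in a few lines of Python. This is not Prisma code. `FakeStore` and its `scan` and `get_many` methods are hypothetical stand-ins for a key-value client, and the call counter is only there to show that the join costs more than one round trip.

```python
class FakeStore:
    """Hypothetical key-value store that counts round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.calls = 0

    def scan(self):
        """Return every row (one round trip)."""
        self.calls += 1
        return dict(self._rows)

    def get_many(self, keys):
        """Return the rows for the given keys (one batched round trip)."""
        self.calls += 1
        return {k: self._rows[k] for k in keys if k in self._rows}


def join_employees_with_departments(employee_store, department_store):
    """Application-side join: two selects, then a merge in memory."""
    employees = employee_store.scan()                       # select 1
    dept_ids = {emp["dept_id"] for emp in employees.values()}
    departments = department_store.get_many(dept_ids)       # select 2

    joined = []
    for emp_id, emp in sorted(employees.items()):
        dept = departments.get(emp["dept_id"])
        joined.append({
            "emp_id": emp_id,
            "name": emp["name"],
            # None when the department row is missing (left-join behaviour).
            "department": dept["name"] if dept else None,
        })
    return joined


employee_store = FakeStore({
    "e1": {"name": "Ada", "dept_id": "d1"},
    "e2": {"name": "Bob", "dept_id": "d9"},
})
department_store = FakeStore({"d1": {"name": "Engineering"}})

rows = join_employees_with_departments(employee_store, department_store)
```

Batching the department keys into one `get_many` call keeps this at two round trips. Looking up each department separately would cost one round trip per employee, which is the latency problem described earlier in the thread.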
Kaleb's right. With a NoSQL solution, you write custom code if your data doesn't fit well into a key-value store. Map-reduce or asynchronous processing and custom view caches are common. Brian Aker gave a very funny (and satirical and biased) presentation at the November 2009 OpenSQLCamp: http://www.youtube.com/watch?v=LhnGarRsKnA. Skip to 40 seconds in to hear about joins.
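One common way to express a join as map-reduce is a reduce-side join, sketched below in plain Python. This is a single-process illustration under assumed record shapes, not the API of any map-reduce framework. The function names (`map_employee`, `map_department`, `shuffle`, `reduce_join`, `run_join`) are hypothetical.

```python
from collections import defaultdict


def map_employee(record):
    """Emit the join key with a tagged employee value."""
    yield record["dept_id"], ("employee", record["name"])


def map_department(record):
    """Emit the join key with a tagged department value."""
    yield record["dept_id"], ("department", record["name"])


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(values):
    """Pair every employee with every department sharing the same key."""
    dept_names = [v for tag, v in values if tag == "department"]
    emp_names = [v for tag, v in values if tag == "employee"]
    for dept_name in dept_names:
        for emp_name in emp_names:
            yield emp_name, dept_name


def run_join(employees, departments):
    """Run the map, shuffle and reduce phases in one process."""
    pairs = []
    for record in employees:
        pairs.extend(map_employee(record))
    for record in departments:
        pairs.extend(map_department(record))

    result = []
    for values in shuffle(pairs).values():
        result.extend(reduce_join(values))
    return sorted(result)


joined = run_join(
    employees=[
        {"name": "Ada", "dept_id": "d1"},
        {"name": "Bob", "dept_id": "d2"},
        {"name": "Cy", "dept_id": "d1"},
        {"name": "Dee", "dept_id": "d9"},  # no matching department
    ],
    departments=[
        {"dept_id": "d1", "name": "Engineering"},
        {"dept_id": "d2", "name": "Sales"},
    ],
)
```

Because the reducer only emits pairs when both sides are present for a key, this behaves like an inner join: the employee with the unmatched `dept_id` is dropped.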