How do you design a schema to efficiently query nested items in a key-value database?
I'm using Mnesia with Erlang, but this question applies to any key-value database, such as CouchDB.
I'm trying to break free of the RDBMS thought process, but I can't
wrap my head around how to efficiently implement this kind of schema.
Say I have a User record that has many SubItem A records, each of which
has many SubItem B records, so:
User
-SubItem A
--SubItem B
...
I need to run queries on SubItem B. Is it efficient to do it when it's
this nested? Should I just normalize it so it will be quicker?
I have heard of some people using data duplication so that the data is
both nested and separate. Is this ridiculous, or is it actually useful
in some cases?
Comments (4)
The underlying question is, when is the performance good enough?
Table-scanning the User dictionary isn't excessive overhead if you really do need to examine every SubItem B in detail and the SubItem B data dominates the overall size of the dictionary.
If that isn't good enough, normalize it so you can avoid reading in all the User and SubItem A data up front when you're querying SubItem B. If the table is ordered, use a compound key such as (UserId, SubItemAId, SubItemBId) in the SubItem B dictionary so you can do range queries.
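To make the compound-key idea concrete, here is a minimal in-memory sketch of an ordered SubItem B table with prefix range queries. The `OrderedTable` class and its method names are made up for illustration and are not an API of Mnesia or any other database; it also assumes each key position holds values of a single comparable type (all strings here).

```javascript
// Compare two compound keys, e.g. [userId, subItemAId, subItemBId], element by element.
function compareKeys(x, y) {
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] < y[i]) return -1;
    if (x[i] > y[i]) return 1;
  }
  return x.length - y.length; // a shorter key sorts before any key it is a prefix of
}

// Hypothetical in-memory stand-in for an ordered SubItem B table (rows kept sorted by key).
class OrderedTable {
  constructor() {
    this.rows = [];
  }

  // Index of the first row whose key is >= the given key (binary search).
  lowerBound(key) {
    let lo = 0;
    let hi = this.rows.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (compareKeys(this.rows[mid].key, key) < 0) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  // Insert a row at its sorted position, or overwrite the value of an existing key.
  put(key, value) {
    const i = this.lowerBound(key);
    if (i < this.rows.length && compareKeys(this.rows[i].key, key) === 0) {
      this.rows[i].value = value;
    } else {
      this.rows.splice(i, 0, { key, value });
    }
  }

  // All rows whose key starts with `prefix`, e.g. [userId] or [userId, subItemAId].
  // Only the matching contiguous run is visited, not the whole table.
  rangeByPrefix(prefix) {
    const out = [];
    for (let i = this.lowerBound(prefix); i < this.rows.length; i++) {
      const k = this.rows[i].key;
      if (!prefix.every((part, j) => k[j] === part)) break;
      out.push(this.rows[i]);
    }
    return out;
  }
}

// Usage: fetch every SubItem B under SubItem A "a1" of user "u1".
const subItemBTable = new OrderedTable();
subItemBTable.put(["u1", "a1", "b1"], { title: "first" });
subItemBTable.put(["u1", "a2", "b1"], { title: "second" });
subItemBTable.put(["u2", "a1", "b1"], { title: "third" });
console.log(subItemBTable.rangeByPrefix(["u1", "a1"]).map((row) => row.key));
```

The point of the design is that all SubItem B rows for one user (or one SubItem A) sit next to each other in key order, so a query touches only that run and never loads User or SubItem A data.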
If that totally kills your User/SubItem A query performance, then consider data duplication as a last resort because it's more error-prone.
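If you do end up duplicating, one way to limit the error-proneness is to route every write through a single function that updates both copies together. The sketch below is a plain in-memory illustration under that assumption; `DuplicatedStore` and its methods are hypothetical names, and the flat key format assumes ids never contain "/". In a real database the two writes would also need to happen atomically (for example inside a transaction, where the store offers one).

```javascript
// Hypothetical store keeping each SubItem B twice: nested under its user, and in a flat table.
class DuplicatedStore {
  constructor() {
    this.users = new Map(); // nested copy: userId -> Map(aId -> Map(bId -> data))
    this.flatB = new Map(); // separate copy: "userId/aId/bId" -> data
  }

  // Builds the flat-table key; assumes ids do not contain "/".
  static flatKey(userId, aId, bId) {
    return [userId, aId, bId].join("/");
  }

  // The only write path for SubItem B: both copies are updated in the same call.
  putSubItemB(userId, aId, bId, data) {
    if (!this.users.has(userId)) this.users.set(userId, new Map());
    const subItemsA = this.users.get(userId);
    if (!subItemsA.has(aId)) subItemsA.set(aId, new Map());
    subItemsA.get(aId).set(bId, data);
    this.flatB.set(DuplicatedStore.flatKey(userId, aId, bId), data);
  }

  // Deletes from both copies in the same call.
  deleteSubItemB(userId, aId, bId) {
    const subItemsA = this.users.get(userId);
    if (subItemsA && subItemsA.has(aId)) subItemsA.get(aId).delete(bId);
    this.flatB.delete(DuplicatedStore.flatKey(userId, aId, bId));
  }

  // Sanity check: every nested SubItem B has the same flat entry, and no flat entry is orphaned.
  isConsistent() {
    let nestedCount = 0;
    for (const [userId, subItemsA] of this.users) {
      for (const [aId, subItemsB] of subItemsA) {
        for (const [bId, data] of subItemsB) {
          nestedCount++;
          if (this.flatB.get(DuplicatedStore.flatKey(userId, aId, bId)) !== data) return false;
        }
      }
    }
    return nestedCount === this.flatB.size;
  }
}
```

A check like `isConsistent` is cheap to run in tests and makes the failure mode visible: any code that writes to one copy directly, bypassing `putSubItemB`, leaves the two copies out of sync.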
In CouchDB it would be trivial to emit view entries for each of the SubItems. This would give you very fast access to those items. Depending on what else you put in the view entries, you could probably include any information you need for linking back to the parent documents or sub-items.
I'm not sure about Mnesia, and I'm only just getting started with CouchDB, but my understanding is that in CouchDB, since you generate your own custom indexes ("views"), you can straightforwardly build an index on those sub-items.
An example map function:
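A minimal sketch of what such a map function might look like, assuming each user document has the shape `{ _id, type: "user", subItemsA: [{ id, subItemsB: [{ id, ... }] }] }`. Those field names and the `runMap` helper are assumptions made for illustration, not something taken from this thread. In CouchDB itself only the map function would go into a design document, and `emit` is supplied by the view server; `runMap` merely stands in for it so the sketch runs under Node.

```javascript
// Map function in the form CouchDB calls it: once per document.
// Emits one view row per SubItem B, keyed by [userId, subItemAId, subItemBId].
function mapSubItemB(doc) {
  if (doc.type !== "user" || !Array.isArray(doc.subItemsA)) return;
  doc.subItemsA.forEach(function (a) {
    (a.subItemsB || []).forEach(function (b) {
      emit([doc._id, a.id, b.id], b);
    });
  });
}

// Hypothetical local stand-in for the view server: collects whatever the map function emits.
function runMap(mapFn, docs) {
  var rows = [];
  docs.forEach(function (doc) {
    globalThis.emit = function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    };
    mapFn(doc);
  });
  delete globalThis.emit;
  return rows;
}

// Usage with made-up documents.
var exampleDocs = [
  {
    _id: "u1",
    type: "user",
    subItemsA: [
      { id: "a1", subItemsB: [{ id: "b1", label: "x" }, { id: "b2", label: "y" }] },
      { id: "a2", subItemsB: [{ id: "b3", label: "z" }] }
    ]
  }
];
console.log(runMap(mapSubItemB, exampleDocs).map(function (row) { return row.key; }));
```

Because the key is an array, a view built this way can be queried by key range using the view's `startkey` and `endkey` parameters, for example to fetch all SubItem B rows belonging to a single user.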
That is effectively an indexed listing of SubItem B records, and you can then cut and splice from that listing as you choose.
Actually, I think it depends on the database you are using. One approach will work better in CouchDB, while another will work better in Mnesia. Should you partition and shard the data? On what criteria should you do so? How much data duplication is enough?
As Jeffery Hantin said, it's going to take some experimentation and analysis to figure out the right solution. That said, most non-relational databases provide you with the tools you need to solve the problem. Your part is figuring out the trade-offs of each approach and which of them you can accept.