SQL 或 NoSQL 数据库中存在大量小型树结构
我想在数据库中存储许多不同树中的节点信息。
首先,500 棵树将共享超过 20000 个节点,每个节点将有 5 个数字属性。一旦构造好,每个节点都需要引用它的所有直接子节点,而不需要引用其他节点。
我需要在初始化时在内存中构建所有树,并在程序进入停机时间后更新/添加节点(可能每小时左右,尽管越多越好)。
我看过sql邻接模型,它看起来构建每个表需要太长时间(必须进行太多的数据库调用),嵌套集模型是一种可能性,但扩展树更复杂,这是会发生很多事情,并且它增加了数据库的复杂性,因为我认为这可能是一个非常基本的结构和查询集。
我也研究过 MongoDb,但它似乎更适合 JSON 类型对象,而且我正在使用 java,并且可能会过度杀戮,而且 HBase 也绝对会过度杀戮(优点是,如果节点数量变得巨大,它可能会出现)有用,这是未来的可能性,我可以增加数据库的写入时间,这也是一个优势)
有人对我如何解决这个问题有任何建议吗?
NoSql 数据库是否杀伤力过大?他们在存储树结构方面做得更好吗?将它们与 sql 数据库一起使用是一种不好的做法吗?
I would like to store information on nodes in lots of different trees in a database.
To begin with there will be over 20000 nodes shared between 500 trees, each node will have 5 number attributes. once constructed each node needs reference to all it's immediate children and no other nodes.
I require to build all trees in memory at initialization time and update/add nodes once the program enters downtime (perhaps every hour or so, although the more the better).
I have looked at the sql adjacency model which seems like it would take too long to construct each table (having to make too many db calls), the nested set model which is a possibility but is more complex to expand the tree which is something which will happen a lot and it increases database complexity for what I see as could be a very basic structure and query set.
I have also looked in to MongoDb but it seems more geared towards JSON type objects and I am using java, and might be over kill, and also HBase which definetly looks over kill (the advantage being if the amount of nodes becomes huge it might come in useful which is a possibility for the future and I could increase write time to the DB which would be an advantage too)
Anyone have any suggestions of how I might go about this?
Are the NoSql dbs overkill? are they much better at storing tree structures? is it poor practice to use them along side sql databases?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
如果您删除嵌套集上的
(rgt - lft - 1) / 2
returns number of Children 属性,并为 lft/rgt 列使用浮点数,则可以在最短的时间内插入/更新/删除节点。这样做的主要问题是避免与精度相关的问题。您可以通过将 lft/rgt 转换为数字并返回浮点来解决后者,以获得它们的规范表示。以 Postgres 为例:
另一个问题相当容易管理,并且在空间不足时发生:然后您偶尔需要重新索引部分或全部树 - 它需要锁定树,但速度足够快,您可以这样做不影响正常运行。
If you drop the
(rgt - lft - 1) / 2
yields number of children property on nested sets, and use floats for the lft/rgt columns, you can insert/update/delete nodes in minimal time.The main issue when doing so, is to avoid precision-related issues. You can work around the latter by casting to lft/rgt to numeric and back to float, so as to obtain their canonical representation. Example with Postgres:
The other issue is reasonably easy to manage and occurs when you run out of space: you then occasionally need to re-index part or all of tree -- it requires locking the tree but it's fast enough that you can do so without impacting normal operations.
如果您使用的是 SQL Server 2008+,则可以使用新的 HierarchyID适用于此类场景的数据类型。
if you're using SQL Server 2008+ you can use the new HierarchyID data type meant for such scenarios.