SQL 或 NoSQL 数据库中存在大量小型树结构

发布于 2024-11-10 09:45:26 字数 527 浏览 3 评论 0原文

我想在数据库中存储许多不同树中的节点信息。

首先,500 棵树将共享超过 20000 个节点,每个节点将有 5 个数字属性。一旦构造好,每个节点都需要引用它的所有直接子节点,而不需要引用其他节点。

我需要在初始化时在内存中构建所有树,并在程序进入停机时间后更新/添加节点(可能每小时左右,尽管越多越好)。

我看过sql邻接模型,它看起来构建每个表需要太长时间(必须进行太多的数据库调用),嵌套集模型是一种可能性,但扩展树更复杂,这是会发生很多事情,并且它增加了数据库的复杂性,因为我认为这可能是一个非常基本的结构和查询集。

我也研究过 MongoDb,但它似乎更适合 JSON 类型对象,而且我正在使用 java,并且可能会过度杀戮,而且 HBase 也绝对会过度杀戮(优点是,如果节点数量变得巨大,它可能会出现)有用,这是未来的可能性,我可以增加数据库的写入时间,这也是一个优势)

有人对我如何解决这个问题有任何建议吗?

NoSql 数据库是否杀伤力过大?他们在存储树结构方面做得更好吗?将它们与 sql 数据库一起使用是一种不好的做法吗?

I would like to store information on nodes in lots of different trees in a database.

To begin with there will be over 20000 nodes shared between 500 trees, each node will have 5 number attributes. once constructed each node needs reference to all it's immediate children and no other nodes.

I require to build all trees in memory at initialization time and update/add nodes once the program enters downtime (perhaps every hour or so, although the more the better).

I have looked at the sql adjacency model which seems like it would take too long to construct each table (having to make too many db calls), the nested set model which is a possibility but is more complex to expand the tree which is something which will happen a lot and it increases database complexity for what I see as could be a very basic structure and query set.

I have also looked in to MongoDb but it seems more geared towards JSON type objects and I am using java, and might be over kill, and also HBase which definetly looks over kill (the advantage being if the amount of nodes becomes huge it might come in useful which is a possibility for the future and I could increase write time to the DB which would be an advantage too)

Anyone have any suggestions of how I might go about this?

Are the NoSql dbs overkill? are they much better at storing tree structures? is it poor practice to use them along side sql databases?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

神也荒唐 2024-11-17 09:45:26

如果您删除嵌套集上的 (rgt - lft - 1) / 2 returns number of Children 属性,并为 lft/rgt 列使用浮点数,则可以在最短的时间内插入/更新/删除节点。

这样做的主要问题是避免与精度相关的问题。您可以通过将 lft/rgt 转换为数字并返回浮点来解决后者,以获得它们的规范表示。以 Postgres 为例:

select (.1::float + .7::float) * 10::float;                          -- 8
select floor((.1::float + .7::float) * 10::float);                   -- 7
select floor(((.1::float + .7::float) * 10::float)::numeric::float); -- 8

另一个问题相当容易管理,并且在空间不足时发生:然后您偶尔需要重新索引部分或全部树 - 它需要锁定树,但速度足够快,您可以这样做不影响正常运行。

If you drop the (rgt - lft - 1) / 2 yields number of children property on nested sets, and use floats for the lft/rgt columns, you can insert/update/delete nodes in minimal time.

The main issue when doing so, is to avoid precision-related issues. You can work around the latter by casting to lft/rgt to numeric and back to float, so as to obtain their canonical representation. Example with Postgres:

select (.1::float + .7::float) * 10::float;                          -- 8
select floor((.1::float + .7::float) * 10::float);                   -- 7
select floor(((.1::float + .7::float) * 10::float)::numeric::float); -- 8

The other issue is reasonably easy to manage and occurs when you run out of space: you then occasionally need to re-index part or all of tree -- it requires locking the tree but it's fast enough that you can do so without impacting normal operations.

我不是你的备胎 2024-11-17 09:45:26

如果您使用的是 SQL Server 2008+,则可以使用新的 HierarchyID适用于此类场景的数据类型。

if you're using SQL Server 2008+ you can use the new HierarchyID data type meant for such scenarios.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文