nosql 数据库中的树结构

发布于 2024-09-10 18:54:15 字数 858 浏览 5 评论 0原文

我正在为 Google App Engine 开发一个应用程序,它使用 BigTable 作为其数据存储。

这是一个关于协作编写故事的应用程序。这是一个非常简单的爱好项目,我只是为了好玩而做。它是开源的,您可以在这里看到它: http://story.multifarce.com/

这个想法是任何人都可以写一个段落,然后需要由另外两个人验证。故事也可以在任何段落分支,以便故事的另一个版本可以朝另一个方向继续。

想象一下以下树结构:

每个数字将是一个段落。我希望能够选择每个独特故事情节中的所有段落。基本上,这些独特的故事情节是(2,7,2); (2,7,6,5); (2、7、6、11)和(2、5、9、4)。忽略节点“2”出现了两次,我只是从维基百科上拿了一个树结构图。

我还制作了建议解决方案的图表:https://docs。 google.com/drawings/edit?id=1fdUISIjGVBvIKMSCjtE4xFNZxiE08AoqvJSLQbxN6pc&hl=en

如何设置一个既适合写作又适合阅读的结构?

I'm developing an application for Google App Engine which uses BigTable for its datastore.

It's an application about writing a story collaboratively. It's a very simple hobby project that I'm working on just for fun. It's open source and you can see it here: http://story.multifarce.com/

The idea is that anyone can write a paragraph, which then needs to be validated by two other people. A story can also be branched at any paragraph, so that another version of the story can continue in another direction.

Imagine the following tree structure:

Every number would be a paragraph. I want to be able to select all the paragraphs in every unique story line. Basically, those unique story lines are (2, 7, 2); (2, 7, 6, 5); (2, 7, 6, 11) and (2, 5, 9, 4). Ignore that the node "2" appears twice, I just took a tree structure diagram from Wikipedia.

I also made a diagram of a proposed solution: https://docs.google.com/drawings/edit?id=1fdUISIjGVBvIKMSCjtE4xFNZxiE08AoqvJSLQbxN6pc&hl=en

How can I set up a structure is performance efficient both for writing, but most importantly for reading?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

夜空下最亮的亮点 2024-09-17 18:54:15

有许多众所周知的方法来表示数据库中的树;他们每个人都有自己的优点和缺点。以下是最常见的:

  • 邻接列表,其中每个节点存储其父节点的 ID。
  • 物化路径,这是Keyur描述的策略。这也是 App Engine 中实体组(例如父实体)使用的方法。这也或多或少是您在更新中描述的内容。
  • 嵌套集,其中每个节点都有“左”和“右”ID,这样所有子节点都包含在这个范围内。
  • 邻接列表用根 ID 进行扩充。

其中每一个都有其自身的优点和缺点。邻接列表很简单,更新成本也很低,但需要多个查询来检索子树(每个父节点一个)。增强邻接表可以通过在每个记录中存储根节点的 ID 来检索整个树。

物化路径易于实现且更新成本低廉,并且允许查询任意子树,但会增加深层树的开销。

嵌套集更难实现,并且每次插入时平均需要更新一半的节点。它们允许您查询任意子树,而不会出现物化路径增加密钥长度的问题。

不过,在您的具体情况下,似乎您实际上根本不需要树结构:每个故事虽然可能是从原始故事中分支出来的,但都是独立的。我建议使用一个“故事”模型,其中包含其段落的键列表(例如,在Python中为db.ListProperty(db.Key))。要渲染故事,您需要获取故事,然后对所有段落进行批量获取。要对故事进行分支,只需复制故事条目 - 保持对段落的引用不变。

There are a number of well known ways to represent trees in databases; each of them have their pros and cons. Here are the most common:

  • Adjacency list, where each node stores the ID of its parent.
  • Materialized path, which is the strategy Keyur describes. This is also the approach used by entity groups (eg, parent entities) in App Engine. It's also more or less what you're describing in your update.
  • Nested sets, where each node has 'left' and 'right' IDs, such that all child nodes are contained in that range.
  • Adjacency lists agumented with a root ID.

Each of these has its own advantages and disadvantages. Adjacency lists are simple, and cheap to update, but require multiple queries to retrieve a subtree (one for each parent node). Augmented adjacency lists make it possible to retrieve an entire tree by storing the ID of the root node in every record.

Materialized paths are easy to implement and cheap to update, and permit querying arbitrary subtrees, but impose increasing overhead for deep trees.

Nested sets are tougher to implement, and require updating, on average, half the nodes each time you make an insertion. They allow you to query arbitrary subtrees, without the increasing key length issue materialized path has.

In your specific case, though, it seems like you don't actually need a tree structure at all: each story, branched off an original though it may be, stands alone. What I would suggest is having a 'Story' model, which contains a list of keys of its paragraphs (Eg, in Python a db.ListProperty(db.Key)). To render a story, you fetch the Story, then do a batch fetch for all the Paragraphs. To branch a story, simply duplicate the story entry - leaving the references to Paragraphs unchanged.

只想待在家 2024-09-17 18:54:15

我能想到的一种解决方案是——节点的路径也是该节点的关键。所以节点11的键是“2/7/6/11”。您可以通过路径中所有键的简单键查找来遍历路径 - “2/7/6/11”、“2/7/6”、“2/7”、“2”

One solution I can think about is - the path to the node is also the key of that node. So the key of node 11 is "2/7/6/11". You can traverse the path by a simple key lookup of all keys in the path - "2/7/6/11", "2/7/6", "2/7", "2"

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文