Hierarchical tagging in SQL
I have a PHP web application which uses a MySQL database for object tagging, in which I've used the tag structure accepted as the answer to this SO question.
I'd like to implement a tag hierarchy, where each tag can have a unique parent tag. Searches for a parent tag T would then match T and all descendants of T (i.e. T itself, tags whose parent is T (children of T), grandchildren of T, etc.).
The easiest way of doing this seems to be to add a ParentID field to the tag table, which contains the ID of a tag's parent tag, or some magic number if the tag has no parent. Searching for descendants, however, then requires repeated full searches of the database to find the tags in each 'generation', which I'd like to avoid.
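To make the cost of the ParentID approach concrete, here is a minimal sketch of the generation-by-generation search it requires. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table and column names (tag, parent_id) and the helper functions are hypothetical, and it uses NULL rather than a magic number for "no parent". With an index on parent_id each round is an index lookup rather than a full scan, but a tree of depth d still needs about d round trips.

```python
import sqlite3


def build_adjacency_demo():
    """Create a small in-memory tag table with a parent_id column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tag ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES tag(id))"  # NULL means the tag has no parent
    )
    conn.execute("CREATE INDEX tag_parent_idx ON tag(parent_id)")
    conn.executemany(
        "INSERT INTO tag VALUES (?, ?, ?)",
        [
            (1, "animal", None),
            (2, "mammal", 1),
            (3, "dog", 2),
            (4, "cat", 2),
            (5, "plant", None),
        ],
    )
    return conn


def descendants_by_generation(conn, root_id):
    """Return root_id followed by all descendant IDs, issuing one query per generation."""
    found = [root_id]
    frontier = [root_id]
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id FROM tag WHERE parent_id IN ({placeholders}) ORDER BY id",
            frontier,
        ).fetchall()
        frontier = [row[0] for row in rows]
        found.extend(frontier)
    return found
```

Calling descendants_by_generation(conn, 1) on the demo data runs three queries (children, grandchildren, and a final empty round) to collect four tags.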
A (presumably) faster, but less normalised way of doing this would be to have a table containing all the children of each tag, or even all the descendants of each tag. This however runs the risk of inconsistent data in the database (e.g. a tag being the child of more than one parent).
Is there a good way to make queries to find descendants fast, while keeping the data as normalised as possible?
I implemented it using two columns. I simplify it here a little, because I had to keep the tag name in a separate field/table because I had to localize it for different languages:
Look at these rows for example:
etc.
Using the LIKE operator on the path field, you can easily get all the needed tag rows.
There are some implementation details, such as having to update all the children when you move a node in the hierarchy, but it's not hard.
Also make sure that the length of your path is long enough - in my case I used not the tag name for the path, but another field to make sure that I don't get too long paths.
Ali's answer has a link to Joe Celko's Trees and Hierarchies in SQL for Smarties, which confirms my suspicion - there isn't a simple database structure that offers the best of all worlds. The best for my purpose seems to be the "Frequent Insertion Tree" detailed in this book, which is like the "Nested Set Model" of Ali's link, but with non-consecutive indexing. This allows O(1) insertion (a la unstructured BASIC line numbering), with occasional index reorganisation as and when needed.
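As a rough illustration of the non-consecutive indexing idea (not code taken from the book), the sketch below stores a left/right interval per tag and leaves gaps between the numbers, so that a new child can usually be placed without touching any other row. The schema, the initial interval of 0 to 1000, the quarter-margin placement rule, and the function names are all assumptions made for this example; SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3


def build_gapped_demo():
    """Create a hypothetical nested-set table whose root spans a wide interval."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tag (id INTEGER PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO tag VALUES (1, 0, 1000)")
    return conn


def insert_child(conn, tag_id, parent_id):
    """Insert a new last child under parent_id without renumbering existing rows."""
    p_lft, p_rgt = conn.execute(
        "SELECT lft, rgt FROM tag WHERE id = ?", (parent_id,)
    ).fetchone()
    # Right edge of the parent's existing descendants, or the parent's own lft if none.
    (last,) = conn.execute(
        "SELECT COALESCE(MAX(rgt), ?) FROM tag WHERE lft > ? AND rgt < ?",
        (p_lft, p_lft, p_rgt),
    ).fetchone()
    free = p_rgt - last - 1  # unused integers strictly between last and p_rgt
    if free < 2:
        raise RuntimeError("no gap left under this parent; the indexes need reorganising")
    margin = free // 4  # leave room on both sides for future siblings
    conn.execute(
        "INSERT INTO tag VALUES (?, ?, ?)",
        (tag_id, last + 1 + margin, p_rgt - 1 - margin),
    )


def gapped_subtree_ids(conn, tag_id):
    """Return the tag and all its descendants with one interval query."""
    rows = conn.execute(
        "SELECT d.id FROM tag AS t JOIN tag AS d ON d.lft BETWEEN t.lft AND t.rgt"
        " WHERE t.id = ? ORDER BY d.lft",
        (tag_id,),
    )
    return [row[0] for row in rows]
```

The RuntimeError marks the point where the occasional index reorganisation would be needed; a real implementation would renumber the affected subtree there instead of failing.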
A few ways here
You could build what Kimball calls a Hierarchy Helper Table.
Say your hierarchy looks like this: A -> B | B -> C | C -> D
you'd insert records into a table that looks like this
I think I have that correct... anyway. The point is that you still store your hierarchy correctly; you just build this table FROM your proper table. THIS table queries like a banshee. Say you want to know what all the first-level tags below B are.
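The table listing and query from this answer were lost, so here is a hedged reconstruction of the general idea rather than the author's exact layout. It assumes a normalised tag_edge(child, parent) table as the source of truth and a derived tag_helper(ancestor, descendant, depth) table with one row per ancestor/descendant pair; all of these names are hypothetical, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3


def build_helper_demo():
    """Create the normalised edge table for A -> B -> C -> D plus an empty helper table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tag_edge (child TEXT PRIMARY KEY, parent TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE tag_helper ("
        " ancestor TEXT NOT NULL, descendant TEXT NOT NULL, depth INTEGER NOT NULL,"
        " PRIMARY KEY (ancestor, descendant))"
    )
    conn.executemany("INSERT INTO tag_edge VALUES (?, ?)", [("B", "A"), ("C", "B"), ("D", "C")])
    return conn


def rebuild_helper(conn):
    """Regenerate tag_helper from tag_edge (assumes the edges contain no cycles)."""
    parent_of = dict(conn.execute("SELECT child, parent FROM tag_edge"))
    rows = []
    for node in parent_of:
        depth, ancestor = 1, parent_of[node]
        while ancestor is not None:  # walk up until we pass the root
            rows.append((ancestor, node, depth))
            ancestor = parent_of.get(ancestor)
            depth += 1
    conn.execute("DELETE FROM tag_helper")
    conn.executemany("INSERT INTO tag_helper VALUES (?, ?, ?)", rows)


def below(conn, tag, depth=None):
    """Return descendants of tag, optionally restricted to one level (depth=1 is children)."""
    sql = "SELECT descendant FROM tag_helper WHERE ancestor = ?"
    params = [tag]
    if depth is not None:
        sql += " AND depth = ?"
        params.append(depth)
    return [row[0] for row in conn.execute(sql + " ORDER BY depth, descendant", params)]
```

With the demo data, below(conn, "B", depth=1) answers the "first level below B" question with a single indexed lookup, and dropping the depth filter returns every descendant. Because the helper table is always rebuilt from tag_edge, it cannot drift into a state where a tag has two parents.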
I would use some kind of array to store the child tags; this should be a lot faster than joining a table on itself (especially if you have a large number of tags). I had a look, and I can't tell whether MySQL has a native array data type, but you can emulate one by using a text column and storing a serialized array in it. If you want to speed things up further, you should be able to put a text search index on that column to find out which tags are related.
[Edit]
After reading Ali's article, I did some more hunting and found this presentation on a bunch of approaches for implementing hierarchies in PostgreSQL. It might still be helpful for explanatory purposes.