存储树数据的快速关系方法(例如文章的线程评论)

发布于 2024-07-19 15:01:22 字数 591 浏览 9 评论 0原文

我有一个 cms,用于存储针对文章的评论。 这些评论可以是线程式的,也可以是非线程式的。 尽管从技术上讲,它们是相同的,只是回复列在未线程化时留空。 我的应用程序适用于 sqlLite、MySQL 和 pgsql,因此我需要相当标准的 SQL。

我目前有一个评论表

comment_id
article_id
user_id
comment
timestamp
thread (this is the reply column)

我的问题是弄清楚如何最好地表示数据库中的线程评论。 也许在一个单独的表中,支持没有内容的树集和一个简单的表来保存文本? 也许已经是这样了? 也许还有另一种方式?

如果评论没有线程化,我可以轻松地按时间戳排序。

如果它们是线程化的,我会像这样排序 正如

ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1))

您从 ORDER BY 中看到的那样,注释查询将永远不会使用索引,因为基于函数的索引仅真正存在于 Oracle 中。 帮助我拥有闪电般快速的评论页面。

I have a cms which stores comments against articles. These comments can be both threaded and non threaded. Although technically they are the same just with the reply column left blank when it's not threaded. My application works on sqlLite, MySQL and pgsql so I need fairly standard SQL.

I currently have a comment table

comment_id
article_id
user_id
comment
timestamp
thread (this is the reply column)

My question is to figure out how to best represent the threaded comments in the database. Perhaps in a separate table that supports the tree set without the content and a simple table to hold the text? Perhaps in the way it already is? Perhaps another way?

If the comments are un-threaded I can easily just order by the timestamp.

If they are threaded I sort like this

ORDER BY SUBSTRING(c.thread, 1, (LENGTH(c.thread) - 1))

As you can see from the ORDER BY, the commenting queries will not ever use an index as function based indexes only really live in Oracle. Help me have lightening fast comment pages.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

浅浅 2024-07-26 15:01:25

实际上,它必须在读和写之间取得平衡。

如果您愿意在每次插入时更新一堆行,那么嵌套集(或等效的)将为您提供轻松、快速的读取。

除此之外,父级上的简单 FK 将为您提供超简单的插入,但很可能对检索来说是一场噩梦。

我想我会选择嵌套集,但要小心预期的数据量和使用模式(每次插入时更新两个索引列(用于左侧和右侧信息)上的几个,也许很多行可能是一个问题在某一点)。

Actually, it has to be a balance between read and write.

If you are OK with updating a bunch of rows on every insert, then nested set (or an equivalent) will give you easy, fast reads.

Other than that, a simple FK on the parent will give you ultra-simple insert, but might well be a nightmare for retrieval.

I think I'd go with the nested sets, but be careful about the expected data volume and usage patterns (updating several, maybe a lot of, rows on two indexed columns (for left and right info) for every insert might be a problem at some point).

还如梦归 2024-07-26 15:01:24

事实上,我自己就是这么做的! 我使用嵌套集模型来表示关系数据库中的分层数据。

在 MySQL 中管理分层数据对我来说简直就是黄金。 嵌套集是该文章中描述的第二个模型。

I just did this myself, actually! I used the nested set model of representing hierarchical data in a relational database.

Managing Hierarchical Data in MySQL was pure gold for me. Nested sets are the second model described in that article.

飞烟轻若梦 2024-07-26 15:01:24

您可以在邻接模型和嵌套集合模型之间进行选择。 管理 MySQL 中的分层数据一文提供了很好的介绍。

有关理论讨论,请参阅 Celko 的树和层次结构

如果您的数据库支持窗口函数,则实现线程列表相当容易。 您所需要的只是目标数据库表中的递归引用,例如:

create Tablename (
  RecordID integer not null default 0 auto_increment,
  ParentID integer default null references RecordID,
  ...
)

然后您可以使用递归公共表表达式来显示线程视图。 此处提供了一个示例。

You've got a choice between the adjacency and the nested set models. The article Managing Hierarchical Data in MySQL makes for a nice introduction.

For a theoretical discussion, see Celko's Trees and Hierarchies.

It's rather easy to implement a threaded list if your database supports windowing functions. All you need is a recursive reference in your target database table, such as:

create Tablename (
  RecordID integer not null default 0 auto_increment,
  ParentID integer default null references RecordID,
  ...
)

You can then use a recursive Common Table Expression to display a threaded view. An example is available here.

音盲 2024-07-26 15:01:24

不幸的是,纯 SQL 方法执行此操作非常慢。

@Marc W 提出的 NESTED SETS 非常优雅,但如果您的树枝达到范围,它们可能需要更新整个树,这可能会非常慢。

请参阅我博客中的这篇文章,了解如何在 MySQL 中快速完成此操作:

您需要创建一个函数:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    t_hierarchy
                WHERE   parent = _parent
                        AND id > _id;
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
        END LOOP;
END

并在如下查询中使用它:

SELECT  hi.*
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 0,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id

这当然是 MySQL 特定的,但它确实很快。

如果您希望它可以在 PostgreSQLMySQL 之间移植,您可以使用 PostgreSQL 的 CONNECT BY 贡献并将查询包装到两个系统具有相同名称的存储过程中。

Unfortunately, the pure SQL methods to do it are quite slow.

The NESTED SETS proposed by @Marc W are quite elegant but they may require updating the whole tree if your tree branches hit the ranges, which can be quite slow.

See this article in my blog on how to do it fast in MySQL:

You'll need to create a function:

CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
NOT DETERMINISTIC
READS SQL DATA
BEGIN
        DECLARE _id INT;
        DECLARE _parent INT;
        DECLARE _next INT;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;

        SET _parent = @id;
        SET _id = -1;

        IF @id IS NULL THEN
                RETURN NULL;
        END IF;

        LOOP
                SELECT  MIN(id)
                INTO    @id
                FROM    t_hierarchy
                WHERE   parent = _parent
                        AND id > _id;
                IF @id IS NOT NULL OR _parent = @start_with THEN
                        SET @level = @level + 1;
                        RETURN @id;
                END IF;
                SET @level := @level - 1;
                SELECT  id, parent
                INTO    _id, _parent
                FROM    t_hierarchy
                WHERE   id = _parent;
        END LOOP;
END

and use it in a query like this:

SELECT  hi.*
FROM    (
        SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
        FROM    (
                SELECT  @start_with := 0,
                        @id := @start_with,
                        @level := 0
                ) vars, t_hierarchy
        WHERE   @id IS NOT NULL
        ) ho
JOIN    t_hierarchy hi
ON      hi.id = ho.id

This is of course MySQL specific but it's real fast.

If you want this to be portable betwen PostgreSQL and MySQL, you can use PostgreSQL's contrib for CONNECT BY and wrap the query into a stored procedure with same name for both systems.

当爱已成负担 2024-07-26 15:01:24

我知道答案有点晚了,但是对于树数据使用闭包表,这是正确的关系方式。
http://www.slideshare.net/billkarwin/models-for-hierarchical- data

它描述了 4 种方法:

  • 邻接列表(简单的父外键)
  • 路径枚举(接受的答案中提到的 Drupal 策略)
  • 嵌套集
  • 闭包表(将祖先/后代事实存储在单独的关系 [表] 中,其中可能的距离列)

与其他选项相比,最后一个选项具有简单的 CRUD 操作的优点。 成本是空间,在最坏的情况下,树节点数量为 O(n^2) 大小,但在实践中可能并没有那么糟糕。

I know the answer is a bit late, but for tree data use a closure table, it is the proper relational way.
http://www.slideshare.net/billkarwin/models-for-hierarchical-data

It describes 4 methods:

  • Adjcency list (the simple parent foreign key)
  • Path enumeration (the Drupal strategy mentioned in the accepted answer)
  • Nested sets
  • Closure table (storing ancestor/descendant facts in a separate relation [table], with a possible distance column)

The last option has advantages of easy CRUD operations compared to the rest. The cost is space, which is O(n^2) size in the number tree nodes in the worst case, but probably not so bad in practice.

情深已缘浅 2024-07-26 15:01:23

我真的很喜欢 Drupal 解决这个问题的方式。 它为每个评论分配一个线程 ID。 第一条评论的 ID 从 1 开始。 如果对此评论添加回复,则会为其分配 id 1.1。 对评论 1.1 的回复会被赋予线程 ID 1.1.1。 注释 1.1 的同级被赋予线程 ID 1.2。 你明白了。 添加评论时,可以通过一个查询轻松计算这些线程 ID。

当线程被渲染时,属于该线程的所有评论都会在单个查询中获取,并按线程 ID 排序。 这将为您提供按升序排列的线程。 此外,使用线程 ID,您可以找到每个评论的嵌套级别,并相应地缩进。

1
1.1
1.1.1
1.2
1.2.1

有几个问题需要解决:

  • 如果线程 id 的一个组成部分增长到 2 位数字,则按线程 id 排序将不会产生预期的顺序。 一个简单的解决方案是确保线程 id 的所有组件都用零填充以具有相同的宽度。
  • 按线程 ID 降序排序不会产生预期的降序。

Drupal 使用称为 vancode 的编号系统以更复杂的方式解决第一个问题。 对于第二个问题,通过在线程id降序排序时添加一个反斜杠(其ASCII码高于数字)来解决。 您可以通过检查 评论模块(请参阅函数comment_get_thread之前的大评论)。

I really like how Drupal solves this problem. It assigns a thread id to each comment. This id starts at 1 for the first comment. If a reply is added to this comment, the id 1.1 is assigned to it. A reply to comment 1.1 is given the thread id 1.1.1. A sibling of comment 1.1 is given the thread id 1.2. You get the idea. Calculating these thread ids can be done easily with one query when a comment is added.

When the thread is rendered, all of the comments that belong to the thread are fetched in a single query, sorted by the thread id. This gives you the threads in the ascending order. Furthermore, using the thread id, you can find the nesting level of each comment, and indent it accordingly.

1
1.1
1.1.1
1.2
1.2.1

There are a few issues to sort out:

  • If one component of the thread id grows to 2 digits, sorting by thread id will not produce the expected order. An easy solution is ensuring that all components of a thread id are padded by zeros to have the same width.
  • Sorting by descending thread id does not produce the expected descending order.

Drupal solves the first issue in a more complicated way using a numbering system called vancode. As for the second issue, it is solved by appending a backslash (whose ASCII code is higher than digits) to thread ids when sorting by descending order. You can find more details about this implementation by checking the source code of the comments module (see the big comment before the function comment_get_thread).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文