Recursive subdirectory SQL problem

Posted on 2024-08-23 10:29:32

This is a mental exercise that has been bothering me for a while. What strategy would you use to solve this sort of problem?

Let's consider the following simple database structure. We have directories, which obviously form a tree. We also have content items, each of which always resides in some directory.

create table directory ( 
 directoryId integer generated always as identity primary key,
 parentId integer default null,
 directoryName varchar(100)
);

create table content (
 contentId integer generated always as identity primary key,
 directory integer references directory(directoryId),
 contentTitle varchar(100),
 contentText varchar(32000)
);

Now let's assume that our directory tree is massive and the amount of content is massive. The solution must scale well.

The main problem: how can I efficiently retrieve all content items found in a specified directory and its subdirectories?

As I see it, SQL cannot easily produce all the directoryIds of a subtree for use in a subselect. Am I correct?

One could solve this on the application side with a simple recursive loop. That could become very heavy, though, and would require tricky caching, especially to guarantee reasonable first-access times.
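For reference, the application-side loop might look roughly like the sketch below. It is only an illustration: it uses Python's sqlite3 module as a stand-in database, and `collect_subtree_ids`, `build_demo_db`, and the sample rows are made-up names, not part of any real system.

```python
import sqlite3
from collections import deque


def collect_subtree_ids(conn, root_id):
    """Return root_id followed by the ids of all its descendants, breadth-first."""
    found = [root_id]
    queue = deque([root_id])
    while queue:
        parent = queue.popleft()
        # One round trip per visited directory: this is what makes the
        # application-side approach heavy on a large tree.
        rows = conn.execute(
            "SELECT directoryId FROM directory "
            "WHERE parentId = ? ORDER BY directoryId",
            (parent,),
        ).fetchall()
        for (child_id,) in rows:
            found.append(child_id)
            queue.append(child_id)
    return found


def build_demo_db():
    """Create an in-memory database with a tiny, hypothetical directory tree."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE directory ("
        " directoryId INTEGER PRIMARY KEY,"
        " parentId INTEGER REFERENCES directory(directoryId),"
        " directoryName VARCHAR(100))"
    )
    conn.executemany(
        "INSERT INTO directory VALUES (?, ?, ?)",
        [
            (1, None, "root"),
            (2, 1, "docs"),
            (3, 1, "src"),
            (4, 2, "drafts"),
            (5, None, "other"),
        ],
    )
    return conn
```

Calling `collect_subtree_ids(build_demo_db(), 1)` walks the sample tree one directory at a time, so the number of queries grows with the size of the subtree.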

One could perhaps also build a materialized query table and dynamically add multi-dimensional indexes to it. That is possible, but the implementation would be a mess: too complex.

By far my favorite solution would probably be to add a new table like

create table subdirectories (
 directoryId integer,
 subdirectoryId integer,
 constraint thekey primary key (directoryId,subdirectoryId)
);

and make sure I always update it manually whenever directories are moved, deleted, or created. That way I could always select by directoryId and get the Ids of all its subdirectories, including as a subselect in more complex queries. I also like the fact that the RDBMS can optimize such queries well.
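To make the idea concrete, here is a rough sketch of how the creation case could be maintained, again using Python's sqlite3 as a stand-in. The helper names `add_directory` and `content_titles_under` are hypothetical, the `content` table is trimmed to the columns needed here, and the sketch assumes each directory is also stored as its own subdirectory so that one lookup covers "the directory and its subdirectories".

```python
import sqlite3

SCHEMA = """
CREATE TABLE directory (
    directoryId INTEGER PRIMARY KEY,
    parentId INTEGER REFERENCES directory(directoryId),
    directoryName VARCHAR(100)
);
CREATE TABLE content (
    contentId INTEGER PRIMARY KEY,
    directory INTEGER REFERENCES directory(directoryId),
    contentTitle VARCHAR(100)
);
CREATE TABLE subdirectories (
    directoryId INTEGER,
    subdirectoryId INTEGER,
    PRIMARY KEY (directoryId, subdirectoryId)
);
"""


def add_directory(conn, name, parent_id=None):
    """Insert a directory plus the rows linking it to every ancestor."""
    with conn:  # one transaction, so the two tables cannot drift apart
        cur = conn.execute(
            "INSERT INTO directory (parentId, directoryName) VALUES (?, ?)",
            (parent_id, name),
        )
        new_id = cur.lastrowid
        # Assumption: every directory is listed as its own subdirectory.
        conn.execute(
            "INSERT INTO subdirectories VALUES (?, ?)", (new_id, new_id)
        )
        if parent_id is not None:
            # The parent and all of its ancestors gain the new directory.
            conn.execute(
                "INSERT INTO subdirectories (directoryId, subdirectoryId) "
                "SELECT directoryId, ? FROM subdirectories "
                "WHERE subdirectoryId = ?",
                (new_id, parent_id),
            )
    return new_id


def content_titles_under(conn, root_id):
    """Titles of all content in root_id and below, via a single join."""
    rows = conn.execute(
        "SELECT c.contentTitle FROM subdirectories s "
        "JOIN content c ON c.directory = s.subdirectoryId "
        "WHERE s.directoryId = ? ORDER BY c.contentId",
        (root_id,),
    ).fetchall()
    return [title for (title,) in rows]
```

Moving or deleting a directory would need similar bookkeeping (removing the old ancestor rows for the whole moved subtree and inserting the new ones), which this sketch leaves out.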

What do you guys think?

晨敛清荷 2024-08-30 10:29:32

In SQL Server 2005, PostgreSQL 8.4, and Oracle 11g Release 2 (or later versions of each):

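WITH
        -- uncomment the next line in PostgreSQL
        -- RECURSIVE
        q (directoryId) AS  -- Oracle requires the column list on a recursive CTE
        (
        -- anchor member: the directory to start from
        SELECT  directoryId
        FROM    directory
        WHERE   directoryId = 1
        UNION ALL
        -- recursive member: children of the directories found so far
        SELECT  d.directoryId
        FROM    q
        JOIN    directory d
        ON      d.parentId = q.directoryId
        )
SELECT  c.*
FROM    q
JOIN    content c
ON      c.directory = q.directoryId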
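
To try the recursive CTE approach without one of those servers at hand, the sketch below runs the same kind of query through Python's sqlite3 module (SQLite also accepts `WITH RECURSIVE`). The table is named `directory` as in the question's schema, `content` is trimmed to the columns needed here, and the sample rows and the helper names `make_sample_db` and `titles_in_subtree` are made up for illustration.

```python
import sqlite3

QUERY = """
WITH RECURSIVE q (directoryId) AS (
    SELECT directoryId FROM directory WHERE directoryId = ?
    UNION ALL
    SELECT d.directoryId
    FROM q
    JOIN directory d ON d.parentId = q.directoryId
)
SELECT c.contentTitle
FROM q
JOIN content c ON c.directory = q.directoryId
ORDER BY c.contentId
"""


def make_sample_db():
    """Build an in-memory database holding a small, hypothetical tree."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE directory (
            directoryId INTEGER PRIMARY KEY,
            parentId INTEGER REFERENCES directory(directoryId),
            directoryName VARCHAR(100)
        );
        CREATE TABLE content (
            contentId INTEGER PRIMARY KEY,
            directory INTEGER REFERENCES directory(directoryId),
            contentTitle VARCHAR(100)
        );
        """
    )
    conn.executemany(
        "INSERT INTO directory VALUES (?, ?, ?)",
        [(1, None, "root"), (2, 1, "docs"), (3, 1, "src"),
         (4, 2, "drafts"), (5, None, "other")],
    )
    conn.executemany(
        "INSERT INTO content VALUES (?, ?, ?)",
        [(1, 1, "readme"), (2, 4, "draft"), (3, 3, "main"), (4, 5, "misc")],
    )
    return conn


def titles_in_subtree(conn, root_id):
    """Titles of all content in root_id and every directory below it."""
    return [title for (title,) in conn.execute(QUERY, (root_id,))]
```

The whole subtree is resolved inside the database in a single statement, so the application makes one round trip regardless of how deep the tree is.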

In Oracle before 11g Release 2:

SELECT  c.*
FROM    (
        SELECT  directoryId
        FROM    directory
        START WITH
                directoryId = 1
        CONNECT BY
                parentId = PRIOR directoryId
        ) q
JOIN    content c
ON      c.directory = q.directoryId

For PostgreSQL 8.3 and below, see this article: