SQL Server:如何将 CTE 递归限制为刚刚递归添加的行?
更简单的示例
让我们尝试一个更简单的示例,这样人们就可以理解这些概念,并有一个可以复制并粘贴到 SQL 查询分析器中的实际示例:
想象一个具有层次结构的 Nodes 表:
A
- B
- C
我们可以开始在查询分析器中进行测试:
CREATE TABLE ##Nodes
(
NodeID varchar(50) PRIMARY KEY NOT NULL,
ParentNodeID varchar(50) NULL
)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('A', null)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('B', 'A')
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('C', 'B')
所需输出:
ParentNodeID NodeID GenerationsRemoved
============ ====== ==================
NULL A 1
NULL B 2
NULL C 3
A B 1
A C 2
B C 1
现在是建议的 CTE 表达式,其输出不正确:
WITH NodeChildren AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM ##Nodes
WHERE ParentNodeID IS NULL
UNION ALL
--recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM NodeChildren AS P
INNER JOIN ##Nodes AS N
ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
实际输出:
ParentNodeID NodeID GenerationsRemoved
============ ====== ==================
NULL A 1
NULL B 2
NULL C 3
注意: 如果 SQL Server 2005† CTE 无法完成我在 2000 年之前所做的事情‡,没关系,这就是答案。 谁给出“不可能”作为答案,谁就将赢得赏金。 但我会等几天,以确保每个人都同意,在我因无法解决我的问题而无可挽回地给予 250 声望之前,这是不可能的。
Nitpickers Corner
†不是 2008 年
‡不诉诸 UDF*,这是已有的解决方案
*除非您可以在原始问题中找到提高 UDF 性能的方法
原始问题
我有一个表格节点,每个节点都有一个指向另一个节点(或 null)的父节点。
举例来说:
1 My Computer
2 Drive C
4 Users
5 Program Files
7 Windows
8 System32
3 Drive D
6 mp3
我想要一个返回所有父子关系以及它们之间的代数的表
对于所有直接父母关系:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 1 1
1 2 1
2 4 1
2 5 1
2 7 1
1 3 1
3 6 1
7 8 1
但是还有祖父母关系:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 2 2
(null) 3 2
1 4 2
1 5 2
1 7 2
1 6 2
2 8 2
还有曾祖祖父母关系:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 4 3
(null) 5 3
(null) 7 3
(null) 6 3
1 8 3
所以我可以弄清楚基本的CTE初始化:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, NodeID AS ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
}
现在的问题是递归部分。 当然,显而易见的答案是行不通的:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT parents.ParentNodeID, children.NodeID, parents.Generations+1
FROM NodeChildren parents
INNER JOIN NodeParents children
ON parents.NodeID = children.ParentNodeID
}
Msg 253, Level 16, State 1, Line 1
Recursive member of a common table expression 'NodeChildren' has multiple recursive references.
生成整个递归列表所需的所有信息都存在于初始 CTE 表中。 但如果不允许,我会尝试:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT parents.ParentNodeID, Nodes.NodeID, parents.Generations+1
FROM NodeChildren parents
INNER JOIN Nodes
ON parents.NodeID = nodes.ParentNodeID
}
但这会失败,因为它不仅加入递归元素,而且不断递归地一遍又一遍地添加相同的行:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
在 SQL Server 2000 中,我通过使用用户定义函数(UDF)模拟了 CTE ):
CREATE FUNCTION [dbo].[fn_NodeChildren] ()
RETURNS @Result TABLE (
ParentNodeID int NULL,
ChildNodeID int NULL,
Generations int NOT NULL)
AS
/*This UDF returns all "ParentNode" - "Child Node" combinations
...even multiple levels separated
BEGIN
DECLARE @Generations int
SET @Generations = 1
--Insert into the Return table all "Self" entries
INSERT INTO @Result
SELECT ParentNodeID, NodeID, @Generations
FROM Nodes
WHILE @@rowcount > 0
BEGIN
SET @Generations = @Generations + 1
--Add to the Children table:
-- children of all nodes just added
-- (i.e. Where @Result.Generation = CurrentGeneration-1)
INSERT @Result
SELECT CurrentParents.ParentNodeID, Nodes.NodeID, @Generations
FROM Nodes
INNER JOIN @Result CurrentParents
ON Nodes.ParentNodeID = CurrentParents.ChildNodeID
WHERE CurrentParents.Generations = @Generations - 1
END
RETURN
END
防止它爆炸的神奇之处在于限制 where 子句: WHERE CurrentParents.Generations - @Generations-1
如何防止递归 CTE 永远递归?
Simpler Example
Let's try a simpler example, so people can wrap their heads around the concepts, and have a practical example that you can copy&paste into SQL Query Analizer:
Imagine a Nodes table, with a heirarchy:
A
- B
- C
We can start testing in Query Analizer:
CREATE TABLE ##Nodes
(
NodeID varchar(50) PRIMARY KEY NOT NULL,
ParentNodeID varchar(50) NULL
)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('A', null)
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('B', 'A')
INSERT INTO ##Nodes (NodeID, ParentNodeID) VALUES ('C', 'B')
Desired output:
ParentNodeID NodeID GenerationsRemoved
============ ====== ==================
NULL A 1
NULL B 2
NULL C 3
A B 1
A C 2
B C 1
Now the suggested CTE expression, with it's incorrect output:
WITH NodeChildren AS
(
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM ##Nodes
WHERE ParentNodeID IS NULL
UNION ALL
--recursive execution
SELECT P.ParentNodeID, N.NodeID, P.GenerationsRemoved + 1
FROM NodeChildren AS P
INNER JOIN ##Nodes AS N
ON P.NodeID = N.ParentNodeID
)
SELECT ParentNodeID, NodeID, GenerationsRemoved
FROM NodeChildren
Actual output:
ParentNodeID NodeID GenerationsRemoved
============ ====== ==================
NULL A 1
NULL B 2
NULL C 3
Note: If SQL Server 2005† CTE cannot do what i was doing before in 2000‡, that's fine, and that's the answer. And whoever gives "it's not possible" as the answer will win the bounty. But i will wait a few days to make sure everyone concur's that it's not possible before i irrecovably give 250 reputation for a non-solution to my problem.
Nitpickers Corner
†not 2008
‡without resorting to a UDF*, which is the solution already have
*unless you can see a way to improve the performance of the UDF in the original question
Original Question
i have a table of Nodes, each with a parent that points to another Node (or to null).
For illustration:
1 My Computer
2 Drive C
4 Users
5 Program Files
7 Windows
8 System32
3 Drive D
6 mp3
i want a table that returns all the parent-child relationships, and the number of generations between them
For for all direct parent relationships:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 1 1
1 2 1
2 4 1
2 5 1
2 7 1
1 3 1
3 6 1
7 8 1
But then there's the grandparent relationships:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 2 2
(null) 3 2
1 4 2
1 5 2
1 7 2
1 6 2
2 8 2
And the there's the great-grand-grandparent relationships:
ParentNodeID ChildNodeID GenerationsRemoved
============ =========== ===================
(null) 4 3
(null) 5 3
(null) 7 3
(null) 6 3
1 8 3
So i can figure out the basic CTE initialization:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, NodeID AS ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
}
The problem now is the recursive part. The obvious answer, of course, doesn't work:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, ChildNodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT parents.ParentNodeID, children.NodeID, parents.Generations+1
FROM NodeChildren parents
INNER JOIN NodeParents children
ON parents.NodeID = children.ParentNodeID
}
Msg 253, Level 16, State 1, Line 1
Recursive member of a common table expression 'NodeChildren' has multiple recursive references.
All the information needed to generate the entire recursive list is present in the inital CTE table. But if that's not allowed i'll try:
WITH (NodeChildren) AS
{
--initialization
SELECT ParentNodeID, NodeID, 1 AS GenerationsRemoved
FROM Nodes
UNION ALL
--recursive execution
SELECT parents.ParentNodeID, Nodes.NodeID, parents.Generations+1
FROM NodeChildren parents
INNER JOIN Nodes
ON parents.NodeID = nodes.ParentNodeID
}
But that fails because it's not only joining on the recursive elements, but keeps recursivly adding the same rows over and over:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
In SQL Server 2000 i simulated a CTE by using a User Defined Function (UDF):
CREATE FUNCTION [dbo].[fn_NodeChildren] ()
RETURNS @Result TABLE (
ParentNodeID int NULL,
ChildNodeID int NULL,
Generations int NOT NULL)
AS
/*This UDF returns all "ParentNode" - "Child Node" combinations
...even multiple levels separated
BEGIN
DECLARE @Generations int
SET @Generations = 1
--Insert into the Return table all "Self" entries
INSERT INTO @Result
SELECT ParentNodeID, NodeID, @Generations
FROM Nodes
WHILE @@rowcount > 0
BEGIN
SET @Generations = @Generations + 1
--Add to the Children table:
-- children of all nodes just added
-- (i.e. Where @Result.Generation = CurrentGeneration-1)
INSERT @Result
SELECT CurrentParents.ParentNodeID, Nodes.NodeID, @Generations
FROM Nodes
INNER JOIN @Result CurrentParents
ON Nodes.ParentNodeID = CurrentParents.ChildNodeID
WHERE CurrentParents.Generations = @Generations - 1
END
RETURN
END
And the magic to keep it from blowing up was the limiting where clause:
WHERE CurrentParents.Generations - @Generations-1
How do you prevent a recursive CTE from recursing forever?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
试试这个:
基本上从初始化查询中删除“只显示绝对父母”; 这样,它会生成从每个结果开始并从那里递减的结果。 我还在“WHERE P.GenerationsRemoved <= 10”中添加了无限递归捕获(将 10 替换为 100 以内的任何数字以满足您的需要)。 然后添加排序,使其看起来像您想要的结果。
Try this:
Basically removing the "only show me absolute parents" from the initialization query; That way it generates the results starting from each of them and decending from there. I also added in the "WHERE P.GenerationsRemoved <= 10" as an infinite recursion catch(replace 10 with any number up to 100 to fit your needs). Then add the sort so it looks like the results you wanted.
旁白:你有 SQL Server 2008 吗? 这可能适合
hierarchyid
数据类型。Aside: do you have SQL Server 2008? This might be suited to the
hierarchyid
data type.如果我理解您的意图,您可以通过执行以下操作来获得结果:
这将返回所有节点之间的代差,您可以根据您的确切需要对其进行修改。
If I understand your intentions you can get you result by doing something like this:
This will return the generation difference between all nodes, you can modify it for you exact needs.
好吧,你的答案不是那么明显:-)
这部分被称为递归 CTE 的“锚”部分 - 但它实际上应该只从表中选择一行或选定的几行 - 这会选择所有内容!
我想你在这里缺少的只是一个合适的 WHERE 子句:
但是,恐怕你的要求不仅要有“直接”层次结构,而且要有祖父母-孩子行,可能不太容易满足......通常,递归 CTE 只会显示一级及其直接下级(当然,还有层次结构中的下一级)——它通常不会跳过一级、二级甚至更多级。
希望这个对你有帮助。
马克
Well, your answer is not quite that obvious :-)
This part is called the "anchor" part of the recursive CTE - but it should really only select one or a select few rows from your table - this selects everything!
I guess what you're missing here is simply a suitable WHERE clause:
However, I am afraid your requirement to have not just the "straight" hierarchy, but also the grandparent-child rows, might not be that easy to satisfy.... normally recursive CTE will only ever show one level and its direct subordinates (and that down the hierarchy, of course) - it doesn't usually skip one, two or even more levels.
Hope this helps a bit.
Marc
问题在于 Sql Server 默认递归限制 (100)。 如果您在顶部尝试删除锚点限制(还添加了排序依据)的示例:
这会产生所需的结果。 您面临的问题是,如果行数较多,您将重复 100 次以上,这是默认限制。 这可以通过在查询后添加
选项(最大递归 x)
来更改,其中 x 是 1 到 32767 之间的数字。x 也可以设置为 0,这不会设置任何限制,但很快就会有一个对您的服务器性能产生非常不利的影响。 显然,随着节点中行数的增加,递归次数会很快增加,除非表中的行数有已知的上限,否则我会避免这种方法。 为了完整起见,最终查询应如下所示:其中 32767 可以向下调整以适合您的场景
The issue is with the Sql Server default recursion limit (100). If you try your example at the top with the anchor restriction removed (also added Order By):
This produces the desired results. The problem you aare facing is with a larger number of rows you will recirse over 100 times which is a default limit. This can be changed by adding
option (max recursion x)
after your query, where x is a number between 1 and 32767. x can also be set to 0 which sets no limit however could very quickly have a very detrimental impact on your server performance. Clearly as the number of rows in Nodes increases, the number of recursions will can rise very quickly and I would avoid this approach unless there was a known upper limit on the rows in the table. For completeness, the final query should look like:Where 32767 could be adjusted downwards to fit your scenario
您是否尝试过在 CTE 中构建一条路径并使用它来识别祖先?
然后,您可以从祖先节点深度中减去后代节点深度来计算 GenerationsRemoved 列,如下所示...
Have you tried constructing a path in the CTE and using it to identify ancestors?
You can then subtract the descendant node depth from the ancestor node depth to calculate the GenerationsRemoved column, like so...
这打破了克里斯·谢弗的答案所施加的递归限制。
我创建一个带有循环的表:
在存在潜在循环的情况下(即 ParentNodeId IS NOT NULL),删除的生成从 2 开始。然后我们可以通过检查 (P.ParentNodeID == N.NodeID) 来标识循环,我们根本不添加它。 然后,我们添加省略的代remove = 1。
This breaks the recursion limit imposed on Chris Shaffer's answer.
I create a table with a cycle:
In cases where there is a potential cycle (i.e. ParentNodeId IS NOT NULL), the generation removed is started at 2. We can then identity cycles by checking (P.ParentNodeID == N.NodeID), which we simply don't add it. Afterwards, we append the omitted generation remove = 1.
我的示例显示了一个递归 CTE,它在 100 个级别(最大值)后停止递归。 作为奖励,它会显示一堆 ASCII 字符和相应的数值。
My example here shows a recursive CTE that stops recursion after 100 levels (the max). As a bonus, it displays a bunch of ASCII characters and the corresponding numeric value.