SQL Server 2005 递归查询与数据循环 - 可能吗?
我有一个标准的老板/下属员工表。 我需要选择一个老板(由 ID 指定)和他的所有下属(及其下属等)。 不幸的是,现实世界的数据存在一些循环(例如,两个公司所有者都将对方设置为自己的老板)。 带有 CTE 的简单递归查询对此感到窒息(超出了最大递归级别 100)。 员工还能选拔吗? 我不关心它们被选择的顺序,只关心它们每个被选择一次。
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
I've got a standard boss/subordinate employee table. I need to select a boss (specified by ID) and all his subordinates (and their subrodinates, etc). Unfortunately the real world data has some loops in it (for example, both company owners have each other set as their boss). The simple recursive query with a CTE chokes on this (maximum recursion level of 100 exceeded). Can the employees still be selected? I care not of the order in which they are selected, just that each of them is selected once.
Added: You want my query? Umm... OK... I though it is pretty obvious, but - here it is:
with
UserTbl as -- Selects an employee and his subordinates.
(
select a.[User_ID], a.[Manager_ID] from [User] a WHERE [User_ID] = @UserID
union all
select a.[User_ID], a.[Manager_ID] from [User] a join UserTbl b on (a.[Manager_ID]=b.[User_ID])
)
select * from UserTbl
Added 2: Oh, in case it wasn't clear - this is a production system and I have to do a little upgrade (basically add a sort of report). Thus, I'd prefer not to modify the data if it can be avoided.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(9)
这是我在一个项目中使用的代码,用于上下追踪层次关系树。
用户定义的函数来捕获下属:
用户定义的函数来捕获经理:
This is the code I used on a project to chase up and down hierarchical relationship trees.
User defined function to capture subordinates:
User defined function to capture managers:
更好的解决方案是清理数据并确保将来不会出现任何循环 - 这可以通过触发器或包含在检查约束中的 UDF 来完成。
但是,您可以使用多语句 UDF,如我在此处演示的: 避免无限循环。 第一部分
您可以在连接中添加 NOT IN() 子句来过滤掉循环。
The preferrable solution is to clean up the data and to make sure you do not have any loops in the future - that can be accomplished with a trigger or a UDF wrapped in a check constraint.
However, you can use a multi statement UDF as I demonstrated here: Avoiding infinite loops. Part One
You can add a NOT IN() clause in the join to filter out the cycles.
我可以想到两种方法。
1)生成比您想要的更多的行,但要进行检查以确保它不会递归得太深。 然后删除重复的用户记录。
2)使用字符串来保存已经访问过的用户。 就像不在子查询中的想法不起作用一样。
方法 1:
行
where Depth < 100
可以防止出现最大递归错误。 减小这个数字,就会产生更少需要丢弃的记录。 如果它太小,员工将不会被返回,因此请确保它至少与所存储的组织结构图的深度一样大。 随着公司的发展,维护有点像噩梦。 如果需要更大,则将选项(maxrecursion ... number ...)添加到整个事物中以允许更多递归。方法二:
I can think of two approaches.
1) Produce more rows than you want, but include a check to make sure it does not recurse too deep. Then remove duplicate User records.
2) Use a string to hold the Users already visited. Like the not in subquery idea that didn't work.
Approach 1:
The line
where Depth < 100
is what keeps you from getting the max recursion error. Make this number smaller, and less records will be produced that need to be thrown away. Make it too small and employees won't be returned, so make sure it is at least as large as the depth of the org chart being stored. Bit of a maintence nightmare as the company grows. If it needs to be bigger, then addoption (maxrecursion ... number ...)
to whole thing to allow more recursion.Approach 2:
基本上,如果数据中有这样的循环,您就必须自己执行检索逻辑。
你可以使用一个 cte 来吸引下属,而使用另一个 cte 来吸引老板。
另一个想法是为两个公司所有者设立一个虚拟的老板,这样他们就不会成为彼此的老板,这是荒谬的。 这是我的首选。
basicaly if you have loops like this in data you'll have to do the retreival logic by yourself.
you could use one cte to get only subordinates and other to get bosses.
another idea is to have a dummy row as a boss to both company owners so they wouldn't be each others bosses which is ridiculous. this is my prefferd option.
我知道您不久前问过这个问题,但这里有一个解决方案可以用于检测无限递归循环。 我生成一个路径,并在 CTE 条件中检查用户 ID 是否在路径中,如果是,则不会再次处理它。 希望这可以帮助。
何塞
I know you asked this question a while ago, but here is a solution that may work for detecting infinite recursive loops. I generate a path and I checked in the CTE condition if the USER ID is in the path, and if it is it wont process it again. Hope this helps.
Jose
您不必递归地执行此操作。 它可以在 WHILE 循环中完成。 我保证它会更快:每次我对这两种技术进行计时时,这对我来说都是如此。 这听起来效率低下,但事实并非如此,因为循环次数就是递归级别。 在每次迭代中,您都可以检查循环并纠正发生循环的位置。 您还可以对临时表施加约束,以便在发生循环时引发错误,尽管您似乎更喜欢更优雅地处理循环的方法。 当 while 循环迭代一定数量的级别时,您还可以触发错误(以捕获未检测到的循环? - 哦,天哪,有时会发生这种情况。
诀窍是重复插入到临时表中(该表由根条目填充) ),包括具有当前迭代编号的列,并在临时表中的最新结果和原始表中的子条目之间进行内连接,只需在 @@rowcount=0 时跳出循环即可!
简单吧?
You don't have to do it recursively. It can be done in a WHILE loop. I guarantee it will be quicker: well it has been for me every time I've done timings on the two techniques. This sounds inefficient but it isn't since the number of loops is the recursion level. At each iteration you can check for looping and correct where it happens. You can also put a constraint on the temporary table to fire an error if looping occurs, though you seem to prefer something that deals with looping more elegantly. You can also trigger an error when the while loop iterates over a certain number of levels (to catch an undetected loop? - oh boy, it sometimes happens.
The trick is to insert repeatedly into a temporary table (which is primed with the root entries), including a column with the current iteration number, and doing an inner join between the most recent results in the temporary table and the child entries in the original table. Just break out of the loop when @@rowcount=0!
Simple eh?
不是通用的解决方案,但可能适合您的情况:在您的选择查询中将其修改
为:
Not a generic solution, but might work for your case: in your select query modify this:
to become:
这适用于初始递归链接,但可能不适用于较长的链接,
我建议使用 while 循环,并且仅在 id 不存在时才将链接插入到临时表中,从而消除无限循环。
this will work for the initial recursive link, but might not work for longer links
what i would recomend is to use a while loop, and only insert links into temp table if the id does not already exist, thus removing endless loops.
我知道已经有一段时间了,但我认为我应该分享我的经验,因为我尝试了每一个解决方案,这里是我的发现摘要(也许是这篇文章?):
完成所有这些后,我想到了将[合格]员工的整个子集转储到代码(C#)中并使用递归方法对其进行过滤的想法。 然后,我将筛选后的员工列表写入数据表,并将其作为临时表导出到我的存储过程中。 令我难以置信的是,事实证明,对于小型表和相对较大的表(我尝试了最多 35,000 行的表)来说,这都是最快、最灵活的方法。
I know it has been a while but thought I should share my experience as I tried every single solution and here is a summary of my findings (an maybe this post?):
Having done all of that, I came up with the idea of dumping the whole subset of [eligible] employees to code (C#) and filter them there using a recursive method. Then I wrote the filtered list of employees to a datatable and export it to my stored procedure as a temp table. To my disbelief, this proved to be the fastest and most flexible method for both small and relatively large tables (I tried tables of up to 35,000 rows).