如何创建一个存储过程来查找用户之间的联系表中的派系
在寻找一种从大型数据集中检索社区的方法时,我发现了一篇关于该算法的文章,该算法似乎适合大型数据集。无论如何,数据存储在两个表中:用户(节点)和连接,我想通过纯 SQL 查询来检索社区,而不需要自定义应用程序的帮助(我正在使用 SQL Server 2008)。
检索 cliques 的算法如下:
Read the graph G
Generate set neighbors(v) for every vertex of G
for each vertex v of G
call recursive_find_cliques(v, neighbors(v))
end for
Function recursive_find_cliques(x, n)
for each vertex t ∈ n by ascending order calculate set sigma
if sigma is not empty
extend x with t
call recursive_find_cliques(x, sigma)
end if
end for
其中 sigma 是可以构成三角形的顶点集与 v 及其邻居。
我已经创建了一个存储过程,它返回所选节点的邻居表,但到目前为止我还没有使用 sql 函数和高级查询,所以问题如下:
有谁知道如何重写 上面的sql算法为了得到 派系的集合?正如问题 可能有点抽象,我可能 指出主要问题是 创建一个递归函数 (recursive_find_cliques(x, n)) 其中 以表 (n) 作为参数)。
谢谢你!
编辑:
这是迄今为止创建的存储过程:
CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN
SET XACT_ABORT ON
BEGIN TRAN
SET NOCOUNT ON;
CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)
CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)
CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)
CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)
DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int
INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId
UPDATE #Users SET Observed = 1 WHERE userLabel = @User
--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques
END
SELECT * FROM #Cliques
END
它尚未返回任何内容,因为它尚未完成。它会检索当前选定节点的所有邻居,下一步是实现 recursive_find_cliques 函数。
Loooking for a way to retrieve community from a large dataset I came across an article about the algorithm which seems to be apropriate for large datasets. Anyway the data is stored two tables: users (nodes) and connections and I would like to retrieve the communities by pure sql queries without help of custom applications (I'm using SQL Server 2008).
The algorithm to retrieve the cliques is the following:
Read the graph G
Generate set neighbors(v) for every vertex of G
for each vertex v of G
call recursive_find_cliques(v, neighbors(v))
end for
Function recursive_find_cliques(x, n)
for each vertex t ∈ n by ascending order calculate set sigma
if sigma is not empty
extend x with t
call recursive_find_cliques(x, sigma)
end if
end for
where sigma is the set of vertices that could constitute triangles with v and its neighbors.
I already created a stored procedure which returns a table of neighbors of selected node but so far I haven't delat with sql functions and advanced queries so the question is the following:
Does anyone know how to rewrite the
algorithm above in sql in order to get
the set of cliques? As the question
might be a little abstract, I may
point out that the main problem is to
create a recursive function
(recursive_find_cliques(x, n)) which
takes a table (n) as an argument).
Thank you!
EDIT:
Here is sthe stored procedure created so far:
CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN
SET XACT_ABORT ON
BEGIN TRAN
SET NOCOUNT ON;
CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)
CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)
CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)
CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)
DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int
INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId
UPDATE #Users SET Observed = 1 WHERE userLabel = @User
--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques
END
SELECT * FROM #Cliques
END
It does'not return anything yet as it is not finished. It though retrieves all neighbors for the currently selected nodes and the next step is to implement recursive_find_cliques function.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我意识到我的第一个答案仅在每个派系至少有一个用户未被该派系中的任何其他人引用时才有效。换句话说,像AB、BC、CA这样的封闭派系是不会被发现的。
这是解决这个问题的解决方案。我们再次拥有具有 ID 的用户,现在为 1..20。需要处理相邻关系的情况有以下几种:
相比简单的情况,比较难找每个派系都有一个独特的开胃菜。
我们通过一点技巧来实现这一点:
重新排序邻居,以便对于所有引用 AB,A 小于 B,忽略任何 A=B。
如果存在任何 XA,请删除所有 AX 引用,否则可能会导致循环。这永远不会完全删除对 A 的引用,因为 XA 仍然存在,并且 AX 将添加到递归中。
结果集是“起始”用户,我们使用它们来启动 CTE:
结果:
I realised that my first answer only works when each clique has at least one user who is not referred to by any others in that clique. In other words, closed cliques like A-B, B-C, C-A will not be found.
Here is a solution which solves this. Again we have users with IDs, now 1..20. There are several cases of neighbouring relations that need to be handled:
Compared to the simple case, it is harder to find a unique starter for each clique.
We achieve this with a little sleight of hand:
Reorder the neighbours so that for all references A-B, A is less than B, ignoring any A=B.
From these, remove any A-X references if there are any X-A, which could cause a loop. This will never remove references to A completely because X-A remains and A-X will be added in the recursion.
The resultant set are the 'starting' users and we use them to prime the CTE:
Results:
用户填充 1..8 和 Neighbors
然后:
结果:
请参阅:使用 Common 的递归查询表表达式
Users populated with 1..8 and Neighbours
Then:
Result:
See : Recursive Queries Using Common Table Expressions
我添加了两个 LABELS 和两个 GOTO 语句
I've added a two LABELS and two GOTO statements