如何创建一个存储过程来查找用户之间的联系表中的派系

发布于 2024-10-01 12:27:54 字数 2742 浏览 3 评论 0原文

在寻找一种从大型数据集中检索社区的方法时，我发现了一篇关于该算法的文章，该算法似乎适合大型数据集。无论如何，数据存储在两个表中：用户（节点）和连接，我想通过纯 SQL 查询来检索社区，而不需要自定义应用程序的帮助（我正在使用 SQL Server 2008）。

检索 cliques 的算法如下：

Read the graph G
Generate set neighbors(v) for every vertex of G
for each vertex v of G
call recursive_find_cliques(v, neighbors(v))
end for

Function recursive_find_cliques(x, n)
for each vertex t ∈ n by ascending order calculate set sigma
if sigma is not empty
extend x with t
call recursive_find_cliques(x, sigma)
end if
end for

其中 sigma 是可以构成三角形的顶点集与 v 及其邻居。

我已经创建了一个存储过程，它返回所选节点的邻居表，但到目前为止我还没有使用 sql 函数和高级查询，所以问题如下：

有谁知道如何重写上面的sql算法为了得到派系的集合？正如问题可能有点抽象，我可能指出主要问题是创建一个递归函数 (recursive_find_cliques(x, n)) 其中以表 (n) 作为参数）。

谢谢你！

编辑：

这是迄今为止创建的存储过程：

CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN

SET XACT_ABORT ON
BEGIN TRAN

SET NOCOUNT ON;

CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)

CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)

CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)

CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)

DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int

INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId

UPDATE #Users SET Observed = 1 WHERE userLabel = @User

--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques

END

SELECT * FROM #Cliques

END

它尚未返回任何内容，因为它尚未完成。它会检索当前选定节点的所有邻居，下一步是实现 recursive_find_cliques 函数。

原文

Loooking for a way to retrieve community from a large dataset I came across an article about the algorithm which seems to be apropriate for large datasets. Anyway the data is stored two tables: users (nodes) and connections and I would like to retrieve the communities by pure sql queries without help of custom applications (I'm using SQL Server 2008).

The algorithm to retrieve the cliques is the following:

Read the graph G
Generate set neighbors(v) for every vertex of G
for each vertex v of G
call recursive_find_cliques(v, neighbors(v))
end for

Function recursive_find_cliques(x, n)
for each vertex t ∈ n by ascending order calculate set sigma
if sigma is not empty
extend x with t
call recursive_find_cliques(x, sigma)
end if
end for

where sigma is the set of vertices that could constitute triangles with v and its neighbors.

I already created a stored procedure which returns a table of neighbors of selected node but so far I haven't delat with sql functions and advanced queries so the question is the following:

Does anyone know how to rewrite the
algorithm above in sql in order to get
the set of cliques? As the question
might be a little abstract, I may
point out that the main problem is to
create a recursive function
(recursive_find_cliques(x, n)) which
takes a table (n) as an argument).

Thank you!

EDIT:

Here is sthe stored procedure created so far:

CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN

SET XACT_ABORT ON
BEGIN TRAN

SET NOCOUNT ON;

CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)

CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)

CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)

CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)

DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int

INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId

UPDATE #Users SET Observed = 1 WHERE userLabel = @User

--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques

END

SELECT * FROM #Cliques

END

It does'not return anything yet as it is not finished. It though retrieves all neighbors for the currently selected nodes and the next step is to implement recursive_find_cliques function.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

美男兮 2024-10-08 12:27:54

我意识到我的第一个答案仅在每个派系至少有一个用户未被该派系中的任何其他人引用时才有效。换句话说，像AB、BC、CA这样的封闭派系是不会被发现的。

这是解决这个问题的解决方案。我们再次拥有具有 ID 的用户，现在为 1..20。需要处理相邻关系的情况有以下几种：

alt text

相比简单的情况，比较难找每个派系都有一个独特的开胃菜。
我们通过一点技巧来实现这一点：

重新排序邻居，以便对于所有引用 AB，A 小于 B，忽略任何 A=B。
如果存在任何 XA，请删除所有 AX 引用，否则可能会导致循环。这永远不会完全删除对 A 的引用，因为 XA 仍然存在，并且 AX 将添加到递归中。

结果集是“起始”用户，我们使用它们来启动 CTE：

-- Get all pairs, where UserA < UserB, dropping any A=B or B=A
WITH LRNeighbours(A, B) AS (
    SELECT
        Neighbours.UserA, Neighbours.UserB
    FROM
        Neighbours
    WHERE
        Neighbours.UserA < Neighbours.UserB
UNION ALL
    SELECT DISTINCT
        Neighbours.UserB, Neighbours.UserA
    FROM
        Neighbours
    WHERE
        Neighbours.UserA > Neighbours.UserB
),
-- Isolate those that are not referred to by a higher numbered key
Starters(userid) AS (
    SELECT DISTINCT
        A
    FROM    
        LRNeighbours
    WHERE 
        A NOT IN (
            SELECT 
                B
            FROM
                LRNeighbours
        )
),
-- The recursive Common Table Expression
cliques(userid, clique) AS (
    -- Number starters 1..N
    SELECT  
        userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
    FROM
        Starters
UNION ALL
    -- Recurse, adding users referred by siblings, avoiding starters themselves
    SELECT  
        B, clique 
    FROM
        LRNeighbours INNER JOIN 
        cliques ON
            LRNeighbours.A = cliques.userid 
            AND B NOT IN (
                SELECT
                    userid
                FROM
                    starters
            )
)
SELECT DISTINCT
    clique, userid 
FROM 
    cliques 
ORDER BY 
    clique, userid

结果：

I realised that my first answer only works when each clique has at least one user who is not referred to by any others in that clique. In other words, closed cliques like A-B, B-C, C-A will not be found.

Here is a solution which solves this. Again we have users with IDs, now 1..20. There are several cases of neighbouring relations that need to be handled:

alt text

Compared to the simple case, it is harder to find a unique starter for each clique.
We achieve this with a little sleight of hand:

Reorder the neighbours so that for all references A-B, A is less than B, ignoring any A=B.
From these, remove any A-X references if there are any X-A, which could cause a loop. This will never remove references to A completely because X-A remains and A-X will be added in the recursion.

The resultant set are the 'starting' users and we use them to prime the CTE:

-- Get all pairs, where UserA < UserB, dropping any A=B or B=A
WITH LRNeighbours(A, B) AS (
    SELECT
        Neighbours.UserA, Neighbours.UserB
    FROM
        Neighbours
    WHERE
        Neighbours.UserA < Neighbours.UserB
UNION ALL
    SELECT DISTINCT
        Neighbours.UserB, Neighbours.UserA
    FROM
        Neighbours
    WHERE
        Neighbours.UserA > Neighbours.UserB
),
-- Isolate those that are not referred to by a higher numbered key
Starters(userid) AS (
    SELECT DISTINCT
        A
    FROM    
        LRNeighbours
    WHERE 
        A NOT IN (
            SELECT 
                B
            FROM
                LRNeighbours
        )
),
-- The recursive Common Table Expression
cliques(userid, clique) AS (
    -- Number starters 1..N
    SELECT  
        userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
    FROM
        Starters
UNION ALL
    -- Recurse, adding users referred by siblings, avoiding starters themselves
    SELECT  
        B, clique 
    FROM
        LRNeighbours INNER JOIN 
        cliques ON
            LRNeighbours.A = cliques.userid 
            AND B NOT IN (
                SELECT
                    userid
                FROM
                    starters
            )
)
SELECT DISTINCT
    clique, userid 
FROM 
    cliques 
ORDER BY 
    clique, userid

Results:

回复收藏 0 原文

西瓜 2024-10-08 12:27:54

CREATE TABLE [dbo].[Users](
    [UserID] [int] IDENTITY(1,1) NOT NULL,
    [UserName] [varchar](50) NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Neighbours](
    [UserA] [int] NOT NULL,
    [UserB] [int] NOT NULL
) ON [PRIMARY]

用户填充 1..8 和 Neighbors

UserA   UserB
1   2
2   3
4   5
4   6
5   7
7   8

然后：

WITH cliques(userid, clique) AS (
    SELECT  
        userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
    FROM
        Users
    WHERE
        users.UserID NOT IN (
            SELECT 
                UserB
            FROM
                Neighbours
        )
UNION ALL
    SELECT  
        Neighbours.UserB, clique 
    FROM
        neighbours 
        INNER JOIN cliques
            ON Neighbours.UserA = cliques.userid
)
SELECT
    clique, cliques.userid 
FROM 
    cliques
ORDER BY 
    clique, userid

结果：

clique  userid
1   1
1   2
1   3
2   4
2   5
2   6
2   7
2   8

请参阅：使用 Common 的递归查询表表达式

CREATE TABLE [dbo].[Users](
    [UserID] [int] IDENTITY(1,1) NOT NULL,
    [UserName] [varchar](50) NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Neighbours](
    [UserA] [int] NOT NULL,
    [UserB] [int] NOT NULL
) ON [PRIMARY]

Users populated with 1..8 and Neighbours

UserA   UserB
1   2
2   3
4   5
4   6
5   7
7   8

Then:

WITH cliques(userid, clique) AS (
    SELECT  
        userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
    FROM
        Users
    WHERE
        users.UserID NOT IN (
            SELECT 
                UserB
            FROM
                Neighbours
        )
UNION ALL
    SELECT  
        Neighbours.UserB, clique 
    FROM
        neighbours 
        INNER JOIN cliques
            ON Neighbours.UserA = cliques.userid
)
SELECT
    clique, cliques.userid 
FROM 
    cliques
ORDER BY 
    clique, userid

Result:

clique  userid
1   1
1   2
1   3
2   4
2   5
2   6
2   7
2   8

See : Recursive Queries Using Common Table Expressions

回复收藏 0 原文

别闹i 2024-10-08 12:27:54

我添加了两个 LABELS 和两个 GOTO 语句

CREATE PROCEDURE [dbo].[Peamc_Test] 
AS 
BEGIN 

SET XACT_ABORT ON 
BEGIN TRAN 

SET NOCOUNT ON; 

CREATE TABLE #Users 
( 
UserId int NOT NULL, 
userLabel varchar(50) PRIMARY KEY NOT NULL, 
Observed bit NOT NULL 
) 

CREATE TABLE #Neighbors 
( 
UserId int NOT NULL, 
userLabel varchar(50) NOT NULL PRIMARY KEY, 
Retrieved bit NOT NULL 
) 

CREATE TABLE #ConnectedVertices 
( 
UserId int NOT NULL, 
userLabel varchar(50) NOT NULL PRIMARY KEY, 
) 

CREATE TABLE #Cliques 
( 
CliqueId int NOT NULL, 
UserId varchar(50) NOT NULL, 
) 

DECLARE @UsersCount int 
DECLARE @ii int 
DECLARE @User varchar(50) 
DECLARE @NeighborsCount int 

INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL 
SELECT @UsersCount = COUNT(*) FROM #Users 
SELECT @ii = 1 
WHILE @ii <= @UsersCount 
BEGIN 
--select user 
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId 

UPDATE #Users SET Observed = 1 WHERE userLabel = @User 

--Get user's neighbors 
DELETE FROM #Neighbors 
INSERT INTO #Neighbors(UserId, userLabel, Retrieved) 
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel 
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors 
SELECT @ii = @ii + 1 
GOTO Clique_Find
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques 
--------------------
Clique_Return:
--------------------

END 

SELECT * FROM #Cliques 

END 

--------------------
Clique_Find:
--------------------
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
GOTO Clique_Return

I've added a two LABELS and two GOTO statements

CREATE PROCEDURE [dbo].[Peamc_Test] 
AS 
BEGIN 

SET XACT_ABORT ON 
BEGIN TRAN 

SET NOCOUNT ON; 

CREATE TABLE #Users 
( 
UserId int NOT NULL, 
userLabel varchar(50) PRIMARY KEY NOT NULL, 
Observed bit NOT NULL 
) 

CREATE TABLE #Neighbors 
( 
UserId int NOT NULL, 
userLabel varchar(50) NOT NULL PRIMARY KEY, 
Retrieved bit NOT NULL 
) 

CREATE TABLE #ConnectedVertices 
( 
UserId int NOT NULL, 
userLabel varchar(50) NOT NULL PRIMARY KEY, 
) 

CREATE TABLE #Cliques 
( 
CliqueId int NOT NULL, 
UserId varchar(50) NOT NULL, 
) 

DECLARE @UsersCount int 
DECLARE @ii int 
DECLARE @User varchar(50) 
DECLARE @NeighborsCount int 

INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL 
SELECT @UsersCount = COUNT(*) FROM #Users 
SELECT @ii = 1 
WHILE @ii <= @UsersCount 
BEGIN 
--select user 
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId 

UPDATE #Users SET Observed = 1 WHERE userLabel = @User 

--Get user's neighbors 
DELETE FROM #Neighbors 
INSERT INTO #Neighbors(UserId, userLabel, Retrieved) 
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel 
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors 
SELECT @ii = @ii + 1 
GOTO Clique_Find
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques 
--------------------
Clique_Return:
--------------------

END 

SELECT * FROM #Cliques 

END 

--------------------
Clique_Find:
--------------------
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
GOTO Clique_Return

回复收藏 0 原文

~没有更多了~