如何在 RDBMS 中存储和搜索序列?

发布于 2024-10-20 08:51:33 字数 406 浏览 5 评论 0 原文

我需要在数据库(SQL Server 2008)中存储一些序列/列表,然后从数据库中查找特定序列的 ID(如果存在)。

例如,我有两个序列:

序列 1:A、B、C 序列2:A、C、M、N

目前它们存储在下表中。 (如果可以让事情变得更容易,我可以更改表。)

seq_id   token   order
1        A       0
1        B       1
1        C       2
2        A       0
2        C       1
2        M       2
2        N       3

我想编写一个查询来返回给定序列的 id,例如“A,B,C”(如果存在精确匹配)。序列的长度事先是未知的。谢谢你!

I need to store some sequences/lists in the database (SQL Server 2008), and then find the ID for a particular sequence from the DB if any exists.

For example, I have two sequences:

Sequence 1: A,B,C
Sequence 2: A,C,M,N

Currently they are stored in the following table. (I am OK with changing the table if it makes things easier.)

seq_id   token   order
1        A       0
1        B       1
1        C       2
2        A       0
2        C       1
2        M       2
2        N       3

I'd like to write a query to return the id of a given sequence, e.g. "A,B,C", if there is an exact match. The length of the sequence is unknown beforehand. Thank you!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

可是我不能没有你 2024-10-27 08:51:33

您需要的是所谓的关系划分(请参阅Celko)。最佳解决方案取决于您的 RDB 引擎。如果您能够这样做 - 最流行的解决方案是:

  1. 将您的查询表示为一个表(A、B、C 的表)
  2. 将您的表与现有表进行内连接,按 seq_id 进行分组,对组中的元素进行计数
  3. 使用count 来过滤掉不精确的序列(即,当查找 A、B、C 时,计数必须为 3)

假设您有一个 #query 表,其中包含您希望查找的标记和排序(我使用 sort 而不是 order 来避免与保留关键字冲突)

create table #query
(
token nvarchar(1)
,sort int
)

insert into #query select 'A',0
insert into #query select 'B',1
insert into #query select 'C',2
go

select  seq_id
from    dbo.sequences s
inner join  #query q
    on  q.token = s.token
        and q.sort = s.sort
group by s.seq_id
having count(*) = (select count(*) from #query)

将返回与您的查询匹配的 seq_id。在较新版本的 MsSql 中,人们将使用表变量而不是 #query,但该技术可以普遍应用。

What you need is called relational division (see Celko). The best solution will depend on your rdb engine. If you are able to do so - the most popular solution would be:

  1. Express you query as a table (a table of A,B,C)
  2. Inner join your table to the existing table, group by seq_id, count the elements of groups
  3. Use count to filter out sequences that are not exact (ie. when looking for A,B,C the count must be 3)

Let's say you have a #query table holding tokens and sorts you wish to find (I use sort instead of order to avoid conflicts with reserved keywords)

create table #query
(
token nvarchar(1)
,sort int
)

insert into #query select 'A',0
insert into #query select 'B',1
insert into #query select 'C',2
go

select  seq_id
from    dbo.sequences s
inner join  #query q
    on  q.token = s.token
        and q.sort = s.sort
group by s.seq_id
having count(*) = (select count(*) from #query)

Will return seq_id(s) that match your query. In newer versions of MsSql one would use a table variable instead of #query but the technique can be applied universally.

放我走吧 2024-10-27 08:51:33

您需要的是每个序列的特定于订单的签名。使用 SQL Server 2008,您可以使用 For Xml Path 构造为每个序列和条件组合签名,然后将其与另一个进行比较。显然,这不会很快。您可以在保存每个序列或更改其成员资格时将签名存储到每个序列的相应表中,从而极大地提高速度(也可以使用触发器)。另外,我这里只是使用了原始签名。但是,通常情况下,我倾向于使用 Hashbytes 函数对签名使用组装值的哈希值。

Declare @TestInputs Table   (
                            seq_id int not null
                            , token char(1) not null
                            , [order] int not null
                            )
Insert @TestInputs( seq_id, token, [order] )
Values (1,'A',0)
    , (1,'B',1)
    , (1,'C',2)
    , (2,'A',0)
    , (2,'C',1)
    , (2,'M',2)
    , (2,'N',3);

Declare @Criteria Table (
                        token char(1) not null
                        , [order] int not null
                        )   
Insert @Criteria( token, [order] )
Values ('A',0)
    , ('B',1)
    , ('C',2);  

With Criteria As
    (
    Select  (
            Select '|' + Token
            From @Criteria
            Order By [order]
            For Xml Path('')
            ) As Signature
    )
    , InputSignatures As
    (
    Select T.seq_id
        ,   (
            Select '|' + T1.Token
            From @TestInputs As T1
            Where T1.seq_id = T.seq_id
            Order By T1.[order]
            For Xml Path('')
            ) As Signature
    From @TestInputs As T
    Group By T.seq_id
    )
Select I.*
From InputSignatures As I
    Join Criteria As C  
        On C.Signature = I.Signature

What you need is an order-specific signature for each sequence. Using SQL Server 2008, you can use the For Xml Path construct to assemble a signature for each sequence and the criteria and then compare one to the other. Obviously, this will not be fast. You can improve the speed tremendously by storing the signature into the appropriate table for each sequence at the time they are saved or their membership is changed (could also use a trigger). In addition, I simply used the raw signature here. However, normally, I'd be inclined to use a hash of the assembled value for the signature using the Hashbytes function.

Declare @TestInputs Table   (
                            seq_id int not null
                            , token char(1) not null
                            , [order] int not null
                            )
Insert @TestInputs( seq_id, token, [order] )
Values (1,'A',0)
    , (1,'B',1)
    , (1,'C',2)
    , (2,'A',0)
    , (2,'C',1)
    , (2,'M',2)
    , (2,'N',3);

Declare @Criteria Table (
                        token char(1) not null
                        , [order] int not null
                        )   
Insert @Criteria( token, [order] )
Values ('A',0)
    , ('B',1)
    , ('C',2);  

With Criteria As
    (
    Select  (
            Select '|' + Token
            From @Criteria
            Order By [order]
            For Xml Path('')
            ) As Signature
    )
    , InputSignatures As
    (
    Select T.seq_id
        ,   (
            Select '|' + T1.Token
            From @TestInputs As T1
            Where T1.seq_id = T.seq_id
            Order By T1.[order]
            For Xml Path('')
            ) As Signature
    From @TestInputs As T
    Group By T.seq_id
    )
Select I.*
From InputSignatures As I
    Join Criteria As C  
        On C.Signature = I.Signature
稍尽春風 2024-10-27 08:51:33

为什么不按原样存储:

seq_id   token
1        A,B,C       
2        A,C,M,N

查询变得微不足道

Why not to store it as is:

seq_id   token
1        A,B,C       
2        A,C,M,N

Query becomes trivial

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文