Data structure for implementing "search by order"

Posted on 2024-08-21 01:37:58 · 252 words · 9 views · 0 comments

I would like to know what data structure / storage strategy I should use for this problem.

Each data entry in the database consists of a list of multiple ordered items, such as A-B-C-D, where A, B, C, D are different items.

Suppose I have 3 entries in a database,

A-B-C-D

E-F-G

G-H-B-A

When the user enters some unordered items, I have to find the matching ordered entry (or entries) in the database. For example, if the user enters A,B,G,H, I want to return G-H-B-A from the database to the user.

What should be my data storage strategy?

Comments (2)

不喜欢何必死缠烂打 2024-08-28 01:37:58

You're best off storing the ordered and unordered elements separately, otherwise you'll need to search on all permutations of the ordered elements, which would be time consuming.
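The idea of keeping an order-independent key next to the ordered entry can be sketched in memory as follows. This is only an illustration in Python, not part of the SQL in this answer; the names `build_index` and `lookup` are made up for the example, and it assumes the items within an entry are distinct, as the question states.

```python
from collections import defaultdict


def build_index(entries):
    """Map each entry's unordered item set to the ordered entries that use it."""
    index = defaultdict(list)
    for entry in entries:
        index[frozenset(entry)].append(entry)
    return index


def lookup(index, items):
    """Return the ordered entries made of exactly the given items, in any order."""
    return index.get(frozenset(items), [])


# Entries from the question, each stored as an ordered tuple of items.
entries = [("A", "B", "C", "D"), ("E", "F", "G"), ("G", "H", "B", "A")]
index = build_index(entries)

print(lookup(index, ["A", "B", "G", "H"]))  # [('G', 'H', 'B', 'A')]
print(lookup(index, ["C"]))                 # []
```

Because the key is a set, no permutation of the input ever has to be tried; the ordered form is only read back once a key matches.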

Try this:

/* Create a table to track your items (A, B, C, etc.). It contains all possible elements */
CREATE TABLE [Items](
    [Value] [char](1) NOT NULL,
 CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED ([Value]))

/* Create a table to track their grouping and stated ordering */
CREATE TABLE [Groups](
    [ID] [int] NOT NULL,
    [Order] [text] NOT NULL,
 CONSTRAINT [PK_Groups] PRIMARY KEY CLUSTERED ([ID]))

/* Create a mapping table to associate them */
CREATE TABLE [ItemsToGroups](
    [Item] [char](1) NOT NULL,
    [Group] [int] NOT NULL
)

ALTER TABLE [ItemsToGroups]  WITH CHECK ADD CONSTRAINT [FK_ItemsToGroups_Groups] FOREIGN KEY([Group])
REFERENCES [Groups] ([ID])

ALTER TABLE [ItemsToGroups] CHECK CONSTRAINT [FK_ItemsToGroups_Groups]

ALTER TABLE [ItemsToGroups]  WITH CHECK ADD CONSTRAINT [FK_ItemsToGroups_Items] FOREIGN KEY([Item])
REFERENCES [Items] ([Value])

ALTER TABLE [ItemsToGroups] CHECK CONSTRAINT [FK_ItemsToGroups_Items]

/* Populate your tables. 
   Items should have eight rows: A, B, C,...H
   Groups should have three rows: 1:ABCD, 2:EFG, 3:GHBA
   Items to groups should have eleven rows: A:1, B:1,...A:3 */

/* You will want to pass in a table of values, so set up a table-valued parameter.
   First, create a type to support your input list */
CREATE TYPE ItemList AS TABLE (e char(1) NOT NULL PRIMARY KEY)
GO

/* Create a stored procedure for your query */
CREATE PROCEDURE SelectOrderedGroup @Input ItemList READONLY AS
    SELECT *
    FROM Groups
    WHERE Groups.ID NOT IN (
        SELECT [Group]
        FROM ItemsToGroups
        WHERE Item NOT IN (SELECT e FROM @Input)
    )
GO

/* Now when you want to query them: */
DECLARE @MyList ItemList
INSERT @MyList(e) VALUES('G'),('H'),('B'),('A')
EXEC SelectOrderedGroup @MyList

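The above will return 3:GHBA, as you want. If you pass in DCBA you'll get back 1:ABCD, again as expected. If you pass in C, you'll get back nothing, because no group consists of just C. Note that the query returns every group whose items all appear in the input, so an input that covers several groups returns all of them.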
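To try the query's behavior without a SQL Server instance, here is a rough sketch that runs the same double NOT IN logic against SQLite through Python's `sqlite3` module. It is an approximation under stated assumptions: SQLite has no table-valued parameters, so the input is passed as bound placeholders, the `Items` table is left out, and the function name `select_ordered_group` is just a stand-in for the stored procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Groups" (ID INTEGER PRIMARY KEY, "Order" TEXT NOT NULL);
    CREATE TABLE ItemsToGroups (
        Item TEXT NOT NULL,
        "Group" INTEGER NOT NULL REFERENCES "Groups" (ID)
    );
""")

# The three entries from the question; one mapping row per item.
groups = {1: "ABCD", 2: "EFG", 3: "GHBA"}
for group_id, order in groups.items():
    conn.execute('INSERT INTO "Groups" VALUES (?, ?)', (group_id, order))
    conn.executemany(
        "INSERT INTO ItemsToGroups VALUES (?, ?)",
        [(item, group_id) for item in order],
    )


def select_ordered_group(items):
    """Return (ID, Order) rows for groups that have no item outside `items`."""
    placeholders = ", ".join("?" for _ in items)
    query = f"""
        SELECT ID, "Order" FROM "Groups"
        WHERE ID NOT IN (
            SELECT "Group" FROM ItemsToGroups
            WHERE Item NOT IN ({placeholders})
        )
        ORDER BY ID
    """
    return conn.execute(query, list(items)).fetchall()


print(select_ordered_group("GHBA"))     # [(3, 'GHBA')]
print(select_ordered_group("DCBA"))     # [(1, 'ABCD')]
print(select_ordered_group("C"))        # []
print(select_ordered_group("ABCDEFG"))  # [(1, 'ABCD'), (2, 'EFG')]
```

The last call shows the subset behavior: an input that covers two groups returns both. If only exact matches are wanted, one option is to additionally compare each group's item count with the size of the input.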

You will probably want to use a table-valued parameter for your input, as shown above, but you could instead replace the inner SELECT on @Input with a literal IN list and drop the ItemList type.

羞稚 2024-08-28 01:37:58

Split the lists into individual items and work on that level.

Some tables:

lists

  • ID (PK)
  • sequence (the "A-B-C-D" entries above)
  • [whatever else]

items

  • ID (PK)
  • name (value, word, whatever makes sense)
  • [whatever else]

list_items

  • list_ID
  • item_ID
  • [an ordinal int, if "G-H-B-A" and "A-B-G-H" are considered different sequences]

(composite PK list_ID, item_ID [, ordinal] on that one, basic many:many relation)
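As a rough sketch of the three tables described above, the following creates them in SQLite through Python's `sqlite3` module and uses the optional ordinal column to read a list back in its stored order. The column types, the UNIQUE constraint on `name`, and the helpers `add_list` and `items_in_order` are assumptions made for the example, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (ID INTEGER PRIMARY KEY, sequence TEXT NOT NULL);
    CREATE TABLE items (ID INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE list_items (
        list_ID INTEGER NOT NULL REFERENCES lists (ID),
        item_ID INTEGER NOT NULL REFERENCES items (ID),
        ordinal INTEGER NOT NULL,  -- position of the item within the list
        PRIMARY KEY (list_ID, item_ID, ordinal)
    );
""")


def add_list(list_id, names):
    """Store one ordered list and its per-item rows (hypothetical helper)."""
    conn.execute("INSERT INTO lists VALUES (?, ?)", (list_id, "-".join(names)))
    for position, name in enumerate(names, start=1):
        # Create the item on first use, then look up its ID.
        conn.execute("INSERT OR IGNORE INTO items (name) VALUES (?)", (name,))
        (item_id,) = conn.execute(
            "SELECT ID FROM items WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT INTO list_items VALUES (?, ?, ?)", (list_id, item_id, position)
        )


def items_in_order(list_id):
    """Rebuild a list's item names from list_items using the ordinal column."""
    rows = conn.execute(
        """
        SELECT items.name FROM list_items
        JOIN items ON items.ID = list_items.item_ID
        WHERE list_items.list_ID = ?
        ORDER BY list_items.ordinal
        """,
        (list_id,),
    )
    return [name for (name,) in rows]


add_list(1, ["G", "H", "B", "A"])
add_list(2, ["A", "B", "G", "H"])
print(items_in_order(1))  # ['G', 'H', 'B', 'A']
print(items_in_order(2))  # ['A', 'B', 'G', 'H']
```

Both lists share the same four item rows; only the ordinal values tell them apart.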

Some data, so it's more clear what the tables represent:

INSERT INTO items (ID, name) VALUES (1, 'A'), (2, 'B'), (3, 'G'), (4, 'H');
INSERT INTO lists (ID, sequence) VALUES (1, 'A-B-G-H');
INSERT INTO list_items (list_ID, item_ID) VALUES (1, 1), (1, 2), (1, 3), (1, 4);
INSERT INTO lists (ID, sequence) VALUES (2, 'B-A-G');
INSERT INTO list_items (list_ID, item_ID) VALUES (2, 2), (2, 1), (2, 3);

And finally, to find lists that contain all items (A, B, G, H):

-- Keep only the rows for the wanted items, then require all four per list.
SELECT lists.sequence
FROM lists
JOIN list_items ON lists.ID = list_items.list_ID
JOIN items ON list_items.item_ID = items.ID
WHERE items.name IN ('A', 'B', 'G', 'H')
GROUP BY lists.ID, lists.sequence
HAVING COUNT(DISTINCT items.name) = 4;

That should return any lists like "A-B-G-H", "G-H-A-B", "H-A-T-B-A-G", etc, but not "B-U-G-H-U-T" (no A) or "B-A-T-H" (no G) - all conditions have to be satisfied. Doing an "any" search might be a little more involved (writing this in my head over lunch, but RIGHT JOIN alone would probably result in all kinds of duplicates & slowness).
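The "all" versus "any" distinction can be sketched by grouping the matching rows per list and counting distinct item names. The example below is one possible way to do it, run against SQLite through Python's `sqlite3` module; the helper `find_lists` and its `require_all` flag are made up for the illustration, and the sample sequences are the ones named in the paragraph above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (ID INTEGER PRIMARY KEY, sequence TEXT NOT NULL);
    CREATE TABLE items (ID INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE list_items (list_ID INTEGER NOT NULL, item_ID INTEGER NOT NULL);
""")

sequences = ["A-B-G-H", "G-H-A-B", "H-A-T-B-A-G", "B-U-G-H-U-T", "B-A-T-H"]
for list_id, sequence in enumerate(sequences, start=1):
    conn.execute("INSERT INTO lists VALUES (?, ?)", (list_id, sequence))
    for name in sequence.split("-"):
        conn.execute("INSERT OR IGNORE INTO items (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO list_items SELECT ?, ID FROM items WHERE name = ?",
            (list_id, name),
        )


def find_lists(names, require_all=True):
    """Return sequences containing all (or at least one) of the given items."""
    wanted = sorted(set(names))
    placeholders = ", ".join("?" for _ in wanted)
    # "All" needs every wanted name present; "any" needs just one.
    threshold = len(wanted) if require_all else 1
    query = f"""
        SELECT lists.sequence
        FROM lists
        JOIN list_items ON list_items.list_ID = lists.ID
        JOIN items ON items.ID = list_items.item_ID
        WHERE items.name IN ({placeholders})
        GROUP BY lists.ID, lists.sequence
        HAVING COUNT(DISTINCT items.name) >= ?
        ORDER BY lists.ID
    """
    return [seq for (seq,) in conn.execute(query, wanted + [threshold])]


print(find_lists(["A", "B", "G", "H"]))
# ['A-B-G-H', 'G-H-A-B', 'H-A-T-B-A-G']
print(find_lists(["T", "U"], require_all=False))
# ['H-A-T-B-A-G', 'B-U-G-H-U-T', 'B-A-T-H']
```

Counting distinct names keeps repeated items, such as the two A's in "H-A-T-B-A-G", from inflating the match count, and grouping by list avoids the duplicate rows a plain join would produce.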

It won't map any genomes or redefine human language, but should be okay for a decent-sized data set. Either way, I'd avoid storing each list as a varchar and doing "WHERE sequence LIKE '%A%' AND sequence LIKE '%B%'" stuff unless you absolutely can't handle the extra work to add new data.
