Data structure for implementing "search by order"
I would like to know what data structure / storage strategy I should use for this problem.
Each data entry in the database consists of a list of multiple ordered items, such as A-B-C-D, where A, B, C, D are different items.
Suppose I have 3 entries in a database,
A-B-C-D
E-F-G
G-H-B-A
When the user enters some unordered items, I have to find the matching ordered entry or entries in the database. For example, if the user enters A, B, G, H, I want to return G-H-B-A from the database.
What should my data storage strategy be?
Comments (2)
You're best off storing the ordered and unordered elements separately; otherwise you'll need to search on all permutations of the ordered elements, which would be time-consuming.
Try this:
The above will return 3:GHBA, like you want. If you pass in DCBA you'll get back 1:ABCD, again like you're looking for. If you pass in C, you'll get back nothing, as no group consists of just C.
You will probably want to use a table-valued parameter for your input, as shown above, but you could convert the final SELECT to a simple list and drop the ItemList type.
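The idea of keeping an order-independent form next to the ordered one can be sketched roughly as follows. This is not the answer's original code: it assumes SQLite through Python's sqlite3 module instead of a table-valued parameter, and the table name item_groups, the columns ordered_items and sorted_key, and the helper functions are all hypothetical names chosen for illustration. Items are assumed not to contain commas.

```python
import sqlite3

def canonical_key(items):
    """Order-independent key: the items sorted, then joined with commas."""
    return ",".join(sorted(items))

def build_db(entries):
    """Store each entry twice: in its original order and as a sorted key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item_groups ("
        " group_id INTEGER PRIMARY KEY,"
        " ordered_items TEXT NOT NULL,"
        " sorted_key TEXT NOT NULL)"
    )
    # The index lets the lookup below avoid a full table scan.
    conn.execute("CREATE INDEX idx_sorted_key ON item_groups (sorted_key)")
    for group_id, ordered in enumerate(entries, start=1):
        conn.execute(
            "INSERT INTO item_groups VALUES (?, ?, ?)",
            (group_id, ",".join(ordered), canonical_key(ordered)),
        )
    return conn

def find_groups(conn, user_items):
    """Return (group_id, ordered items) for entries made of exactly user_items."""
    rows = conn.execute(
        "SELECT group_id, ordered_items FROM item_groups"
        " WHERE sorted_key = ? ORDER BY group_id",
        (canonical_key(user_items),),
    )
    return [(group_id, ordered.split(",")) for group_id, ordered in rows]

conn = build_db(["ABCD", "EFG", "GHBA"])
print(find_groups(conn, "ABGH"))  # [(3, ['G', 'H', 'B', 'A'])]
print(find_groups(conn, "C"))     # []
```

Because the lookup compares whole sorted keys, it only finds entries whose item set equals the input exactly, which matches the behaviour described above for A,B,G,H, DCBA and C.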
Split the lists into individual items and work on that level.
Some tables:
lists
items
list_items
(composite PK list_ID, item_ID [, ordinal] on that one, basic many:many relation)
Some data, so it's more clear what the tables represent:
And finally, to find lists that contain all items (A, B, G, H):
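As a minimal sketch of the three tables and the "contains all items" lookup, assuming SQLite through Python's sqlite3 module: the column names (list_id, item_id, name, ordinal) and the helper functions are hypothetical, and ordinal is included in the primary key here so that a list may repeat an item. The sample data adds the H-A-T-B-A-G and B-A-T-H lists used as illustrations in this answer to the three entries from the question.

```python
import sqlite3

SCHEMA = """
CREATE TABLE lists (list_id INTEGER PRIMARY KEY);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE list_items (
    list_id INTEGER NOT NULL REFERENCES lists (list_id),
    item_id INTEGER NOT NULL REFERENCES items (item_id),
    ordinal INTEGER NOT NULL,
    PRIMARY KEY (list_id, item_id, ordinal)
);
"""

def build_db(entries):
    """Load each ordered entry into lists, items and list_items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for list_id, ordered in enumerate(entries, start=1):
        conn.execute("INSERT INTO lists (list_id) VALUES (?)", (list_id,))
        for ordinal, name in enumerate(ordered, start=1):
            # Each distinct item is stored once; repeats are ignored.
            conn.execute("INSERT OR IGNORE INTO items (name) VALUES (?)", (name,))
            conn.execute(
                "INSERT INTO list_items (list_id, item_id, ordinal)"
                " SELECT ?, item_id, ? FROM items WHERE name = ?",
                (list_id, ordinal, name),
            )
    return conn

def _matching_lists(conn, wanted, require_all):
    wanted = sorted(set(wanted))
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    # "All": every wanted item must appear. "Any": at least one must appear.
    minimum = len(wanted) if require_all else 1
    rows = conn.execute(
        "SELECT li.list_id FROM list_items li"
        " JOIN items i ON i.item_id = li.item_id"
        f" WHERE i.name IN ({placeholders})"
        " GROUP BY li.list_id"
        " HAVING COUNT(DISTINCT i.name) >= ?"
        " ORDER BY li.list_id",
        (*wanted, minimum),
    )
    return [row[0] for row in rows]

def lists_containing_all(conn, wanted):
    return _matching_lists(conn, wanted, require_all=True)

def lists_containing_any(conn, wanted):
    return _matching_lists(conn, wanted, require_all=False)

def ordered_items(conn, list_id):
    """Rebuild a list in its stored order using the ordinal column."""
    rows = conn.execute(
        "SELECT i.name FROM list_items li"
        " JOIN items i ON i.item_id = li.item_id"
        " WHERE li.list_id = ? ORDER BY li.ordinal",
        (list_id,),
    )
    return [row[0] for row in rows]

conn = build_db(["ABCD", "EFG", "GHBA", "HATBAG", "BATH"])
for list_id in lists_containing_all(conn, "ABGH"):
    print(list_id, "-".join(ordered_items(conn, list_id)))
# 3 G-H-B-A
# 4 H-A-T-B-A-G
```

Grouping by list and counting distinct matched items avoids the duplicate rows a plain join would produce. Note that this is a superset match: a list qualifies if it contains every requested item, even when it also contains others.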
That should return any lists like "A-B-G-H", "G-H-A-B", "H-A-T-B-A-G", etc., but not "B-U-G-H-U-T" (no A) or "B-A-T-H" (no G): all conditions have to be satisfied. Doing an "any" search might be a little more involved (writing this in my head over lunch, but RIGHT JOIN alone would probably result in all kinds of duplicates and slowness).
It won't map any genomes or redefine human language, but should be okay for a decent-sized data set. Either way, I'd avoid storing each list as a varchar and doing "WHERE sequence LIKE '%A%' AND sequence LIKE '%B%'" stuff unless you absolutely can't handle the extra work to add new data.