How to structure a table for fast searching on a large number of columns

Posted on 2024-12-05 17:23:11

I have a table with a large number of columns (~60), which will eventually have a large number of rows (~10 000), and I'm going to need to be able to search efficiently on several column values at once. I'm not sure whether the searches will be exact-match (LIKE 'value', and not LIKE '%value%'), although LIKE 'value%' might be an acceptable compromise.

A few solutions have been proposed. I'm not very familiar with database design principles, so it's not obvious to me which is the best:

  1. Index on every column individually. The users will be able to search on any combination of columns, so no more complicated indexes will work. There will be a lot more reads than writes on the database, so the write-speed slowdown shouldn't be a problem.

  2. Make another table just for searching that looks like this:

    obj_id  col_num  col_name  col_value
    -------------------------------------    
    1       1        'name'    'joe'    
    1       2        'job'     'engineer'    
    2       1        'name'    'bill'
    

    etc. I think the col_num and col_name columns are redundant, but presumably one is better than the other. I have no idea what this is called, although it sounds like the Entity-Attribute-Value model (see also this question). From what I can tell, the main difference from an EAV model is that this table would not be sparse; all entities will have most or all attributes.

  3. Make another table for an inverted index on the first table. I know how to do this in theory, but it would be a huge amount of work. Also, we'd probably lose information about which column each datum is from, which is not great. Also also, this feels like it would be redundant with solution 1, but I don't actually know how table indexes are created.

Those are the solutions that we have come up with so far. If it's relevant, we're using an Oracle db, which is not really optional, but I have the permissions to refactor the database in any way necessary. So, what is the best solution here? "None of the above" is a totally acceptable answer, of course. None of these tables actually exist yet, so there's nothing to wipe out and remake.

Thanks!


Comments (2)

是伱的 2024-12-12 17:23:11

The examples you mention are indeed a better match for full text searching (as Bill Karwin suggests). Without seeing a (draft) table definition, it's hard to see if that's actually the case.

The good news is that 10K records is a trivial amount for a well-tuned Oracle server. If that's the largest your table will grow, I would avoid any exotic solutions in favour of maintainability.

EAV basically makes boolean operators into a huge pain in the backside, and makes supporting specific datatypes (text, dates, numbers etc.) into an equally big pain.
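
To make the boolean-operator point concrete, here is a minimal sketch that runs the same "name AND job" search against a wide table and against a search table shaped like the one in option 2. It uses Python's built-in `sqlite3` module purely as a runnable stand-in for Oracle; the table names `people` and `search_eav` and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite in memory stands in for Oracle here; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (obj_id INTEGER PRIMARY KEY, name TEXT, job TEXT);
    INSERT INTO people VALUES
        (1, 'joe', 'engineer'), (2, 'bill', 'engineer'), (3, 'joe', 'chef');

    CREATE TABLE search_eav (obj_id INTEGER, col_name TEXT, col_value TEXT);
    INSERT INTO search_eav VALUES
        (1, 'name', 'joe'),  (1, 'job', 'engineer'),
        (2, 'name', 'bill'), (2, 'job', 'engineer'),
        (3, 'name', 'joe'),  (3, 'job', 'chef');
""")

# Wide table: combining two conditions is a plain AND in the WHERE clause.
wide = conn.execute(
    "SELECT obj_id FROM people WHERE name = ? AND job = ? ORDER BY obj_id",
    ("joe", "engineer"),
).fetchall()

# EAV-style table: each searched attribute lives in its own row,
# so every extra condition needs another self-join on obj_id.
eav = conn.execute(
    """
    SELECT n.obj_id
    FROM search_eav n
    JOIN search_eav j ON j.obj_id = n.obj_id
    WHERE n.col_name = 'name' AND n.col_value = ?
      AND j.col_name = 'job'  AND j.col_value = ?
    ORDER BY n.obj_id
    """,
    ("joe", "engineer"),
).fetchall()

wide_ids = [row[0] for row in wide]
eav_ids = [row[0] for row in eav]
print(wide_ids, eav_ids)
```

Both queries find the same object, but the EAV version grows by one join per searched column, and mixing AND with OR gets harder still. Note also that `col_value` has to be a single text column, which is where the datatype pain for dates and numbers comes from.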

I'd build a sample of your table with your best guess at the indexing scheme, populate it with representative dummy data, and run queries along the lines of the ones you expect to need. Measure performance, and see if you have a problem; optimize your indices and queries, and only go to a refactoring if you really need to.
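
A rough sketch of that build, populate, and measure loop might look like the following. It again uses Python's `sqlite3` as a stand-in, so the timings say nothing about Oracle; only the procedure carries over. The column names `c0` to `c59`, the dummy values, and the particular three-column query are assumptions chosen for illustration, while the 60 columns and 10,000 rows come from the question.

```python
import random
import sqlite3
import time

NUM_COLS = 60       # column count from the question
NUM_ROWS = 10_000   # row count from the question
random.seed(0)      # reproducible dummy data

cols = [f"c{i}" for i in range(NUM_COLS)]  # hypothetical column names
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objs (obj_id INTEGER PRIMARY KEY, "
    + ", ".join(f"{c} TEXT" for c in cols)
    + ")"
)

# Dummy data: every column draws from 100 made-up values, 'v0' .. 'v99'.
rows = [[f"v{random.randrange(100)}" for _ in cols] for _ in range(NUM_ROWS)]
placeholders = ", ".join("?" for _ in cols)
conn.executemany(
    f"INSERT INTO objs ({', '.join(cols)}) VALUES ({placeholders})", rows
)


def timed_search(label):
    """Run one representative multi-column search and report how long it took."""
    start = time.perf_counter()
    hits = conn.execute(
        "SELECT obj_id FROM objs WHERE c3 = ? AND c17 = ? AND c42 LIKE ?",
        ("v5", "v9", "v1%"),
    ).fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{label}: {len(hits)} rows in {elapsed_ms:.2f} ms")
    return hits


before = timed_search("no indexes")

# Option 1 from the question: one single-column index per column.
for c in cols:
    conn.execute(f"CREATE INDEX idx_{c} ON objs ({c})")

after = timed_search("one index per column")
row_count = conn.execute("SELECT COUNT(*) FROM objs").fetchone()[0]
```

On the real system the same experiment would use the draft schema and queries that resemble what users will actually run, and the execution plan is worth checking alongside the timings to see which indexes are used at all.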

呢古 2024-12-12 17:23:11

How about using Oracle's full-text search features? Your needs seem to fit the purpose of the CTXCAT index type.

See Indexing with Oracle Text for an overview of the different fulltext indexing options in Oracle.
