Query performance of a combined index vs. multiple single indexes vs. a fulltext index
Background: I have a table with 5 million address entries that I'd like to search by different fields (customer name, contact name, zip, city, phone, ...), up to 8 fields. The data is pretty stable, with at most 50 changes a day, so access is almost read-only.
The user shouldn't have to tell me in advance which field they are searching, and I also want to support combined search (AND-concatenation of search terms). For example, "lincoln+lond" should find all records that contain both search terms in any of the search fields, including entries that merely start with one of the terms (like "London" in this example).
Problem: Now I need to choose an indexing strategy for this search table. (As a side note: I'm trying to achieve sub-second response time; the worst response time should be 2 seconds.) What's better in terms of performance:
- Do a combined index out of all queryable columns (I would need 2 of them, because the 900-byte index limit would be exceeded)
- Put single indexes on each of the queryable columns
- Make a fulltext index on the queryable columns and use fulltext query
I'm discarding point 1, as it doesn't seem to have any advantage (index usage will be limited and there will be no "index seek", because not all fields fit in one single index).
Question: Now, should I use the multiple single indexes variant or should I go with the fulltext index? Is there any other way to achieve the functionality mentioned above?
Comments (4)
Try them both and see which is faster on your system. There are few hard and fast rules for database optimization; it really depends on your environment.
Originally, I was about to suggest going with FTS (full-text search), as it has a lot of strong performance features going for it, especially when you're dealing with varied queries (e.g. x AND y, x NEAR y, etc.).
But before I start to ramble on about the pros of FTS, I just checked your server version -> SQL Server 2000.
Poor thing. FTS was very simple back then, so stick with multiple single indexes.
We use SQL Server 2008 and ... it rocks.
Oh, btw, did you know that SQL Server 2008 (free edition) has FTS in it? Is it possible to upgrade?
Going from SQL Server 2000 -> SQL Server 2008 is well worth it, if you can.
But yeah, stick with your M.S.I. (multiple single indexes) option.
I agree with Grauenwolf, and I'd like to add a note about indexes. Keep in mind that if you use a pattern with a leading wildcard, such as WHERE field1 LIKE '%value%', then an ordinary index cannot be used for a seek on field1 anyway and you have to resort to a full-text index. For the sake of completeness, that syntax returns all rows where field1 contains value (not necessarily at the beginning).
If you have to search for "contains", a full-text index is probably more appropriate.
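To make the difference between "starts with" and "contains" concrete, here is a minimal sketch that asks a query planner how it would run each pattern. It uses Python's built-in sqlite3 module purely as a stand-in, not the SQL Server 2000 instance from the question, so the plan text and the tuning details differ; the table name addresses, the column city and the index ix_city are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column version of the address table.
# COLLATE NOCASE lets SQLite use the index for its case-insensitive LIKE.
conn.execute(
    "CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT COLLATE NOCASE)"
)
conn.execute("CREATE INDEX ix_city ON addresses (city)")


def plan(where):
    """Return the planner's description of a query with the given WHERE clause."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM addresses WHERE " + where
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


prefix_plan = plan("city LIKE 'lond%'")     # pattern anchored at the start
contains_plan = plan("city LIKE '%lond%'")  # pattern with a leading wildcard

print("prefix:  ", prefix_plan)
print("contains:", contains_plan)
```

On a typical SQLite build the prefix pattern is reported as a SEARCH on ix_city (a range lookup), while the leading-wildcard pattern is reported as a SCAN that has to visit every entry. The same distinction is what the seek-versus-scan point above is about, although the exact behaviour on SQL Server should be checked in its own execution plans.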
To answer my own question:
I've chosen the "multiple single indexes" option. I ended up with an index for each of the queried columns, each index containing only the column itself. The search works very well, with mostly sub-second response times. Sometimes it takes up to 2-3 seconds, but I'm attributing that to my database server (a laptop several years old with 3 GB of RAM and a slow disk).
I didn't test the fulltext option, as it was no longer necessary (and I don't have the time to do it).
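For readers who want to see what such a search could look like, here is a rough sketch of one way to turn "lincoln+lond" into a query against single-column indexes. It is not the code I actually used: the column names, the table name addresses, the helper build_search and the sample rows are all invented for illustration, it assumes every term is a prefix match, and it runs on Python's built-in sqlite3 rather than SQL Server.

```python
import sqlite3

# Hypothetical column names; the question only lists the fields loosely.
SEARCH_COLUMNS = ["customer_name", "contact_name", "zip", "city", "phone"]


def build_search(query, columns=SEARCH_COLUMNS, table="addresses"):
    """Return (sql, params): each '+'-separated term must prefix-match some column."""
    terms = [t.strip() for t in query.split("+") if t.strip()]
    if not terms:
        raise ValueError("empty search")
    groups, params = [], []
    for term in terms:
        # One OR group per term: the term may match any searchable column.
        groups.append("(" + " OR ".join(f"{c} LIKE ?" for c in columns) + ")")
        # Trailing wildcard only, so a single-column index can be seeked.
        # Note: '%' or '_' typed by the user is not escaped in this sketch.
        params.extend([term + "%"] * len(columns))
    # The groups are ANDed: every term has to match somewhere in the row.
    sql = f"SELECT id FROM {table} WHERE " + " AND ".join(groups) + " ORDER BY id"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (id INTEGER PRIMARY KEY, customer_name TEXT, "
    "contact_name TEXT, zip TEXT, city TEXT, phone TEXT)"
)
for col in SEARCH_COLUMNS:
    # One single-column index per searchable column.
    conn.execute(f"CREATE INDEX ix_{col} ON addresses ({col})")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Lincoln Ltd", "A. Smith", "N1 9GU", "London", "020 7946 0000"),
        (2, "Acme Corp", "B. Lincoln", "LN1 1AA", "Lincoln", "01522 000000"),
        (3, "Northwind", "C. Jones", "SW1A 1AA", "London", "020 7946 0001"),
    ],
)

sql, params = build_search("lincoln+lond")
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)
```

Only the first row matches both terms. The second row shows the limitation of this approach: "B. Lincoln" is not found by "lincoln" because the pattern is anchored at the start of the column value rather than at the start of each word, which is the kind of matching a full-text index would handle.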