Why can't I simply add an index that includes all columns?
I have a table in a SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care how long it takes to insert into the table; I am only interested in the speed at which I can get data.
The problem is that the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.
Can anyone tell me why I shouldn't do this?
UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK if that means my database grows larger than it needs to be.
Comments (9)
First of all, an index key in SQL Server is limited to 900 bytes (1,700 bytes for nonclustered indexes since SQL Server 2016). On most tables that alone makes it impossible to have an index with all columns.
Most of all: such an index makes no sense at all. What are you trying to achieve?
Consider this: if you have an index on (LastName, FirstName, Street, City), that index cannot be used to speed up queries on FirstName alone, on City, or on Street.
That index would be useful for searches on (LastName), or (LastName, FirstName), or (LastName, FirstName, Street), or (LastName, FirstName, Street, City), but really nothing else - certainly not if you search for just Street or just City!
The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.
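The leading-column rule can be illustrated with a small sketch. This is not how SQL Server stores an index; it only models a composite index as a list of tuples sorted in column order, with made-up rows and hypothetical helper names (`seek_prefix`, `scan_by_street`).

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows: (LastName, FirstName, Street, City)
rows = [
    ("Smith", "Joe", "Main Street", "Boston"),
    ("Adams", "Joe", "Oak Avenue", "Denver"),
    ("Smith", "Anna", "Main Street", "Denver"),
    ("Jones", "Mary", "Elm Road", "Boston"),
]

# The "index": entries sorted by the full key, in column order.
index = sorted(rows)


def seek_prefix(prefix):
    """Binary-search for all entries whose leading columns equal `prefix`."""
    n = len(prefix)
    # Built on every call only to keep the sketch short; the point is that
    # the entries are already ordered by these leading columns.
    keys = [entry[:n] for entry in index]
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix)
    return index[lo:hi]


def scan_by_street(street):
    """Street is not a leading column, so every entry has to be checked."""
    return [entry for entry in index if entry[2] == street]


print(seek_prefix(("Smith",)))        # uses the sort order
print(seek_prefix(("Smith", "Joe")))  # still a prefix of the key
print(scan_by_street("Main Street"))  # full scan despite Street being "in" the index
```

Searching on (LastName) or (LastName, FirstName) can jump straight to a contiguous range, while searching on Street alone has nothing to binary-search on.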
Consider your phone book: it's probably ordered by LastName, FirstName, maybe Street. So does that ordering help you find all the "Joes" in your city? All the people living on "Main Street"? No - you can look up by LastName first, and then get more specific inside that set of data. Just having an index over everything doesn't help speed up searches on every column at all.
If you want to be able to search by Street - you need to add a separate index on (Street) (and possibly another column or two that make sense).
If you want to be able to search by Occupation or whatever else - you need another specific index for that.
Just because your column exists in an index doesn't mean that'll speed up all searches for that column!
The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all. Build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.
Don't blindly index every column just because you can - this is a guarantee for lousy system performance. Every index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower), since all those indices need to be updated.
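To make the maintenance cost concrete, here is a toy model, assuming each index is simply a sorted list of (value, row position) pairs. The `Table` class and its `index_writes` counter are invented for illustration and say nothing about SQL Server internals.

```python
from bisect import insort


class Table:
    """Toy table that keeps one sorted index per indexed column."""

    def __init__(self, index_columns):
        self.rows = []
        self.indexes = {col: [] for col in index_columns}
        self.index_writes = 0  # how many index entries have been written

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)
        # Every index has to be updated for every inserted row.
        for col, entries in self.indexes.items():
            insort(entries, (row[col], pos))
            self.index_writes += 1


table = Table(["a", "b", "c"])
for i in range(10):
    table.insert({"a": i, "b": -i, "c": i % 3})

print(len(table.rows), table.index_writes)
```

In this model the write work grows with the number of indexes: ten inserts into a table with three indexes cause thirty index writes.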
You have a fundamental misunderstanding of how indexes work.
Read this explanation: "how multi-column indexes work".
The next question you might have is why not create one index per column, but that's also a dead end if you are trying to reach top select performance.
You might feel that it is a tedious task, but I would say careful indexing is a required task. Sloppy indexing strikes back, as in this example.
Note: I am strongly convinced that proper indexing pays off, and I know that many people have the very same questions you have. That's why I'm writing a free book about it. The links above refer to pages that might help you answer your question. However, you might also want to read it from the beginning.
I think the questioner is asking why they can't create one index that covers every column.
The problems with that have been addressed.
But given that it sounds like they are using MS SQL Server, it's useful to understand that you can include non-key columns in an index, so that the values of those columns are available for retrieval from the index but cannot be used as selection criteria:
I created two tables with a million identical rows
I indexed table A like this
and table B like this
No surprise, table A was slightly faster to insert into.
But when I ran these queries
On table A, SQL Server didn't even use the index; it did a table scan and complained about a missing index including id, a, b, c, d.
On table B, the query was over 50 times faster, with much less I/O.
Forcing the query on A to use the index didn't make it any faster.
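The idea of included (non-key) columns can be sketched as follows. This is a simplified model with hypothetical data and names, not the tables or indexes used in the experiment above: one index stores only the key and a row id, the other also carries the extra columns as payload, so the query never has to go back to the table.

```python
from bisect import bisect_left

# Hypothetical base table: row id -> (id, a, b, c, d)
table = {rid: (rid, rid % 7, rid % 11, rid % 13, rid % 17) for rid in range(1000)}

stats = {"table_reads": 0}


def fetch_row(rid):
    """Simulate the extra lookup into the base table."""
    stats["table_reads"] += 1
    return table[rid]


# Index keyed on a only: sorted (a, row id) pairs.
key_only = sorted((row[1], rid) for rid, row in table.items())

# Index keyed on a that also carries b, c, d as non-key payload.
# The payload is stored with the entry but is not part of the search key.
covering = sorted((row[1], rid, row[2:]) for rid, row in table.items())


def query_key_only(a):
    """SELECT b, c, d WHERE a = ?  using the key-only index."""
    out = []
    for key, rid in key_only[bisect_left(key_only, (a,)):]:
        if key != a:
            break
        out.append(fetch_row(rid)[2:])  # one table lookup per matching row
    return out


def query_covering(a):
    """Same query, answered entirely from the index entries."""
    out = []
    for key, rid, payload in covering[bisect_left(covering, (a,)):]:
        if key != a:
            break
        out.append(payload)
    return out
```

Both functions return the same rows, but only `query_key_only` increments `stats["table_reads"]`. The payload columns cannot be searched on; they are only carried along for retrieval.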
...if you add an index that contains all columns, and a query was actually able to use that index, it would scan it in the order of the primary key, which means hitting nearly every record. Average search time would be O(n/2), the same as hitting the actual database.
You need to read a lot about indexes.
It might help if you consider an index on a table to be a bit like a Dictionary in C#.
That means that the name column is indexed, and will return a list of primary keys.
That means that the name column + occupation column are indexed. Now imagine the index contained 10 different columns, nested so deep that it contains every single row in your table.
This isn't exactly how it works, mind you. But it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.
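A rough Python analogue of the dictionary picture, using made-up rows: a one-column index maps a name to a list of primary keys, and a two-column index nests a second dictionary inside the first. The variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: primary key -> (name, occupation)
people = {
    1: ("Alice", "Engineer"),
    2: ("Bob", "Teacher"),
    3: ("Alice", "Teacher"),
    4: ("Carol", "Engineer"),
}

# Index on name: name -> list of primary keys.
by_name = defaultdict(list)
# Index on (name, occupation): name -> occupation -> list of primary keys.
by_name_occupation = defaultdict(lambda: defaultdict(list))

for pk, (name, occupation) in people.items():
    by_name[name].append(pk)
    by_name_occupation[name][occupation].append(pk)

# Direct lookups work when the outer key (name) is known.
alices = by_name["Alice"]
alice_teachers = by_name_occupation["Alice"]["Teacher"]

# Without a name, the nested index gives no shortcut:
# every outer entry has to be visited to find the teachers.
teachers = [
    pk
    for occupations in by_name_occupation.values()
    for pk in occupations.get("Teacher", [])
]
```

The last lookup is the dictionary version of searching an index on a column that is not its leading key.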
If this is a data warehouse type operation where the workload is heavily optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.
WHERE clause involves..
And you absolutely have plenty of fast storage space to burn, then by all means create an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low cardinality columns, but since we're going so far, why bother (with that admonition). They will just sit there and churn the WRITE time, but if you don't care about that part of the picture, then we're all good.
Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan each one using SSMS (press Ctrl-L) with one query in the query window. It will tell you what index can help that query - just create it. Create them all, fully remembering that this adds again to the write cost, backup file size, db maintenance time, etc.
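As an illustration of how separate single-column indexes might serve different WHERE clauses, the sketch below keeps one index per column and intersects the matching row ids. The columns and data are invented, and whether a real optimizer combines indexes this way depends on the engine and the query.

```python
from collections import defaultdict

# Hypothetical fact rows: row id -> {column: value}
rows = {
    1: {"region": "EU", "year": 2020, "product": "A"},
    2: {"region": "US", "year": 2020, "product": "B"},
    3: {"region": "EU", "year": 2021, "product": "B"},
    4: {"region": "EU", "year": 2020, "product": "B"},
}

# One single-column index per column: column -> value -> set of row ids.
indexes = defaultdict(lambda: defaultdict(set))
for rid, row in rows.items():
    for column, value in row.items():
        indexes[column][value].add(rid)


def where(**predicates):
    """Answer an AND of equality predicates by intersecting per-column matches."""
    matches = [indexes[col].get(val, set()) for col, val in predicates.items()]
    return set.intersection(*matches) if matches else set(rows)


print(where(region="EU", year=2020))
print(where(product="B", region="EU", year=2020))
```

Any combination of columns can be filtered this way, at the price of maintaining every one of those indexes on each write.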
This is always a bad idea. Indexes in a database are not some sort of pixie dust that works magically. You have to analyze your queries and add indexes according to what is being queried and how.
It is not as simple as "add everything to an index and have a nap".
I see only long and complicated answers here so I thought I should give the simplest answer possible.
You cannot add an entire table, or all its columns, to an index because that just duplicates the table.
In simple terms, an index is just another table with selected data ordered in the order you normally expect to query it in, and a pointer to the row on disk where the rest of the data lives.
So, a level of indirection exists. You have a partial copy of the table in a preordered form (both on disk and in RAM, assuming the index is not fragmented), which is faster to query when only the columns defined in the index are needed. The rest of the columns can be fetched without having to scan the disk for them, because the index contains a reference to the position on disk where the rest of each row's data lives.
1) Size: an index essentially builds a copy of the data in that column in some easily searchable structure, like a binary tree (I don't know the SQL Server specifics).
2) You mentioned speed: index structures are slower to add to.
That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.