Table index design

Posted 2024-10-08 22:51:32

I would like to add one or more indexes to my table, beyond the clustered PK.
I am looking for general ideas on how to choose additional indexes for a table,
and I would like to know what to look for when doing this.
So, my example:

This table (let's call it the TASK table) is going to be the biggest table in the whole application. I expect millions of records.

IMPORTANT: data is added to this table by massive bulk inserts.

The table has 27 columns (so far, and counting :D):

int x 9 columns (IDs)

varchar x 10 columns

bit x 2 columns

datetime x 5 columns

INT COLUMNS

All of these are INT IDs that reference tables much smaller than the Task table (10-50 records max), for example a Status table (with values like "open", "closed") or a Priority table (with values like "important", "not so important", "normal").
There is also a "parent-ID" column that references the Task table itself.

Joins: all the "small" tables have a PK defined the usual way, i.e. clustered.

STRING COLUMNS

There is a Company column (a string!) that is always about 5 characters long, and every user will be restricted by it. If Task contains 15 different companies, the logged-in user will only see one of them, so there is always a filter on this column. Might it be a good idea to add an index on it?

DATE COLUMNS

I think date columns are not normally indexed ... right? Or can they, and should they, be?


Comments (3)

请你别敷衍 2024-10-15 22:51:32

I wouldn't add any indices unless you have a specific reason to do so, e.g. performance issues.

In order to figure out what kind of indices to add, you need to know:

  • what kind of queries are being run against your table: what are the WHERE clauses, and what kind of ORDER BY are you doing?

  • how is your data distributed? Which columns are selective enough (a typical value matches < 2% of the data) to be useful for indexing?

  • what kind of (negative) impact do additional indices have on your INSERTs and UPDATEs on the table?

  • any foreign key columns should be part of an index, preferably as the first column of the index, to speed up JOINs to other tables
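The data-distribution question can be checked with a simple GROUP BY per candidate column. The following is only a rough sketch: it uses Python's built-in sqlite3 module as a stand-in for the real database, and the table layout, column names (`company`, `status_id`, `priority_id`) and generated rows are all made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the Task table; names and data are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, company TEXT, "
    "status_id INTEGER, priority_id INTEGER)"
)
rows = [
    # 15 companies, status 1 on every 100th row, 3 evenly spread priorities
    (i, f"C{i % 15:04d}", 1 if i % 100 == 0 else 2, i % 3)
    for i in range(1, 3001)
]
conn.executemany("INSERT INTO task VALUES (?, ?, ?, ?)", rows)


def value_shares(column):
    """Return {value: fraction of all rows} for one column of task."""
    total = conn.execute("SELECT COUNT(*) FROM task").fetchone()[0]
    query = f"SELECT {column}, COUNT(*) FROM task GROUP BY {column}"
    return {value: count / total for value, count in conn.execute(query)}


for column in ("company", "status_id", "priority_id"):
    shares = value_shares(column)
    print(column, "distinct values:", len(shares),
          "smallest share:", round(min(shares.values()), 4))
```

A column whose interesting values each match only a small share of the rows is a better index candidate than one where every value matches a third of the table.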

And sure, you can index a DATETIME column. What made you think you cannot? If you have a lot of queries that restrict their result set by a date range, it can make total sense to index a DATETIME column, maybe not by itself, but in a compound index together with other columns of your table.
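As a small illustration of such a compound index, here is a sketch that combines the Company filter from the question with a date range. It again uses sqlite3 as a stand-in engine; the column `due_date`, the index name `ix_task_company_due` and the sample data are assumptions, and dates are stored as ISO text because SQLite has no DATETIME type.

```python
import sqlite3

# Hypothetical schema and data; only the index shape matters here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, company TEXT, "
    "due_date TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO task (company, due_date, title) VALUES (?, ?, ?)",
    [
        (f"C{i % 15:04d}", f"2024-{i % 12 + 1:02d}-{i % 28 + 1:02d}", f"task {i}")
        for i in range(2000)
    ],
)

# Equality column first, range column second.
conn.execute("CREATE INDEX ix_task_company_due ON task (company, due_date)")

query = """
    SELECT id, due_date FROM task
    WHERE company = ? AND due_date >= ? AND due_date < ?
    ORDER BY due_date
"""
params = ("C0003", "2024-04-01", "2024-05-01")

# The fourth column of EXPLAIN QUERY PLAN holds the readable plan step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
matches = conn.execute(query, params).fetchall()
print(plan)
print(len(matches), "matching rows")
```

Putting the always-filtered column first and the range column second lets one index serve both predicates; whether this ordering is right for the real table depends on the actual queries.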

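What you cannot use as an index key are columns that hold more than 900 bytes of data, e.g. anything like VARCHAR(1000). (The 900-byte key limit is SQL Server's; since SQL Server 2016, nonclustered index keys may be up to 1,700 bytes.)

For great in-depth and very knowledgeable background on indexing, consult the blog by Kimberly Tripp, the Queen of Indexing.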

小镇女孩 2024-10-15 22:51:32

In general, an index will speed up a JOIN, a sort operation and a filter.

So if the columns are in the JOIN, the ORDER BY or the WHERE clause, then an index will help in terms of performance... but there is always a but... with every index that you add, UPDATE, DELETE and INSERT operations will be slowed down, because the indexes have to be maintained.

So the answer is... it depends.

I would say: start hitting the table with queries and look at the execution plans for scans. Try to turn those scans into seeks, either by writing SARGable queries (predicates that compare the bare column, so an index can be used) or by adding indexes if needed... don't just add indexes for the sake of adding indexes.
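To make the scan-versus-seek point concrete, here is a minimal sketch using sqlite3 and its EXPLAIN QUERY PLAN as a stand-in for a real execution plan viewer. The table, the `created_at` column and the index name are hypothetical; the only point is that wrapping the column in a function hides it from the index, while a plain range comparison does not.

```python
import sqlite3

# Hypothetical table with an index on a date-like text column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, created_at TEXT, title TEXT)"
)
conn.execute("CREATE INDEX ix_task_created ON task (created_at)")


def plan_for(sql):
    """Return the plan steps reported by EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Function applied to the column: the index on created_at cannot be seeked.
non_sargable = (
    "SELECT id, title FROM task "
    "WHERE substr(created_at, 1, 10) = '2024-10-08'"
)
# Same intent written as a range on the bare column: the index can be seeked.
sargable = (
    "SELECT id, title FROM task "
    "WHERE created_at >= '2024-10-08' AND created_at < '2024-10-09'"
)

print("non-SARGable:", plan_for(non_sargable))
print("SARGable:    ", plan_for(sargable))
```

The exact plan wording differs between engines and versions, but the pattern to look for is the same: a scan for the first query, an index search for the second.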

谈下烟灰 2024-10-15 22:51:32

Step one is to understand how the data in the table will be used: how will it be inserted, selected, updated, deleted? Without knowing your usage patterns, you're shooting in the dark. (Note also that whatever you come up with now may turn out to be wrong. Be sure to compare your decisions with actual usage patterns once you're up and running.) Some ideas:

If users will often be looking up individual items in the table, an index on the primary key is critical.

If data will be inserted with great frequency and you have multiple indexes, over time you will have to deal with index fragmentation. Read up on and understand clustered and non-clustered indexes and fragmentation (ALTER INDEX...REBUILD).

But if performance is key in situations where you need to retrieve a lot of rows, you might consider using your clustered index to support that.

If you often want a set of data based on Status, indexing on that column can be good, particularly if 1% of your rows are "Active" vs. 99% "Not Active", and all you want are the active ones.
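For the skewed Status case, one variant worth knowing is an index that only contains the rare rows. The answer itself only suggests a plain index on the column; the sketch below swaps in a partial index (SQLite's counterpart to a SQL Server filtered index) purely as an illustration. The table, the status codes (1 = Active, 2 = Not Active) and the index name are assumptions.

```python
import sqlite3

# Hypothetical data: roughly 1% of rows are Active (status_id = 1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, status_id INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO task (status_id, title) VALUES (?, ?)",
    [(1 if i % 100 == 0 else 2, f"task {i}") for i in range(5000)],
)

# Partial index: only the Active rows get an index entry, so the many
# Not Active rows add no index maintenance work on insert.
conn.execute(
    "CREATE INDEX ix_task_active ON task (status_id) WHERE status_id = 1"
)

# The literal predicate matches the index condition, so the index is usable.
active_sql = "SELECT id, title FROM task WHERE status_id = 1"
active_plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + active_sql)]
active_rows = conn.execute(active_sql).fetchall()

print(active_plan)
print(len(active_rows), "active rows out of 5000")
```

Because the index covers only the small Active slice, it stays cheap under heavy bulk inserts of Not Active rows; a plain index on the column would serve the same query but has to be maintained for every row.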

Conversely, if your "PriorityId" is only used to get the "label" stating what PriorityId 42 is (i.e. to join into the lookup table), you probably don't need an index on it in your main table.

A last idea: if everyone will always retrieve data for only one Company at a time, then (a) you'll definitely want to index on that, and (b) you might want to consider partitioning the table on that value, as it can act as a "built-in filter" above and beyond conventional indexing. (This is perhaps a bit extreme, and partitioning was only available in Enterprise edition before SQL Server 2016 SP1, but it may be worth it in your case.)
