Is there any reason to worry about the column order in a table?
I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother? Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?
Column order had a big performance impact on some of the databases I've tuned, spanning SQL Server, Oracle, and MySQL. This post has good rules of thumb:
An example of a difference in performance is an index lookup. The database engine finds a row based on some conditions in the index, and gets back a row address. Now say you are looking for SomeValue, and it's in this table:
The engine has to guess where SomeValue starts, because SomeString has an unknown length. However, if you change the order to:
Now the engine knows that SomeValue can be found 4 bytes after the start of the row. So column order can have a considerable performance impact.
EDIT: SQL Server 2005 stores fixed-length fields at the start of the row, and each row has a reference to the start of a varchar. This completely negates the effect I've listed above. So for recent databases, column order no longer has any impact.
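The offset argument made before the EDIT can be sketched in Python. This is only a toy model, not the row layout of any real engine: it assumes SomeId and SomeValue are 4-byte integers, that SomeString is stored with a 1-byte length prefix, and the helper names pack_row and offset_of are made up for the illustration.

```python
import struct

# Toy row format (an assumption for illustration, not a real engine's layout):
#   "int" columns are 4-byte little-endian integers,
#   "str" columns are a 1-byte length prefix followed by the bytes.

def pack_row(schema, values):
    """Serialize one row according to schema, a list of (name, kind) pairs."""
    out = bytearray()
    for (_name, kind), value in zip(schema, values):
        if kind == "int":
            out += struct.pack("<i", value)
        else:
            data = value.encode("utf-8")
            out += bytes([len(data)]) + data
    return bytes(out)

def offset_of(schema, row, wanted):
    """Return (byte offset of the column, number of length prefixes read)."""
    offset = 0
    prefixes_read = 0
    for name, kind in schema:
        if name == wanted:
            return offset, prefixes_read
        if kind == "int":
            offset += 4                 # fixed width: known from the schema alone
        else:
            offset += 1 + row[offset]   # variable width: must inspect the row itself
            prefixes_read += 1
    raise KeyError(wanted)

string_first = [("SomeId", "int"), ("SomeString", "str"), ("SomeValue", "int")]
value_first = [("SomeId", "int"), ("SomeValue", "int"), ("SomeString", "str")]

row_a = pack_row(string_first, [1, "hello", 42])
row_b = pack_row(value_first, [1, 42, "hello"])

print(offset_of(string_first, row_a, "SomeValue"))  # offset depends on len("hello")
print(offset_of(value_first, row_b, "SomeValue"))   # offset is 4 for every row
```

In the first layout the offset has to be computed per row; in the second it follows from the schema. As the EDIT notes, engines that store fixed-length fields first avoid this cost regardless of the declared order.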
Update:
In MySQL, there may be a reason to do this.
Since variable datatypes (like VARCHAR) are stored with variable lengths in InnoDB, the database engine has to traverse all previous columns in each row to find out the offset of the given one.
The impact may be as big as 17% for 20 columns.
See this entry in my blog for more detail:
In Oracle, trailing NULL columns consume no space, which is why you should always put them at the end of the table.
Also in Oracle and in SQL Server, in case of a large row, ROW CHAINING may occur.
ROW CHAINING is splitting a row that doesn't fit into one block and spanning it over multiple blocks, connected with a linked list.
Reading trailing columns that didn't fit into the first block will require traversing the linked list, which will result in an extra I/O operation.
See this page for an illustration of ROW CHAINING in Oracle:
That's why you should put columns you often use at the beginning of the table, and columns you don't use often, or columns that tend to be NULL, at the end of the table.
Important note:
If you like this answer and want to vote for it, please also vote for @Andomar's answer.
He answered the same thing, but seems to be downvoted for no reason.
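The trailing-NULL point can be illustrated with a toy encoder in Python. This is a simplified model of the behaviour described above, not Oracle's actual row format: it assumes every stored column costs a 1-byte length prefix plus its data, that a NULL sitting between stored columns costs 1 byte, and that trailing NULLs are simply omitted. The function name row_size is hypothetical.

```python
def row_size(values):
    """Bytes used by one row in the toy model; None stands for NULL."""
    # Drop trailing NULLs: in this model they take no space at all.
    end = len(values)
    while end > 0 and values[end - 1] is None:
        end -= 1
    size = 0
    for value in values[:end]:
        if value is None:
            size += 1                 # NULL between stored values: length byte only
        else:
            size += 1 + len(value)    # length byte plus the data bytes
    return size

# Same data, two column orders: mostly-NULL columns first vs. last.
nulls_first = [None, None, None, b"alice", b"NYC"]
nulls_last = [b"alice", b"NYC", None, None, None]

print(row_size(nulls_first))  # three NULL markers plus the two stored values
print(row_size(nulls_last))   # only the two stored values
```

Under these assumptions the saving is one byte per NULL column per row, so it only matters for wide tables with many rows and many mostly-NULL columns.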
During Oracle training at a previous job, our DBA suggested that putting all the non-nullable columns before the nullable ones was advantageous... although TBH I don't remember the details of why. Or maybe it was just the ones that were likely to get updated should go at the end? (Maybe puts off having to move the row if it expands)
In general, it shouldn't make any difference. As you say, queries should always specify columns themselves rather than relying on the ordering from "select *". I don't know of any DB that allows them to be changed... well, I didn't know MySQL allowed it until you mentioned it.
Readability of the output when you have to type:
in your database management software?
It's a very spurious reason, but at the moment I can't think of anything else.
Some badly-written applications might be dependent on column order / index instead of column name. They shouldn't be, but it does happen. Changing the order of the columns would break such applications.
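A minimal sketch of how such an application breaks, using Python's built-in sqlite3 module rather than MySQL. The users table, its columns, and the helper first_row are invented for the example.

```python
import sqlite3

def first_row(column_defs):
    """Create a users table with the given column order and return its single row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row   # rows support access by index and by name
    conn.execute(f"CREATE TABLE users ({column_defs})")
    # The INSERT names its columns, so it is unaffected by their order.
    conn.execute(
        "INSERT INTO users (id, name, email) VALUES (1, 'Ann', 'ann@example.com')"
    )
    return conn.execute("SELECT * FROM users").fetchone()

original = first_row("id INTEGER, name TEXT, email TEXT")
reordered = first_row("id INTEGER, email TEXT, name TEXT")

print(original[1], reordered[1])            # positional access changes meaning
print(original["name"], reordered["name"])  # access by name does not
```

Code that reads position 1 and expects a name silently gets an email address after the reorder; code that reads the column by name is unaffected.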
In 2002, Bill Thorsteinson posted on the Hewlett Packard forums his suggestions for optimizing MySQL queries by reordering the columns. His post has since been literally copied and pasted at least a hundred times on the Internet, often without citation. To quote him exactly...
Source: HP Forums.
But that post was made all the way back in 2002! This advice was for MySQL version 3.23, more than six years before MySQL 5.1 would be released. And there are no references or citations. So, was Bill right? And how exactly does the storage engine work at this level?
To quote Martin Zahn, an Oracle-certified professional, in an article on The Secrets of Oracle Row Chaining and Migration...
The rest of the article is a rather good read! But I am only quoting the part here that is directly relevant to our question at hand.
More than 18 years later, I gotta say it: thanks, Bill!
No, the order of the columns in a SQL database table is totally irrelevant - except for display / printing purposes. There's no point in reordering columns - most systems don't even provide a way to do that (except dropping the old table and recreating it with the new column order).
Marc
EDIT: from the Wikipedia entry on relational database, here's the relevant portion which to me clearly shows that column order should never be of concern:
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an unordered collection of items, although some DBMSs impose an order to their data. In mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this mathematical definition. Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is still being used today.
Beyond the obvious performance tuning, I just ran into a corner case where reordering columns caused a (previously functional) sql script to fail.
From the documentation "TIMESTAMP and DATETIME columns have no automatic properties unless they are specified explicitly, with this exception: By default, the first TIMESTAMP column has both DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP if neither is specified explicitly" https://dev.mysql.com/doc/refman/5.6/en/timestamp-initialization.html
So, a command
ALTER TABLE table_name MODIFY field_name timestamp(6) NOT NULL;
will work if that field is the first timestamp (or datetime) in a table, but not otherwise.
Obviously, you can correct that alter command to include a default value, but the fact that a query that worked stopped working due to a column reordering made my head hurt.
As is often the case, the biggest factor is the next guy who has to work on the system. I try to have the primary key columns first, the foreign key columns second, and then the rest of the columns in descending order of importance / significance to the system.
The only reason I can think of is debugging and fire-fighting. We have a table whose "name" column appears about 10th on the list. It's a pain when you do a quick select * from table where id in (1,2,3) and then you have to scroll across to look at the names.
But that's about it.
If you're going to be using UNION a lot, it makes matching columns easier if you have a convention about their ordering.
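The reason ordering matters for UNION is that it pairs columns by position, not by name. A small sketch using Python's sqlite3 module shows the effect; the customers and suppliers tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, city TEXT);
    CREATE TABLE suppliers (city TEXT, name TEXT);   -- same columns, other order
    INSERT INTO customers (name, city) VALUES ('Ann', 'Oslo');
    INSERT INTO suppliers (name, city) VALUES ('Bob', 'Rome');
""")

# UNION pairs columns by position, so SELECT * silently mixes names and cities.
mixed = conn.execute(
    "SELECT * FROM customers UNION ALL SELECT * FROM suppliers"
).fetchall()

# Listing the columns explicitly lines them up regardless of table layout.
aligned = conn.execute(
    "SELECT name, city FROM customers UNION ALL SELECT name, city FROM suppliers"
).fetchall()

print(mixed)
print(aligned)
```

With a shared ordering convention the first query would also be correct, which is the convenience this answer describes; naming the columns works either way.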
As noted, there are numerous potential performance issues. I once worked on a database where putting very large columns at the end improved performance if you didn't reference those columns in your query. Apparently if a record spanned multiple disk blocks, the database engine could stop reading blocks once it got all the columns it needed.
Of course any performance implications are highly dependent not just on the manufacturer that you're using, but also potentially on the version. A few months ago I noticed that our Postgres could not use an index for a "like" comparison. That is, if you wrote "somecolumn like 'M%'", it wasn't smart enough to skip to the M's and quit when it found the first N. I was planning to change a bunch of queries to use "between". Then we got a new version of Postgres and it handled the like's intelligently. Glad I never got around to changing the queries. Obviously not directly relevant here but my point is that anything you do for efficiency considerations could be obsolete with the next version.
Column order is almost always very relevant to me because I routinely write generic code that reads the database schema to create screens. Like, my "edit a record" screens are almost always built by reading the schema to get the list of fields, and then displaying them in order. If I changed the order of columns, my program would still work, but the display might be strange to the user. Like, you expect to see name / address / city / state / zip, not city / address / zip / name / state. Sure, I could put the display order of the columns in code or a control file or something, but then every time we added or removed a column we'd have to remember to go update the control file. I like to say things once. Also, when the edit screen is built purely from the schema, adding a new table can mean writing zero lines of code to create an edit screen for it, which is way cool. (Well, okay, in practice usually I have to add an entry to the menu to call the generic edit program, and I've generally given up on generic "select a record to update" because there are too many exceptions to make it practical.)
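A rough sketch of the schema-driven screen idea, using Python's sqlite3 module and SQLite's PRAGMA table_info. The contact table and the edit_form_fields helper are assumptions for the illustration, not the code this answer refers to.

```python
import sqlite3

def edit_form_fields(conn, table):
    """Return the table's column names in schema order, as a generic form would show them."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk), where cid is the column position.
    # The table name is interpolated directly, so only pass trusted names here.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for _cid, name, *_rest in sorted(rows)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (name TEXT, address TEXT, city TEXT, state TEXT, zip TEXT)"
)

# The "edit a record" screen is generated purely from the schema.
for label in edit_form_fields(conn, "contact"):
    print(f"{label}: [__________]")
```

Because the form follows the declared order, reordering the columns changes what the user sees without any code change, which is exactly why the order matters in this style of application.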
The only time you'll need to worry about column order is if your software specifically relies on that order. Typically this is due to the fact that the developer got lazy and did a select * and then referred to the columns by index rather than by name in their result.
In general, what happens in SQL Server when you change column order through Management Studio is that it creates a temp table with the new structure, moves the data to that structure from the old table, drops the old table and renames the new one. As you might imagine, this is a very poor choice for performance if you have a large table. I don't know if MySQL does the same, but it is one reason why many of us avoid reordering columns. Since select * should never be used in a production system, adding columns at the end is not a problem for a well-designed system. The order of columns in the table should in general not be messed with.
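The rebuild procedure described here can be sketched with Python's sqlite3 module. This runs against SQLite, not SQL Server, and the person table is invented for the example, but the steps (create the new structure, copy the data, drop the old table, rename) are the same ones the answer lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, notes TEXT, name TEXT);
    INSERT INTO person (id, notes, name) VALUES (1, 'likes tea', 'Ann'), (2, NULL, 'Bob');
""")

# Reorder by rebuilding: every row is copied, which is why this is costly on big tables.
conn.executescript("""
    BEGIN;
    CREATE TABLE person_new (id INTEGER PRIMARY KEY, name TEXT, notes TEXT);
    INSERT INTO person_new (id, name, notes) SELECT id, name, notes FROM person;
    DROP TABLE person;
    ALTER TABLE person_new RENAME TO person;
    COMMIT;
""")

columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
rows = conn.execute("SELECT id, name, notes FROM person ORDER BY id").fetchall()
print(columns)
print(rows)
```

The copy step touches every row, so the cost grows with table size; a real migration would also need to recreate indexes, constraints and permissions, which this sketch leaves out.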