哪个更快/最好? SELECT * 或 SELECT 列 1、列 2、列 3 等

发布于 2024-07-04 21:51:31 字数 641 浏览 6 评论 0原文

我听说在编写 SQL 命令时使用 SELECT * 通常是不好的做法,因为它对于您特别需要的 SELECT 列更有效。

如果我需要SELECT表中的每一列,我应该使用

SELECT * FROM TABLE

SELECT column1, colum2, column3, etc. FROM TABLE

在这种情况下效率真的很重要吗? 如果您确实需要所有数据,我认为 SELECT * 在内部会更优化,但我是在没有真正了解数据库的情况下这么说的。

我很想知道这种情况下的最佳做法是什么。

更新:我可能应该指定,我真正想要执行SELECT *的唯一情况是当我从一个数据中选择数据时我知道即使添加新列,也始终需要检索所有列的表。

然而,考虑到我所看到的回复,这似乎仍然是一个坏主意,并且出于我曾经考虑过的更多技术原因,永远不应该使用 SELECT * 。

I've heard that SELECT * is generally bad practice to use when writing SQL commands because it is more efficient to SELECT columns you specifically need.

If I need to SELECT every column in a table, should I use

SELECT * FROM TABLE

or

SELECT column1, colum2, column3, etc. FROM TABLE

Does the efficiency really matter in this case? I'd think SELECT * would be more optimal internally if you really need all of the data, but I'm saying this with no real understanding of database.

I'm curious to know what the best practice is in this case.

UPDATE: I probably should specify that the only situation where I would really want to do a SELECT * is when I'm selecting data from one table where I know all columns will always need to be retrieved, even when new columns are added.

Given the responses I've seen however, this still seems like a bad idea and SELECT * should never be used for a lot more technical reasons that I ever though about.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(30

没企图 2024-07-11 21:51:31

绝对定义每次要选择的列。 没有理由不这样做,而且性能的提高是非常值得的。

他们不应该给出“SELECT *”的选项

Absolutely define the columns you want to SELECT every time. There is no reason not to and the performance improvement is well worth it.

They should never have given the option to "SELECT *"

温柔少女心 2024-07-11 21:51:31

如果您使用 * 或列,则 Select 的效率同样高(就速度而言)。

区别在于记忆,而不是速度。 当您选择多个列时,SQL Server 必须分配内存空间来为您的查询提供服务,包括您请求的所有列的所有数据,即使您只使用其中之一。

就性能而言,重要的是执行计划,而执行计划又在很大程度上取决于您的 WHERE 子句以及 JOIN、OUTER JOIN 等的数量...

对于您的问题,只需使用 SELECT * 即可。 如果您需要所有列,则性能没有差异。

Select is equally efficient (in terms of velocity) if you use * or columns.

The difference is about memory, not velocity. When you select several columns SQL Server must allocate memory space to serve you the query, including all data for all the columns that you've requested, even if you're only using one of them.

What does matter in terms of performance is the excecution plan which in turn depends heavily on your WHERE clause and the number of JOIN, OUTER JOIN, etc ...

For your question just use SELECT *. If you need all the columns there's no performance difference.

桃扇骨 2024-07-11 21:51:31

这取决于数据库服务器的版本,但现代版本的 SQL 可以以任何一种方式缓存计划。 我想说的是,使用最易于维护的数据访问代码。

It depends on the version of your DB server, but modern versions of SQL can cache the plan either way. I'd say go with whatever is most maintainable with your data access code.

儭儭莪哋寶赑 2024-07-11 21:51:31

最好准确说明所需列的原因之一是表结构未来可能发生变化。

如果您使用基于索引的方法手动读取数据,并使用查询结果填充数据结构,那么将来当您添加/删除列时,您将很难弄清楚出了什么问题。

至于什么更快,我会听取其他人的专业知识。

One reason it's better practice to spell out exactly which columns you want is because of possible future changes in the table structure.

If you are reading in data manually using an index based approach to populate a data structure with the results of your query, then in the future when you add/remove a column you will have headaches trying to figure out what went wrong.

As to what is faster, I'll defer to others for their expertise.

林空鹿饮溪 2024-07-11 21:51:31

与大多数问题一样,这取决于您想要实现的目标。 如果您想创建一个允许任何表中所有列的数据库网格,那么“选择 *”就是答案。 但是,如果您只需要某些列并且很少从查询中添加或删除列,那么请单独指定它们。

它还取决于您想要从服务器传输的数据量。 如果其中一列被定义为备忘录、图形、blob 等,并且您不需要该列,则最好不要使用“Select *”,否则您将获得一大堆您不需要的数据想要,你的表现可能会受到影响。

As with most problems, it depends on what you want to achieve. If you want to create a db grid that will allow all columns in any table, then "Select *" is the answer. However, if you will only need certain columns and adding or deleting columns from the query is done infrequently, then specify them individually.

It also depends on the amount of data you want to transfer from the server. If one of the columns is a defined as memo, graphic, blob, etc. and you don't need that column, you'd better not use "Select *" or you'll get a whole bunch of data you don't want and your performance could suffer.

成熟稳重的好男人 2024-07-11 21:51:31

补充一下其他人所说的,如果您选择的所有列都包含在索引中,则将从索引中提取结果集,而不是从 SQL 查找其他数据。

To add on to what everyone else has said, if all of your columns that you are selecting are included in an index, your result set will be pulled from the index instead of looking up additional data from SQL.

执着的年纪 2024-07-11 21:51:31

如果想要获取列数等元数据,则需要使用 SELECT * 。

SELECT * is necessary if one wants to obtain metadata such as the number of columns.

薄荷→糖丶微凉 2024-07-11 21:51:31

我会因此受到猛烈抨击,但我执行了 select *,因为几乎所有数据都是从 SQL Server 视图中检索的,这些视图将多个表中所需的值预先组合到一个易于访问的视图中。

然后,我确实希望视图中的所有列在将新字段添加到基础表时不会更改。 这样做的另一个好处是允许我更改数据的来源。 视图中的 FieldA 可能会被计算一次,然后我可能会将其更改为静态。 无论哪种方式,View 都会向我提供 FieldA。

这样做的好处在于它允许我的数据层获取数据集。 然后它将它们传递给我的 BL,然后 BL 可以用它们创建对象。 我的主应用程序只知道对象并与对象交互。 我什至允许我的对象在传递数据行时自行创建。

当然,我是唯一的开发人员,所以这也有帮助:)

Gonna get slammed for this, but I do a select * because almost all my data is retrived from SQL Server Views that precombine needed values from multiple tables into a single easy to access View.

I do then want all the columns from the view which won't change when new fields are added to underlying tables. This has the added benefit of allowing me to change where data comes from. FieldA in the View may at one time be calculated and then I may change it to be static. Either way the View supplies FieldA to me.

The beauty of this is that it allows my data layer to get datasets. It then passes them to my BL which can then create objects from them. My main app only knows and interacts with the objects. I even allow my objects to self-create when passed a datarow.

Of course, I'm the only developer, so that helps too :)

司马昭之心 2024-07-11 21:51:31

上面每个人都说过,加上:

如果您正在努力获得可读可维护的代码,请执行以下操作:

SELECT foo, bar FROM widgets;

立即可读并显示意图。 如果你拨打了这个电话,你就会知道你会得到什么。 如果小部件只有 foo 和 bar 列,那么选择 * 意味着您仍然需要考虑要返回的内容,确认顺序映射正确等。但是,如果小部件有更多列,但您只对 foo 感兴趣和栏,那么当您查询通配符然后只使用部分返回的内容时,您的代码就会变得混乱。

What everyone above said, plus:

If you're striving for readable maintainable code, doing something like:

SELECT foo, bar FROM widgets;

is instantly readable and shows intent. If you make that call you know what you're getting back. If widgets only has foo and bar columns, then selecting * means you still have to think about what you're getting back, confirm the order is mapped correctly, etc. However, if widgets has more columns but you're only interested in foo and bar, then your code gets messy when you query for a wildcard and then only use some of what's returned.

朕就是辣么酷 2024-07-11 21:51:31

请记住,如果根据定义您有内部联接,则不需要所有列,因为联接列中的数据是重复的。

在 SQl 服务器中列出列并不困难甚至耗时。 您只需将它们从对象浏览器中拖动过来(您可以通过从单词列中拖动来一次性获得所有内容)。 对系统造成永久性的性能影响(因为这可以减少索引的使用,并且通过网络发送不需要的数据成本高昂),并且随着数据库的更改,您更有可能遇到意外的问题(有时会添加列,例如,你不希望用户看到)仅仅为了节省不到一分钟的开发时间是短视和不专业的。

And remember if you have an inner join by definition you do not need all the columns as the data in the join columns is repeated.

It's not like listing columns in SQl server is hard or even time-consuming. You just drag them over from the object browser (you can get all in one go by dragging from the word columns). To put a permanent performance hit on your system (becasue this can reduce the use of indexes and becasue sending unneeded data over the network is costly) and make it more likely that you will have unexpected problems as the database changes (sometimes columns get added that you do not want the user to see for instance) just to save less than a minute of development time is short-sighted and unprofessional.

初心 2024-07-11 21:51:31

总之,至少在 PostgreSQL 中,选择带有和不带有 * 的所有列的性能几乎相同

PostgreSQL 中,我创建了 test,其中包含 10 id_x1000 万行,如下所示:

CREATE TABLE test AS SELECT generate_series(1, 10000000) AS id_1,
                            generate_series(1, 10000000) AS id_2,
                            generate_series(1, 10000000) AS id_3,
                            generate_series(1, 10000000) AS id_4,
                            generate_series(1, 10000000) AS id_5,
                            generate_series(1, 10000000) AS id_6,
                            generate_series(1, 10000000) AS id_7,
                            generate_series(1, 10000000) AS id_8,
                            generate_series(1, 10000000) AS id_9,
                            generate_series(1, 10000000) AS id_10;

然后,我交替运行了下面的 2 个查询,总共 20 次。 *每个查询总共运行 10 次

SELECT * FROM test:
SELECT id_1, id_2, id_3, id_4, id_5, id_6, id_7, id_8, id_9, id_10 FROM test;

<结果>

选择带 * 的所有列选择不带 * 的所有列
第一次运行12.792 秒12.483 秒
第二次运行12.803秒12.608 秒
第三次运行12.537 秒12.549 秒
第四次运行12.512 秒12.457 秒
第 5 次运行12.570 秒12.487 秒
第 6 次运行12.508 秒12.493 秒
第 7 次运行12.432 秒12.475 秒
<强>第8次运行12.532秒12.489秒
第9次运行12.532秒12.452秒< /strong>
第十次运行12.437 秒12.477 秒
平均12.565 秒12.497 秒

选择所有列的平均值:

  • 使用 * 为 12.565 秒
  • 没有*是12.497秒

In conclusion at least in PostgreSQL, the performance of selecting all columns with and without * is almost the same.

In PostgreSQL, I created test table with 10 id_x columns and 10 million rows as shown below:

CREATE TABLE test AS SELECT generate_series(1, 10000000) AS id_1,
                            generate_series(1, 10000000) AS id_2,
                            generate_series(1, 10000000) AS id_3,
                            generate_series(1, 10000000) AS id_4,
                            generate_series(1, 10000000) AS id_5,
                            generate_series(1, 10000000) AS id_6,
                            generate_series(1, 10000000) AS id_7,
                            generate_series(1, 10000000) AS id_8,
                            generate_series(1, 10000000) AS id_9,
                            generate_series(1, 10000000) AS id_10;

Then, I ran 2 queries below alternately 20 times in total. *Each query runs 10 times in total:

SELECT * FROM test:
SELECT id_1, id_2, id_3, id_4, id_5, id_6, id_7, id_8, id_9, id_10 FROM test;

<Result>

Select all columns with *Select all columns without *
1st run12.792 seconds12.483 seconds
2nd run12.803 seconds12.608 seconds
3rd run12.537 seconds12.549 seconds
4th run12.512 seconds12.457 seconds
5th run12.570 seconds12.487 seconds
6th run12.508 seconds12.493 seconds
7th run12.432 seconds12.475 seconds
8th run12.532 seconds12.489 seconds
9th run12.532 seconds12.452 seconds
10th run12.437 seconds12.477 seconds
Average12.565 seconds12.497 seconds

Average of selecting all columns:

  • with * is 12.565 seconds.
  • without * is 12.497 seconds.
ゃ人海孤独症 2024-07-11 21:51:31

命名您希望在应用程序中获得的每一列还可以确保您的应用程序在有人更改表时不会中断,只要您的列仍然存在(以任何顺序)。

Naming each column you expect to get in your application also ensures your application won't break if someone alters the table, as long as your columns are still present (in any order).

白首有我共你 2024-07-11 21:51:31

就性能而言,我看到评论说两者是平等的。 但可用性方面存在一些 + 和 -

当您在查询中使用 (select *) 时,如果有人更改表并添加前一个查询不需要的新字段,那么这是不必要的开销。 如果新添加的字段是 blob 或图像字段怎么办? 那么你的查询响应时间将会非常慢。

另一方面,如果您使用 (select col1,col2,..) 并且表被更改并添加了新字段,并且结果集中需要这些字段,则您始终需要在表更改后编辑选择查询。

但我建议始终在查询中使用 select col1,col2,... 并在表稍后更改时更改查询...

Performance wise I have seen comments that both are equal. but usability aspect there are some +'s and -'s

When you use a (select *) in a query and if some one alter the table and add new fields which do not need for the previous query it is an unnecessary overhead. And what if the newly added field is a blob or an image field??? your query response time is going to be really slow then.

In other hand if you use a (select col1,col2,..) and if the table get altered and added new fields and if those fields are needed in the result set, you always need to edit your select query after table alteration.

But I suggest always to use select col1,col2,... in your queries and alter the query if the table get altered later...

睫毛溺水了 2024-07-11 21:51:31

结果太大了。 SQL引擎生成结果并将其发送到客户端的速度很慢。

客户端作为通用编程环境,不是也不应该被设计为过滤和处理结果(例如 WHERE 子句、ORDER 子句),因为行数可能很大(例如数千万行)。

The result is too huge. It is slow to generate and send the result from the SQL engine to the client.

The client side, being a generic programming environment, is not and should not be designed to filter and process the results (e.g. the WHERE clause, ORDER clause), as the number of rows can be huge (e.g. tens of millions of rows).

燃情 2024-07-11 21:51:31

这是一个旧帖子,但仍然有效。 作为参考,我有一个非常复杂的查询,其中包括:

  • 12 个表
  • 6 个左联接
  • 9 个内联接
  • 所有 12 个表上总共 108 列
  • 我只需要 54 列
  • 4 列 Order By 子句

当我使用 Select * 执行查询时,需要平均为 2869 毫秒。
当我使用 Select 执行查询时,平均需要 1513 毫秒。

返回的总行数为 13,949。

毫无疑问,选择列名意味着比 Select * 更快的性能

This is an old post, but still valid. For reference, I have a very complicated query consisting of:

  • 12 tables
  • 6 Left joins
  • 9 inner joins
  • 108 total columns on all 12 tables
  • I only need 54 columns
  • A 4 column Order By clause

When I execute the query using Select *, it takes an average of 2869ms.
When I execute the query using Select , it takes an average of 1513ms.

Total rows returned is 13,949.

There is no doubt selecting column names means faster performance over Select *

故事未完 2024-07-11 21:51:31

当且仅当您需要获取所有字段的数据时,使用显式字段名称并不比使用 * 更快。

您的客户端软件不应该依赖于返回字段的顺序,所以这也是无稽之谈。

并且您可能(尽管不太可能)需要使用 * 获取所有字段,因为您还不知道存在哪些字段(想想非常动态的数据库结构)。

使用显式字段名称的另一个缺点是,如果字段名称很多且很长,那么读取代码和/或查询日志会变得更加困难。

所以规则应该是:如果您需要所有字段,请使用 *,如果您只需要一个子集,请明确命名它们。

It is NOT faster to use explicit field names versus *, if and only if, you need to get the data for all fields.

Your client software shouldn't depend on the order of the fields returned, so that's a nonsense too.

And it's possible (though unlikely) that you need to get all fields using * because you don't yet know what fields exist (think very dynamic database structure).

Another disadvantage of using explicit field names is that if there are many of them and they're long then it makes reading the code and/or the query log more difficult.

So the rule should be: if you need all the fields, use *, if you need only a subset, name them explicitly.

心病无药医 2024-07-11 21:51:31

您实际上应该只选择您需要的字段,并且只选择所需的数量,即

SELECT Field1, Field2 FROM SomeTable WHERE --(constraints)

在数据库之外,动态查询存在注入攻击和格式错误的数据的风​​险。 通常,您可以使用存储过程或参数化查询来解决这个问题。 另外(虽然并不是什么大问题),每次执行动态查询时,服务器都必须生成一个执行计划。

You should really be selecting only the fields you need, and only the required number, i.e.

SELECT Field1, Field2 FROM SomeTable WHERE --(constraints)

Outside of the database, dynamic queries run the risk of injection attacks and malformed data. Typically you get round this using stored procedures or parameterised queries. Also (although not really that much of a problem) the server has to generate an execution plan each time a dynamic query is executed.

是伱的 2024-07-11 21:51:31

虽然显式列出列有利于性能,但不要疯狂。

因此,如果您使用所有数据,为了简单起见,请尝试 SELECT *(想象一下有很多列并执行 JOIN... 查询可能会变得很糟糕)。 然后 - 测量。 与显式列出列名的查询进行比较。

不要推测性能,衡量它!

当您有一些包含大数据的列(例如帖子或文章的正文)并且在给定查询中不需要它时,显式列表最有帮助。 然后,通过不在答案中返回它,数据库服务器可以节省时间、带宽和磁盘吞吐量。 您的查询结果也会更小,这对任何查询缓存都有好处。

While explicitly listing columns is good for performance, don't get crazy.

So if you use all the data, try SELECT * for simplicity (imagine having many columns and doing a JOIN... query may get awful). Then - measure. Compare with query with column names listed explicitly.

Don't speculate about performance, measure it!

Explicit listing helps most when you have some column containing big data (like body of a post or article), and don't need it in given query. Then by not returning it in your answer DB server can save time, bandwidth, and disk throughput. Your query result will also be smaller, which is good for any query cache.

云淡月浅 2024-07-11 21:51:31

明确定义列,因为 SQL Server 不需要对列进行查找来提取它们。 如果您定义了列,则 SQL 可以跳过该步骤。

definitely defining the columns, because SQL Server will not have to do a lookup on the columns to pull them. If you define the columns, then SQL can skip that step.

内心激荡 2024-07-11 21:51:31

指定您需要的列总是更好,如果您考虑一下,SQL 不必在每次查询时都考虑“wtf is *”。 最重要的是,稍后有人可能会向表中添加您在查询中实际上不需要的列,在这种情况下,通过指定所有列会更好。

It's always better to specify the columns you need, if you think about it one time, SQL doesn't have to think "wtf is *" every time you query. On top of that, someone later may add columns to the table that you actually do not need in your query and you'll be better off in that case by specifying all of your columns.

我不吻晚风 2024-07-11 21:51:31

“select *”的问题是可能会带来您并不真正需要的数据。 在实际的数据库查询期间,所选列并不会真正添加到计算中。 真正“繁重”的是数据传输回客户端,任何您并不真正需要的列都只是浪费网络带宽并增加您等待查询返回的时间。

即使您确实使用了“select *...”带来的所有列,这也只是暂时的。 如果将来您更改表/视图布局并添加更多列,即使您不需要它们,您也会开始将它们带入您的选择中。

“select *”语句不好的另一点是视图创建。 如果您使用“select *”创建视图,然后将列添加到表中,则视图定义和返回的数据将不匹配,并且您需要重新编译视图才能使它们再次工作。

我知道编写“select *”很诱人,因为我真的不喜欢手动指定查询中的所有字段,但是当您的系统开始发展时,您会发现花费这些额外的时间是值得的/努力指定字段,而不是花费更多的时间和精力来消除视图上的错误或优化应用程序。

The problem with "select *" is the possibility of bringing data you don't really need. During the actual database query, the selected columns don't really add to the computation. What's really "heavy" is the data transport back to your client, and any column that you don't really need is just wasting network bandwidth and adding to the time you're waiting for you query to return.

Even if you do use all the columns brought from a "select *...", that's just for now. If in the future you change the table/view layout and add more columns, you'll start bring those in your selects even if you don't need them.

Another point in which a "select *" statement is bad is on view creation. If you create a view using "select *" and later add columns to your table, the view definition and the data returned won't match, and you'll need to recompile your views in order for them to work again.

I know that writing a "select *" is tempting, 'cause I really don't like to manually specify all the fields on my queries, but when your system start to evolve, you'll see that it's worth to spend this extra time/effort in specifying the fields rather than spending much more time and effort removing bugs on your views or optimizing your app.

阿楠 2024-07-11 21:51:31

到目前为止,这里已经回答了很多充分的理由,还有另一个尚未提及的理由。

明确命名列将帮助您进行后续维护。 在某些时候,您将要进行更改或排除故障,并发现自己在问“该列到底用在哪里”。

如果您已经明确列出了名称,那么通过所有存储过程、视图等查找对该列的每个引用就很简单。 只需为您的数据库模式转储一个 CREATE 脚本,然后通过它进行文本搜索。

Lots of good reasons answered here so far, here's another one that hasn't been mentioned.

Explicitly naming the columns will help you with maintenance down the road. At some point you're going to be making changes or troubleshooting, and find yourself asking "where the heck is that column used".

If you've got the names listed explicitly, then finding every reference to that column -- through all your stored procedures, views, etc -- is simple. Just dump a CREATE script for your DB schema, and text search through it.

沙与沫 2024-07-11 21:51:31

性能方面,具有特定列的 SELECT 可以更快(无需读入所有数据)。 如果您的查询确实使用了所有列,则仍然首选带有显式参数的 SELECT。 任何速度差异基本上都是不明显的并且接近恒定时间。 有一天你的模式将会改变,这是一个很好的保险,可以防止由此引起的问题。

Performance wise, SELECT with specific columns can be faster (no need to read in all the data). If your query really does use ALL the columns, SELECT with explicit parameters is still preferred. Any speed difference will be basically unnoticeable and near constant-time. One day your schema will change, and this is good insurance to prevent problems due to this.

原来分手还会想你 2024-07-11 21:51:31

即使查询不是通过网络发送的,SELECT * 也是一种不好的做法。

  1. 选择比您需要的更多的数据会使查询效率降低 - 服务器必须读取和传输额外的数据,因此需要时间并在系统上产生不必要的负载(正如其他人提到的,不仅是网络,还有磁盘、CPU 等。 )。 此外,服务器无法尽可能优化查询(例如,对查询使用覆盖索引)。
  2. 一段时间后,您的表结构可能会发生变化,因此 SELECT * 将返回一组不同的列。 因此,您的应用程序可能会获得意外结构的数据集并在下游某个地方发生故障。 明确声明列可以保证您获得已知结构的数据集,或者在数据库级别获得明确的错误(例如“未找到列”)。

当然,这一切对于一个小而简单的系统来说并不重要。

SELECT * is a bad practice even if the query is not sent over a network.

  1. Selecting more data than you need makes the query less efficient - the server has to read and transfer extra data, so it takes time and creates unnecessary load on the system (not only the network, as others mentioned, but also disk, CPU etc.). Additionally, the server is unable to optimize the query as well as it might (for example, use covering index for the query).
  2. After some time your table structure might change, so SELECT * will return a different set of columns. So, your application might get a dataset of unexpected structure and break somewhere downstream. Explicitly stating the columns guarantees that you either get a dataset of known structure, or get a clear error on the database level (like 'column not found').

Of course, all this doesn't matter much for a small and simple system.

相守太难 2024-07-11 21:51:31

指定列列表通常是最好的选择,因为如果有人向表中添加/插入列,您的应用程序不会受到影响。

Specifying the column list is usually the best option because your application won't be affected if someone adds/inserts a column to the table.

怀念你的温柔 2024-07-11 21:51:31

对于服务器来说,指定列名肯定更快。 但是,如果

  1. 性能不是一个大问题(例如,这是一个网站内容数据库,每个表中有数百行,也许数千行,但不是数百万行); 并且
  2. 您的工作是使用通用框架创建许多小型、类似的应用程序(例如面向公众的内容管理网站),而不是创建复杂的一次性应用程序; 并且
  3. 灵活性很重要(每个站点的数据库架构都有大量自定义);

那么你最好坚持使用 SELECT * 。 在我们的框架中,大量使用 SELECT * 使我们能够向表中引入新的网站管理内容字段,从而赋予其 CMS 的所有优势(版本控制、工作流程/批准等),同时仅在某个位置接触代码几个点,而不是几十个点。

我知道数据库专家会因此而恨我 - 来吧,投票否决我 - 但在我的世界中,开发人员时间稀缺,而 CPU 周期充足,因此我相应地调整我节省的内容和浪费的内容。

Specifying column names is definitely faster - for the server. But if

  1. performance is not a big issue (for example, this is a website content database with hundreds, maybe thousands - but not millions - of rows in each table); AND
  2. your job is to create many small, similar applications (e.g. public-facing content-managed websites) using a common framework, rather than creating a complex one-off application; AND
  3. flexibility is important (lots of customization of the db schema for each site);

then you're better off sticking with SELECT *. In our framework, heavy use of SELECT * allows us to introduce a new website managed content field to a table, giving it all of the benefits of the CMS (versioning, workflow/approvals, etc.), while only touching the code at a couple of points, instead of a couple dozen points.

I know the DB gurus are going to hate me for this - go ahead, vote me down - but in my world, developer time is scarce and CPU cycles are abundant, so I adjust accordingly what I conserve and what I waste.

夜血缘 2024-07-11 21:51:31

您应该只选择您需要的列。 即使您需要所有列,最好还是列出列名,这样 sql server 就不必查询系统表中的列。

此外,如果有人向表中添加列,您的应用程序可能会崩溃。 您的程序也会获得它没有预料到的列,并且它可能不知道如何处理它们。

除此之外,如果表有二进制列,那么查询会慢得多并且使用更多的网络资源。

You should only select the columns that you need. Even if you need all columns it's still better to list column names so that the sql server does not have to query system table for columns.

Also, your application might break if someone adds columns to the table. Your program will get columns it didn't expect too and it might not know how to process them.

Apart from this if the table has a binary column then the query will be much more slower and use more network resources.

韬韬不绝 2024-07-11 21:51:31

select * 是一件坏事有四个重要原因:

  1. 最重要的实际原因是它迫使用户神奇地知道返回列的顺序。 最好是明确的,这也可以保护您免受表更改的影响,这很好地进入...

  2. 如果您使用的列名发生更改,最好尽早捕获它(在 SQL 调用时)而不是当您尝试使用不再存在的列(或已更改名称等)时。

  3. 列出列名称使您的代码更加自记录,因此可能更具可读性。

  4. 如果您通过网络传输(或者即使您不是),则不需要的列只是浪费。

There are four big reasons that select * is a bad thing:

  1. The most significant practical reason is that it forces the user to magically know the order in which columns will be returned. It's better to be explicit, which also protects you against the table changing, which segues nicely into...

  2. If a column name you're using changes, it's better to catch it early (at the point of the SQL call) rather than when you're trying to use the column that no longer exists (or has had its name changed, etc.)

  3. Listing the column names makes your code far more self-documented, and so probably more readable.

  4. If you're transferring over a network (or even if you aren't), columns you don't need are just waste.

节枝 2024-07-11 21:51:31

选择特定列更好的原因之一是它提高了 SQL Server 从索引访问数据而不是查询表数据的可能性。

这是我写的一篇关于它的文章: 选择查询索引覆盖率较差的真正原因

它也不太容易更改,因为任何使用数据的代码都将获得相同的数据结构无论您将来对表架构进行什么更改。

One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data.

Here's a post I wrote about it: The real reason select queries are bad index coverage

It's also less fragile to change, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.

℡寂寞咖啡 2024-07-11 21:51:31

鉴于指定您选择所有列,此时几乎没有什么区别。 然而,要意识到数据库模式确实会发生变化。 如果您使用SELECT *,您将获得添加到表中的任何新列,即使您的代码很可能不准备使用或呈现该新数据。 这意味着您的系统将面临意外的性能和功能变化。

您可能愿意将其视为次要成本,但要意识到您不需要的列仍然必须是:

  1. 从数据库读取
  2. 通过网络发送
  3. 编组到您的流程中
  4. (对于 ADO 类型技术) 保存在数据表中内存中
  5. 被忽略和丢弃/垃圾收集的

项目 #1 有许多隐藏成本,包括消除一些潜在的覆盖索引、导致数据页加载(和服务器缓存抖动)、产生本来可以避免的行/页/表锁。

将此与指定列与 * 的潜在节省进行权衡,唯一的潜在节省是:

  1. 程序员不需要重新访问 SQL 来添加列
  2. 网络传输SQL 更小/更快
  3. SQL Server 查询解析/验证时间
  4. SQL Server 查询计划缓存

对于第 1 项,实际情况是您将添加/更改代码以使用您可能添加的任何新列,因此它是洗。

对于第 2 项,差异很少足以让您采用不同的数据包大小或网络数据包数量。 如果您发现 SQL 语句传输时间是主要问题,那么您可能需要首先降低语句速率。

对于第 3 项,没有任何节省,因为 * 的扩展无论如何都必须发生,这意味着无论如何都要查询表架构。 实际上,列出列将产生相同的成本,因为它们必须根据架构进行验证。 换句话说,这是一次彻底的清洗。

对于第 4 项,当您指定特定列时,您的查询计划缓存可能会变得更大,但如果您正在处理不同的列集(这不是您指定的列)。 在这种情况下,您确实需要不同的缓存条目,因为您需要根据需要使用不同的计划。

因此,由于您指定问题的方式,这一切都归结为面对最终模式修改时的问题弹性。 如果您将此模式刻录到 ROM 中(确实如此),那么 * 是完全可以接受的。

然而,我的一般准则是您应该只选择您需要的列,这意味着有时看起来您正在要求所有这些列,但 DBA 和模式演变意味着一些新列可能会出现。出现这可能会极大地影响查询。

我的建议是您应该始终选择特定列。 请记住,你会一次又一次地擅长所做的事情,所以只要养成正确做事的习惯即可。

如果您想知道为什么架构可能会在不更改代码的情况下发生更改,请考虑审核日志记录、有效/到期日期以及 DBA 系统添加的其他类似内容,以解决合规性问题。 不正当更改的另一个来源是系统或用户定义字段中其他地方的性能非规范化。

Given your specification that you are selecting all columns, there is little difference at this time. Realize, however, that database schemas do change. If you use SELECT * you are going to get any new columns added to the table, even though in all likelihood, your code is not prepared to use or present that new data. This means that you are exposing your system to unexpected performance and functionality changes.

You may be willing to dismiss this as a minor cost, but realize that columns that you don't need still must be:

  1. Read from database
  2. Sent across the network
  3. Marshalled into your process
  4. (for ADO-type technologies) Saved in a data-table in-memory
  5. Ignored and discarded / garbage-collected

Item #1 has many hidden costs including eliminating some potential covering index, causing data-page loads (and server cache thrashing), incurring row / page / table locks that might be otherwise avoided.

Balance this against the potential savings of specifying the columns versus an * and the only potential savings are:

  1. Programmer doesn't need to revisit the SQL to add columns
  2. The network-transport of the SQL is smaller / faster
  3. SQL Server query parse / validation time
  4. SQL Server query plan cache

For item 1, the reality is that you're going to add / change code to use any new column you might add anyway, so it is a wash.

For item 2, the difference is rarely enough to push you into a different packet-size or number of network packets. If you get to the point where SQL statement transmission time is the predominant issue, you probably need to reduce the rate of statements first.

For item 3, there is NO savings as the expansion of the * has to happen anyway, which means consulting the table(s) schema anyway. Realistically, listing the columns will incur the same cost because they have to be validated against the schema. In other words this is a complete wash.

For item 4, when you specify specific columns, your query plan cache could get larger but only if you are dealing with different sets of columns (which is not what you've specified). In this case, you do want different cache entries because you want different plans as needed.

So, this all comes down, because of the way you specified the question, to the issue resiliency in the face of eventual schema modifications. If you're burning this schema into ROM (it happens), then an * is perfectly acceptable.

However, my general guideline is that you should only select the columns you need, which means that sometimes it will look like you are asking for all of them, but DBAs and schema evolution mean that some new columns might appear that could greatly affect the query.

My advice is that you should ALWAYS SELECT specific columns. Remember that you get good at what you do over and over, so just get in the habit of doing it right.

If you are wondering why a schema might change without code changing, think in terms of audit logging, effective/expiration dates and other similar things that get added by DBAs for systemically for compliance issues. Another source of underhanded changes is denormalizations for performance elsewhere in the system or user-defined fields.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文