select * 的使用是否合理?
我一直向我的开发人员宣讲 SELECT *
是邪恶的,应该像瘟疫一样避免。
有没有什么情况可以证明它是合理的?
我不是在谈论 COUNT(*)
- 大多数优化器都可以弄清楚。
编辑
我说的是生产代码。
我看到的关于这种不良做法的一个很好的例子是一个遗留的 asp 应用程序,它在存储过程中使用 select *
,并使用 ADO
循环返回的记录,但得到了按索引列出的列。您可以想象当将新字段添加到字段列表末尾以外的其他位置时会发生什么。
I've always preached to my developers that SELECT *
is evil and should be avoided like the plague.
Are there any cases where it can be justified?
I'm not talking about COUNT(*)
- which most optimizers can figure out.
Edit
I'm talking about production code.
And one great example I saw of this bad practice was a legacy asp application that used select *
in a stored procedure, and used ADO
to loop through the returned records, but got the columns by index. You can imagine what happened when a new field was added somewhere other than the end of the field list.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(20)
我很高兴在审核触发器中使用
*
。在这种情况下,它实际上可以证明是有好处的,因为它将确保如果将附加列添加到基表中,它将引发错误,因此不会忘记在审计触发器和/或审计表结构中处理此问题。
(就像dotjoe)我也很高兴在派生表中使用它列表表达式。尽管我习惯于反过来做。
我对 SQL Server 最熟悉,至少优化器可以毫无问题地认识到只需要列
a,b,c
以及在内表表达式不会导致检索和丢弃不需要的列的任何不必要的开销。原则上,
SELECT *
在视图中应该没问题,而且它是视图中的最终SELECT
,应该避免它,但在 SQL Server 中这可能会导致问题因为它存储视图的列元数据,当基础表更改时不会自动更新,并且使用*
可能会导致混乱和不正确的结果,除非运行sp_refreshview
来更新它元数据。I'm quite happy using
*
in audit triggers.In that case it can actually prove a benefit because it will ensure that if additional columns are added to the base table it will raise an error so it cannot be forgotten to deal with this in the audit trigger and/or audit table structure.
(Like dotjoe) I am also happy using it in derived tables and column table expressions. Though I habitually do it the other way round.
I'm mostly familiar with SQL Server and there at least the optimiser has no problem recognising that only columns
a,b,c
will be required and the use of*
in the inner table expression does not cause any unnecessary overhead retrieving and discarding unneeded columns.In principle
SELECT *
ought to be fine in a view as well as it is the finalSELECT
from the view where it ought to be avoided however in SQL Server this can cause problems as it stores column metadata for views which is not automatically updated when the underlying tables change and the use of*
can lead to confusing and incorrect results unlesssp_refreshview
is run to update this metadata.在很多场景下 SELECT * 是最优解决方案。在 Management Studio 中运行临时查询只是为了了解您正在使用的数据。查询您尚不知道列名的表,因为这是您第一次使用新架构。构建一次性快速工具来执行一次性迁移或数据导出。
我同意,在“正确的”开发中,您应该避免它 - 但在很多情况下,“正确的”开发不一定是业务问题的最佳解决方案。规则和最佳实践很棒,只要您知道何时打破它们。 :)
There are many scenarios where SELECT * is the optimal solution. Running ad-hoc queries in Management Studio just to get a sense of the data you're working with. Querying tables where you don't know the column names yet because it's the first time you've worked with a new schema. Building disposable quick'n'dirty tools to do a one-time migration or data export.
I'd agree that in "proper" development, you should avoid it - but there's lots of scenarios where "proper" development isn't necessarily the optimum solution to a business problem. Rules and best practices are great, as long as you know when to break them. :)
当使用 CTE 时,我将在生产中使用它。但是,在本例中,它并不是真正的
select *
,因为我已经在 CTE 中指定了列。我只是不想在最终选择中重新指定。I'll use it in production when working with CTEs. But, in this case it's not really
select *
, because I already specified the columns in the CTE. I just don't want to respecify in the final select.如果您谈论的是实时代码,我想不到。
人们说这使得添加列更容易开发(因此它们会自动返回并且可以在不更改存储过程的情况下使用),但他们不知道如何编写最佳代码/sql。
我只在编写不会被重用的临时查询时使用它(找出表的结构,当我不确定列名是什么时获取一些数据)。
None that I can think of, if you are talking about live code.
People saying that it makes adding columns easier to develop (so they automatically get returned and can be used without changing the Stored procedure) have no idea about writing optimal code/sql.
I only ever use it when writing ad-hoc queries that will not get reused (finding out the structure of a table, getting some data when I am not sure what the column names are).
我认为在
exists
子句中使用select *
是合适的:有些人喜欢在这种情况下使用
select 1
,但它并不优雅,而且它不会带来任何性能改进(早期优化再次出现)。I think using
select *
in anexists
clause is appropriate:Some people like to use
select 1
in this case, but it's not elegant, and it doesn't buy any performance improvements (early optimization strikes again).在生产代码中,我倾向于 100% 同意你的观点。
然而,我认为在执行即席查询时,* 不仅仅证明了它的存在。
In production code, I'd tend to agree 100% with you.
However, I think that the * more than justifies its existence when performing ad-hoc queries.
你的问题已经得到了很多答案,但你似乎忽略了所有那些没有鹦鹉学舌地回应你想听到的内容的答案。不过,这是第三次(到目前为止):有时没有瓶颈。有时,性能远胜于性能。有时,表在不断变化,修改每个 SELECT 查询只是需要管理的又一点可能的不一致。有时您必须按照不可能的时间表交付,这是您最不需要考虑的事情。
如果您生活在子弹时间中,当然,请输入所有列名称。但为什么要停在那里呢?在无模式 dbms 中重写您的应用程序。见鬼,用汇编语言编写您自己的 dbms。这真的会让他们看到。
You've gotten a number of answers to your question, but you seem to be dismissing everything that isn't parroting back what you want to hear. Still, here it is for the third (so far) time: sometimes there is no bottleneck. Sometimes performance is way better than fine. Sometimes the tables are in flux, and amending every SELECT query is just one more bit of possible inconsistency to manage. Sometimes you've got to deliver on an impossible schedule and this is the last thing you need to think about.
If you live in bullet time, sure, type in all the column names. But why stop there? Re-write your app in a schema-less dbms. Hell, write your own dbms in assembly. That'd really show 'em.
请记住,如果您使用 select * 并且您有一个联接,则至少一个字段将被发送两次(联接字段)。这样无端浪费了数据库资源和网络资源。
And remember if you use select * and you have a join at least one field will be sent twice (the join field). This wastes database resources and network resources for no reason.
作为一种工具,我使用它来快速刷新我的记忆,了解我可以从查询中返回什么。作为生产级别的查询本身..没办法。
As a tool I use it to quickly refresh my memory as to what I can possibly get back from a query. As a production level query itself .. no way.
我想,当创建处理数据库的应用程序(例如 phpmyadmin)时,并且您处于显示完整表格的页面中,在这种情况下使用 SELECT * 是合理的。
When creating an application that deals with the database, like phpmyadmin, and you are in a page where to display a full table, in that case using
SELECT *
can be justified, I guess.我唯一能想到的就是开发一个针对任何数据库运行的实用程序或 SQL 工具应用程序时。但即使在这里,我也倾向于查询系统表以获取表结构,然后从中构建任何必要的查询。
我的团队最近在一个地方使用了 SELECT * ,我认为这还可以......我们有一个数据库作为另一个数据库的外观而存在(称之为 DB_Data),所以它主要是由针对另一个数据库中的表的视图组成。当我们生成视图时,我们实际上生成了列列表,但是 DB_Data 数据库中有一组视图,它们是在将行添加到通用查找表时自动生成的(此设计在我到达这里之前就已就位)。我们编写了一个 DDL 触发器,以便当此过程在 DB_Data 中创建视图时,会在外观中自动创建另一个视图。由于生成的视图始终与 DB_Data 中的视图完全匹配,并且始终刷新并保持同步,因此为了简单起见,我们仅使用 SELECT * 。
如果大多数开发人员在其整个职业生涯中都没有在生产代码中合法使用 SELECT * ,我不会感到惊讶。
About the only thing that I can think of would be when developing a utility or SQL tool application that is being written to run against any database. Even here though, I would tend to query the system tables to get the table structure and then build any necessary query from that.
There was one recent place where my team used
SELECT *
and I think that it was ok... we have a database that exists as a facade against another database (call it DB_Data), so it is primarily made up of views against the tables in the other database. When we generate the views we actually generate the column lists, but there is one set of views in the DB_Data database that are automatically generated as rows are added to a generic look-up table (this design was in place before I got here). We wrote a DDL trigger so that when a view is created in DB_Data by this process then another view is automatically created in the facade. Since the view is always generated to exactly match the view in DB_Data and is always refreshed and kept in sync, we just usedSELECT *
for simplicity.I wouldn't be surprised if most developers went their entire career without having a legitimate use for
SELECT *
in production code though.我使用 select * 来查询针对读取而优化的表(非规范化、平面数据)。非常有利,因为表格的目的只是支持应用程序中的各种视图。
I've used select * to query tables optimized for reading (denormalized, flat data). Very advantageous since the purpose of the tables were simply to support various views in the application.
phpmyadmin
的开发人员如何确保他们显示您的数据库表的所有字段?How else do the developers of
phpmyadmin
ensure they are displaying all the fields of your DB tables?可以想象,您希望设计数据库和应用程序,以便可以向表中添加列,而无需重写应用程序。如果您的应用程序至少检查列名称,则它可以安全地使用 SELECT * 并使用一些适当的默认操作处理其他列。当然,应用程序可以查询系统目录(或特定于应用程序的目录)以获取列信息,但在某些情况下,
SELECT *
是执行此操作的语法糖。然而,这样做存在明显的风险,并且向应用程序添加所需的逻辑以使其可靠很可能意味着在不太合适的介质中复制数据库的查询检查。我不会猜测现实生活中成本和收益如何权衡。
在实践中,我坚持使用
SELECT *
3 种情况(其他答案中提到了一些:EXISTS 的内容
谓词It is conceivable you'd want to design your DB and application so that you can add a column to a table without needing to rewrite your application. If your application at least checks column names it can safely use
SELECT *
and treat additional columns with some appropriate default action. Sure the app could consult system catalogs (or app-specific catalogs) for column information, but in some circumstancesSELECT *
is syntactic sugar for doing that.There are obvious risks to this, however, and adding the required logic to the app to make it reliable could well simply mean replicating the DB's query checks in a less suitable medium. I am not going to speculate on how the costs and benefits trade off in real life.
In practice, I stick to
SELECT *
for 3 cases (some mentioned in other answers:EXISTS
predicate.是的,但仅在意图实际从表中获取所有列而不是因为您想要表当前拥有的所有列的情况下。
例如,在我开发的一个系统中,我们有 UDF(用户定义字段),用户可以在报告中选择他们想要的字段、顺序以及过滤。构建结果集时,从我正在构建的临时表中简单地“选择 *”更有意义,而不必跟踪哪些列处于活动状态。
Yes, but only in situations where the intention is to actually get all the columns from a table not because you want all the columns that a table currently has.
For example, in one system that I worked on we had UDFs (User Defined Fields) where the user could pick the fields they wanted on the report, the order as well as filtering. When building a result set it made more sense to simply "select *" from the temporary tables that I was building instead of having to keep track of which columns were active.
我有几次需要显示列名未知的表中的数据。因此,我执行了
SELECT *
并在运行时获取了列名称。我收到了一个旧版应用程序,其中一个表有 200 列,一个视图有 300 列。
SELECT *
带来的风险并不比显式列出所有 300 列更糟糕。I have several times needed to display data from a table whose column names were unknown. So I did
SELECT *
and got the column names at run time.I was handed a legacy app where a table had 200 columns and a view had 300. The risk exposure from
SELECT *
would have been no worse than from listing all 300 columns explicitly.取决于生产软件的上下文。
如果您正在为表管理工具编写一个简单的数据访问层,其中用户将选择表并在网格中查看结果,那么 *SELECT ** 似乎没问题。
换句话说,如果您选择通过其他方式处理“字段选择”(如检索结果集后在自动或用户指定的过滤器中),那么看起来就很好。
另一方面,如果我们谈论的是某种具有业务规则、定义的模式等的企业软件……那么我同意 *SELECT ** 是一个坏主意。
编辑:哦,当源表是触发器或视图的存储过程时,“*SELECT **”应该没问题,因为您通过其他方式(视图的定义或存储过程的结果集)管理结果集。
Depends on the context of the production software.
If you are writing a simple data access layer for a table management tool where the user will be selecting tables and viewing results in a grid, then it would seem *SELECT ** is fine.
In other words, if you choose to handle "selection of fields" through some other means (as in automatic or user-specified filters after retrieving the resultset) then it seems just fine.
If on the other hand we are talking about some sort of enterprise software with business rules, a defined schema, etc. ... then I agree that *SELECT ** is a bad idea.
EDIT: Oh and when the source table is a stored procedure for a trigger or view, "*SELECT **" should be fine because you're managing the resultset through other means (the view's definition or the stored proc's resultset).
在生产代码中
Select *
在任何时候都是合理的:为什么我需要返回并不得不担心更改相关存储过程的开销,每个我该在表中添加一个字段吗?
为什么我什至要考虑我是否选择了正确的领域,而绝大多数时候我无论如何都想要其中的大部分,而绝大多数时候我不想要,其他的东西是瓶颈吗?
如果我有一个特定的性能问题,那么我会回去修复它。否则,在我的环境中,这只是我可以不做的过早(且昂贵)的优化。
编辑..在讨论之后,我想我会补充一点:
...以及人们没有做过其他不良事情的地方,例如尝试访问列(i),无论如何,这可能会在其他情况下中断:)
Select *
in production code is justifiable any time that:Why would I want the overhead of going back and having to worry about changing the relevant stored procedures, every time I add a field to the table?
Why would I even want to have to think about whether or not I've selected the right fields, when the vast majority of the time I want most of them anyway, and the vast majority of the few times I don't, something else is the bottleneck?
If I have a specific performance issue then I'll go back and fix that. Otherwise in my environment, it's just premature (and expensive) optimisation that I can do without.
Edit.. following the discussion, I guess I'd add to this:
... and where people haven't done other undesirable things like tried to access columns(i), which could break in other situations anyway :)
我知道我参加聚会已经很晚了,但我会补充说,只要我知道我总是想要所有列,无论列名称如何,我都会使用 select * 。这可能是一个相当边缘的情况,但在数据仓库中,我可能想从第 3 方应用程序暂存整个表。我的标准流程是删除临时表并运行
是的,如果远程表上的架构发生更改,下游依赖项可能会引发错误,但无论如何都会发生这种情况。
I know I'm very late to the party but I'll chip in that I use select * whenever I know that I'll always want all columns regardless of the column names. This may be a rather fringe case but in data warehousing, I might want to stage an entire table from a 3rd party app. My standard process for this is to drop the staging table and run
Yes, if the schema on the remote table changes, downstream dependencies may throw errors but that's going to happen regardless.
如果您想查找所有列并希望顺序,您可以执行以下操作(至少如果您使用 MySQL):
SHOW COLUMNS FROM mytable FROM mydb;
(1)您可以看到有关的所有相关信息你所有的领域。您可以防止类型问题,并且可以确定所有列名称。这个命令非常快,因为你只要求表的结构。从结果中,您将选择所有名称并构建一个如下所示的字符串:
如果您不想运行两个单独的 MySQL 命令,因为 MySQL 命令很昂贵,您可以将 (1) 和 (2) 包含到存储过程中它将结果作为 OUT 参数,这样您只需调用存储过程,并且每个命令和数据生成都将在数据库服务器上发生。
If you want to find all the columns and want order, you can do the following (at least if you use MySQL):
SHOW COLUMNS FROM mytable FROM mydb;
(1)You can see every relevant information about all your fields. You can prevent problems with types and you can know for sure all the column names. This command is very quick, because you just ask for the structure of the table. From the results you will select all the name and will build a string like this:
If you don't want to run two separate MySQL commands because a MySQL command is expensive, you can include (1) and (2) into a stored procedure which will have the results as an OUT parameter, that way you will just call a stored procedure and every command and data generation will happen at the database server.