所以我有一个表:
CREATE TABLE TABLE_NAME (
COLUMN_1 char(12) NOT NULL,
COLUMN_2 char(2) NOT NULL,
COLUMN_3 char(1) NOT NULL,
COLUMN_4 int NOT NULL,
COLUMN_5 char(2) NOT NULL,
COLUMN_6 money NOT NULL,
COLUMN_7 smallint NOT NULL,
COLUMN_8 varchar(10) NOT NULL,
COLUMN_9 smallint NOT NULL,
COLUMN_10 datetime NOT NULL
Primary Key (COLUMN_1, COLUMN_2, COLUMN_3)
)
SELECT COUNT(*)
返回的值与 SELECT DISTINCT COUNT(*)
不同。这怎么可能?
我还尝试了
SELECT COUNT(*) FROM (
SELECT
COLUMN_1,
COLUMN_2,
COLUMN_3,
COLUMN_4,
COLUMN_5,
COLUMN_6,
COLUMN_7,
COLUMN_8,
COLUMN_9,
COLUMN_10
FROM TABLE_NAME
) TMP
返回与不同查询相同的计数。
我看不出主键和所有字段都不为空的情况下,总计数可能与唯一记录的计数不同。
顺便说一句,这是在 Sybase ASE 15 上进行的。
五十万条记录中存在大约一百条记录的差异。我还在其他几个表中看到了这个问题,但只选择了一个作为示例。
编辑
为了完整起见,我应该提到,我在编写一个简单的作业以将该表完全复制到远程数据库时发现了这个问题。我的应用程序正在记录一定数量的读/写操作,但由于源数据库中的记录数与目标数据库中的记录数不同而未能通过 QA。这两个值都是通过 COUNT(*)
获得的;从目标(Oracle 10g)返回的计数与我的应用程序记录的读/写操作数相同。由于源表上的所有字段都定义为 NOT NULL 并且定义了主键,因此我无法解释我的应用程序如何丢失少量记录。
这是我开始使用上面列出的备用查询的时候,这两个查询都与我的应用程序读/写计数以及从目标返回的 COUNT(*)
值一致。换句话说,唯一不匹配的值是源数据库上的 COUNT(*)
。
So I have a table:
CREATE TABLE TABLE_NAME (
COLUMN_1 char(12) NOT NULL,
COLUMN_2 char(2) NOT NULL,
COLUMN_3 char(1) NOT NULL,
COLUMN_4 int NOT NULL,
COLUMN_5 char(2) NOT NULL,
COLUMN_6 money NOT NULL,
COLUMN_7 smallint NOT NULL,
COLUMN_8 varchar(10) NOT NULL,
COLUMN_9 smallint NOT NULL,
COLUMN_10 datetime NOT NULL
Primary Key (COLUMN_1, COLUMN_2, COLUMN_3)
)
SELECT COUNT(*)
returns a different value than SELECT DISTINCT COUNT(*)
. How can this be possible?
I also tried
SELECT COUNT(*) FROM (
SELECT
COLUMN_1,
COLUMN_2,
COLUMN_3,
COLUMN_4,
COLUMN_5,
COLUMN_6,
COLUMN_7,
COLUMN_8,
COLUMN_9,
COLUMN_10
FROM TABLE_NAME
) TMP
which returned the same count as the distinct query.
I can't see how with a primary key and all fields being NOT NULL, there can be a different total count than the count of unique records.
BTW, this is on Sybase ASE 15.
The discrepancy is a hundred or so records out of a half million. I'm also seeing this problem in several other tables, but chose just one for the example.
Edit
I should mention for the sake of completeness that I discovered this problem when writing a simple job to completely copy this table to a remote database. My application was recording a certain number of read/write operations, but failed QA because the number of records in the source database differed from the number of records in the target database. Both values were obtained via COUNT(*)
; the count returned from the target (Oracle 10g) was the same as the number of read/write operations recorded by my app. As all fields on the source table are defined NOT NULL and a primary key is defined, I was at a loss to explain how my application was losing a tiny number of records.
This is when I started using the alternate queries listed above, both of which agreed with my apps read/write count, as well as the COUNT(*)
value returned from the target. In other words, the only value that did not match was the COUNT(*)
on the source database.
发布评论
评论(3)
在大多数支持它的数据库中,
count(*)
实际上并不检索所有记录并对其进行计数,而是获取一些仅跟踪行数(或近似行数)的元数据字段目前存储在表中。另一方面,当您执行需要处理实际数据的操作时,dbms 无论如何都会获取行,并且会按照您的预期对它们进行计数。当然,可以合理地预期,无论如何实现,
count(*)
的结果都将与更复杂但等效的查询相同。那么这表明(也许)您的表的元数据已以某种方式损坏。 (我想说这是一个不错的选择——我对 sybase 不太熟悉,但大多数 dbms 有一种方法可以强制重建表指标......这可能值得一试)。另一种可能的解释是数据库的内部表行计数器实际上并不是设计为 100% 准确的。 (第二种可能性纯粹是有根据的猜测......我实际上不知道Sybase的行计数器是否属实,但可能值得进一步调查)。
In most databases that support it,
count(*)
doesn't actually retrieve all records and count them -- instead it fetches some metadata field that just tracks the number of rows (or approximate number of rows) presently stored in the table. On the other hand, when you do something that requires working with actual data, the dbms is going to fetch the rows anyway, and it will count them as you would expect it to.Of course, it's reasonable to expect that, regardless of how it's implemented, the result of
count(*)
would be the same as more a complex but equivalent query. That would suggest then, that (maybe) your table's metadata is corrupted somehow. (I'd say this one is a good bet -- I'm not familiar with sybase specifically, but most dbms have a way to force rebuild the table metrics... that might be worth a try here).Another possible explanation is that the database's internal table row counter is actually not designed to be 100% accurate. (this second possibility is pure educated speculation... I don't actually know whether this is true of Sybase's row counter or not, but it might be worth further investigation).
这是 Sybase ASE 对 Sybase ASE 问题的回答
它适用于任何标准 SQL 兼容的 DBMS;它不适用于其他人。
您选择的答案不正确。
COUNT()
是 ISO/IEC/ANSI 标准 SQL 表达式。需要物理地计算行数,这意味着在大表上会很慢。要求不使用内存驻留表或目录表(“元数据”)。这是一个运行时值,因此如果表处于活动状态,它将在每次执行时不断更改括号内放置的内容非常重要。
COUNT(*)
返回包含空值的行数COUNT(column)
返回column
不为 Null 的行数< /p>是的,
DISTINCT
的位置也很重要。这迫使使用工作台。如果您需要计算不唯一的列,那么您别无选择;但对于唯一列,无需使用DISTINCT
DISTINCT
适用于列或从列派生的表达式;计数(不同 *)
毫无意义(“区分所有列”),并且 ASE 15 有一个显着改进的解析器,它可以捕获此类内容(以前的版本将返回不太准确的错误消息)。
实际读取的行数取决于您的隔离级别(将为指定级别返回正确的计数)和数据库上的当前活动
最干净的方法,可以避免您得到的奇怪结果,是使用
COUNT(PK_column)
(因为这是 CW)切勿使用任何形式的
COUNT()
进行存在性检查,因为它会物理计算行数。始终将IF EXISTS
与正确的WHERE
子句一起使用,因为它将仅使用索引。systabstats
通过两个命令从内存驻留表进行更新:UPDATE STATISTICS 和“flush stats”。试试这个:对评论的回应
本节与 Sybase 无关
我已针对您的问题提供了清晰的解释问题和主题,而不解释为什么其他两个答案不正确,或者为什么你的第二个查询返回不可预测的结果。
这里是Sybase ASE 15.0 参考/构建块手册,
COUNT()
位于第 121 页。请注意,icyrock 错误引用了它,而且你们俩都误解了它,当然是不经意间。您的出发点是混乱,缺乏*
和DISTINCT
的区别,因此我试图给出一个明确的答案。我将其设为社区 Wiki,因此我对主题的回答是完整的、标准化的,因此它可以独立作为关于
COUNT()
的任何问题的完整答案。针对评论,对于那些没有听说过的人来说,SQL 是一种标准语言,由 IBM 在 1970 年代发明和发展,并在 1980 年代被接受为标准:
国际标准组织,
国际电工委员会和(书籍格式),
并由美国国家标准复制稍后研究所(感谢数字设备公司免费出版)。
没有开源或免费软件“SQL”符合该标准。它们提供了标准的一些组件(语言结构、设施、命令)。在他们提供的服务中,他们很少提供标准要求(例如 ACID 事务处理;安全性等)。
第二个区别是架构,以及 SQL 作为真正语言的实现(一致性、可从任何代码段调用)。一个没有上下文切换的真正的多线程进程无法与使用 Unix 来完成所有“多线程”和上下文切换的数百个程序相比。
因此 SAP/Sybase 和 DB2(关于标准的严格性)所做的事情,以及在较小程度上 MS(关于实现的“灵活”)所做的事情,因为 (a) 它们符合标准(不争论小的变化), (b) 他们拥有真正的服务器架构,与世界上的 MySQL 和 PostgreSQL 相差数光年。结果是硬件要求是架构服务器的 100 倍。一分钱一分货。
Oracle 有一个特殊的地位,因为它是商业的和通用的,但是它不兼容 SQL(在语言或 ACID 事务中,并且它扩展了定义)。此外,它没有架构,因此属于免费软件类别。
然后是非标准或反标准类别。
MySQL/MyISAM 以专门违反标准的方式提供
COUNT()
(这从 Lee 提供的 MySQL 手册链接;适用于非事务性应用程序。MySQL/InnoDB 和BDB 以符合标准的方式提供
COUNT()
。所有标准 SQL 供应商都提供了大量标准扩展
NOT NULL
不能立即被信任,因为该表很可能在当前表定义之前就已包含空列。定义已实施。尽管如此,Sybase 和 DB2 仍将根据标准合规性的要求对物理抛出进行计数。您可以用一系列计数来证明:SELECT COUNT(column_1) from table_name
,然后比较计数。第二个查询会让您进一步困惑,是的,因为当创建并填充内表时,计数将是准确的。既然你带着期望创造了它,它就满足了你的期望。但这并不能证明原始表格的任何内容。
This is a Sybase ASE Answer for a Sybase ASE Question
It applies to any Standard SQL Compliant DBMS; it does not apply to others.
The answer you have chosen is not correct.
COUNT()
is an ISO/IEC/ANSI Standard SQL expression. It is required to physically count the rows, which means it will be slow on a large table. It is required to not use the internal memory resident tables or catalogue tables ("metadata"). This is a run-time value, so if the table is active, it will keep changing for every executionWhat you place within the brackets is very important.
COUNT(*)
returns the number of rows including nullsCOUNT(column)
returns the number of rows wherecolumn
is Not NullYes, the placement of
DISTINCT
is also important. This forces the use of a work table. If you need to count a column that is not Unique, then you do not have a choice; but for unique columns, there is no need to useDISTINCT
DISTINCT
applies to a column or an expression derived from a column;COUNT (DISTINCT *)
is meaningless ("distinct all columns"), and ASE 15 has a substantially improved parser, which catches such things (previous version woul dhave returned a less accurate error message).
the rows actually read will depend on your ISOLATION LEVEL (the correct count will be returned for the level specified) and the current activity on the database
the cleanest method, that avoids the weird results you are getting, is to use
COUNT(PK_column)
(Since this is a CW) Never use any form of
COUNT()
for an existence check, as it physically counts the rows. Always useIF EXISTS
with the correctWHERE
clause, because it will use the index only.systabstats
, which has a count of rows in each table. This returns instantaneously, regardless of table size. The currency of those values depends on how your server is configured for performance, flushing, checkpointing, etc.systabstats
is updated from the memory-resident tables by two commands: UPDATE STATISTICS and "flush stats". Try this:Response to Comments
This Section is Not Relevant to Sybase
I have provided a clean explanation re your problem, and the subject, without explaining why the other two answers are incorrect, or why your second query returns unpredictable results.
Here is the Sybase ASE 15.0 Reference/Building Blocks Manual,
COUNT()
is on page 121. Note that icyrock has misquoted it, and both of you have mis-interpreted it, inadvertently of course. Your starting point was confusion, lack of distinction re*
andDISTINCT
, hence my attempt at a clear answer.I made this a Community Wiki, therefore my answer is complete re the subject, normalised, so that it can stand alone as a complete answer to any question re
COUNT()
.In response to comments, for those people who have not heard, SQL is a Standard language, invented and progressed by IBM in the 1970's and accepted as a Standard in the 1980's by:
International Standards Organisation,
International Electrotechnical Commission and (Book Format),
and copied by American National Standards Institute (free publication thanks to Digital Equipment Corp) somewhat later.
None of the Open Source or freeware "SQL" comply with the Standard. They provide some components (language structures, facilities, commands) of the Standard. And of those they provide, they seldom provide the Standard Requirement (eg. ACID Transaction handling; security; etc).
The second difference is Architecture, and in the implementation of SQL as a genuine language (consistency, callable from any code segment). A single genuine multi-threaded process with no context switching cannot be compared with a herd of hundreds of programs using Unix to do all its "multi-threading" and context switching.
Therefore what SAP/Sybase and DB2 (rigid regarding Standards) do, and to a lesser extent what MS ("flexible" about implementations) do, because (a) they are Standard-Compliant (without arguing the small variations), and (b) they have a genuine Server Architecture, is light years away from the MySQLs and the PostgreSQLs of the world. The result is the hardware requirement that is 100 times that of an architected Server. You get what you pay for.
Oracle has a special place because it is commercial and common, however it is not SQL compliant (in language or in ACID Transactions, and it stretches the definitions). Further, it has no architecture, which places it in the freeware category.
Then there is non-Standard or anti-Standard category.
MySQL/MyISAM provides
COUNT()
in a manner that is specifically against the Standard (this is plainly evident from the MySQL Manual link provided by Lee; good for non-transactional apps).MySQL/InnoDB & BDB provide
COUNT()
in the Standard-compliant manner.PostgreSQL is the worst, despite the heavy marketing. In addition to not being SQL compliant, it has no concept of ACID Transactions, and the "SQL" is not implemented as a language (Rule 4 in Codd's Twelve Rules). Unfortunately with each major release, as it implements some increment of SQL, you have to re-write your code.
All Standard-SQL vendors provide a large array of Extensions to the Standard
The
NOT NULL
in the table definition cannot be immediately trusted, because the table may well have had null columns in it before the current definition was implemented. Nevertheless, Sybase and DB2 will count throws physical, as per requirement for Standard compliance. You can prove with with a series of counts:SELECT COUNT(column_1) from table_name
, and then compare the counts.The second query will further confuse you, yes, because when the inner table is created, and populated, the count will be accurate. Since you created it with an expectation, it fulfilled your expectation. But that does not prove anything about that original table.
如果我没有记错的话,据此判断:
您应该使用
select count(distinct *)
。我希望select unique count(*)
始终返回 1,因为它表示“给我不同的行,每行都是一个count(*)
”,因此总会有一行,而select count(distinct *)
会为您提供不同行的计数。FWIW,上面似乎适用于 v12.5(尽管我没有看到任何差异),这里是 15.0 文档:
它明确说明了以下内容:
您可以使用
select count(distinct column_1)
左右,但不能使用select count(distinct *)
。If I'm not mistaken, judging from this:
you should use
select count(distinct *)
. I'd expectselect distinct count(*)
to always return 1, as it says "give me the distinct rows, each of which is acount(*)
", so there's always going to be one row, whileselect count(distinct *)
gives you a count of distinct rows.FWIW, the above seems to be for v12.5 (though I don't see any differences), here are the 15.0 docs:
It explicitly says the following:
You can use
select count(distinct column_1)
or so, but notselect count(distinct *)
.