SELECT COUNT(*) 与表中所有记录的计数有何不同?

发布于 2024-10-06 06:19:21 字数 1390 浏览 4 评论 0 原文

所以我有一个表:

CREATE TABLE TABLE_NAME (
    COLUMN_1   char(12)    NOT NULL,
    COLUMN_2   char(2)     NOT NULL,
    COLUMN_3   char(1)     NOT NULL,
    COLUMN_4   int         NOT NULL,
    COLUMN_5   char(2)     NOT NULL,
    COLUMN_6   money       NOT NULL,
    COLUMN_7   smallint    NOT NULL,
    COLUMN_8   varchar(10) NOT NULL,
    COLUMN_9   smallint    NOT NULL,
    COLUMN_10  datetime    NOT NULL
    Primary Key (COLUMN_1, COLUMN_2, COLUMN_3)
)

SELECT COUNT(*) 返回的值与 SELECT DISTINCT COUNT(*) 不同。这怎么可能?

我还尝试了

SELECT COUNT(*) FROM (
    SELECT
        COLUMN_1,
        COLUMN_2,
        COLUMN_3,
        COLUMN_4,
        COLUMN_5,
        COLUMN_6,
        COLUMN_7,
        COLUMN_8,
        COLUMN_9,
        COLUMN_10
     FROM TABLE_NAME
    ) TMP

返回与不同查询相同的计数。

我看不出主键和所有字段都不为空的情况下,总计数可能与唯一记录的计数不同。

顺便说一句,这是在 Sybase ASE 15 上进行的。

五十万条记录中存在大约一百条记录的差异。我还在其他几个表中看到了这个问题,但只选择了一个作为示例。

编辑

为了完整起见,我应该提到,我在编写一个简单的作业以将该表完全复制到远程数据库时发现了这个问题。我的应用程序正在记录一定数量的读/写操作,但由于源数据库中的记录数与目标数据库中的记录数不同而未能通过 QA。这两个值都是通过 COUNT(*) 获得的;从目标(Oracle 10g)返回的计数与我的应用程序记录的读/写操作数相同。由于源表上的所有字段都定义为 NOT NULL 并且定义了主键,因此我无法解释我的应用程序如何丢失少量记录。

这是我开始使用上面列出的备用查询的时候,这两个查询都与我的应用程序读/写计数以及从目标返回的 COUNT(*) 值一致。换句话说,唯一不匹配的值是源数据库上的 COUNT(*)

So I have a table:

CREATE TABLE TABLE_NAME (
    COLUMN_1   char(12)    NOT NULL,
    COLUMN_2   char(2)     NOT NULL,
    COLUMN_3   char(1)     NOT NULL,
    COLUMN_4   int         NOT NULL,
    COLUMN_5   char(2)     NOT NULL,
    COLUMN_6   money       NOT NULL,
    COLUMN_7   smallint    NOT NULL,
    COLUMN_8   varchar(10) NOT NULL,
    COLUMN_9   smallint    NOT NULL,
    COLUMN_10  datetime    NOT NULL
    Primary Key (COLUMN_1, COLUMN_2, COLUMN_3)
)

SELECT COUNT(*) returns a different value than SELECT DISTINCT COUNT(*). How can this be possible?

I also tried

SELECT COUNT(*) FROM (
    SELECT
        COLUMN_1,
        COLUMN_2,
        COLUMN_3,
        COLUMN_4,
        COLUMN_5,
        COLUMN_6,
        COLUMN_7,
        COLUMN_8,
        COLUMN_9,
        COLUMN_10
     FROM TABLE_NAME
    ) TMP

which returned the same count as the distinct query.

I can't see how with a primary key and all fields being NOT NULL, there can be a different total count than the count of unique records.

BTW, this is on Sybase ASE 15.

The discrepancy is a hundred or so records out of a half million. I'm also seeing this problem in several other tables, but chose just one for the example.

Edit

I should mention for the sake of completeness that I discovered this problem when writing a simple job to completely copy this table to a remote database. My application was recording a certain number of read/write operations, but failed QA because the number of records in the source database differed from the number of records in the target database. Both values were obtained via COUNT(*); the count returned from the target (Oracle 10g) was the same as the number of read/write operations recorded by my app. As all fields on the source table are defined NOT NULL and a primary key is defined, I was at a loss to explain how my application was losing a tiny number of records.

This is when I started using the alternate queries listed above, both of which agreed with my apps read/write count, as well as the COUNT(*) value returned from the target. In other words, the only value that did not match was the COUNT(*) on the source database.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

柠栀 2024-10-13 06:19:21

在大多数支持它的数据库中,count(*) 实际上并不检索所有记录并对其进行计数,而是获取一些仅跟踪行数(或近似行数)的元数据字段目前存储在表中。另一方面,当您执行需要处理实际数据的操作时,dbms 无论如何都会获取行,并且会按照您的预期对它们进行计数。

当然,可以合理地预期,无论如何实现,count(*) 的结果都将与更复杂但等效的查询相同。那么这表明(也许)您的表的元数据已以某种方式损坏。 (我想说这是一个不错的选择——我对 sybase 不太熟悉,但大多数 dbms 有一种方法可以强制重建表指标......这可能值得一试)。

另一种可能的解释是数据库的内部表行计数器实际上并不是设计为 100% 准确的。 (第二种可能性纯粹是有根据的猜测......我实际上不知道Sybase的行计数器是否属实,但可能值得进一步调查)。

In most databases that support it, count(*) doesn't actually retrieve all records and count them -- instead it fetches some metadata field that just tracks the number of rows (or approximate number of rows) presently stored in the table. On the other hand, when you do something that requires working with actual data, the dbms is going to fetch the rows anyway, and it will count them as you would expect it to.

Of course, it's reasonable to expect that, regardless of how it's implemented, the result of count(*) would be the same as more a complex but equivalent query. That would suggest then, that (maybe) your table's metadata is corrupted somehow. (I'd say this one is a good bet -- I'm not familiar with sybase specifically, but most dbms have a way to force rebuild the table metrics... that might be worth a try here).

Another possible explanation is that the database's internal table row counter is actually not designed to be 100% accurate. (this second possibility is pure educated speculation... I don't actually know whether this is true of Sybase's row counter or not, but it might be worth further investigation).

绝不服输 2024-10-13 06:19:21

这是 Sybase ASE 对 Sybase ASE 问题的回答

它适用于任何标准 SQL 兼容的 DBMS;它不适用于其他人。

您选择的答案不正确。

  1. COUNT() 是 ISO/IEC/ANSI 标准 SQL 表达式。需要物理地计算行数,这意味着在大表上会很慢。要求使用内存驻留表或目录表(“元数据”)。这是一个运行时值,因此如果表处于活动状态,它将在每次执行时不断更改
  • 括号内放置的内容非常重要。

  • COUNT(*) 返回包含空值的行数

  • COUNT(column) 返回 column 不为 Null 的行数< /p>

  • 是的,DISTINCT 的位置也很重要。这迫使使用工作台。如果您需要计算不唯一的列,那么您别无选择;但对于唯一列,无需使用 DISTINCT

  • DISTINCT 适用于列或从列派生的表达式;
    计数(不同 *)
    毫无意义(“区分所有列”),并且 ASE 15 有一个显着改进的解析器,它可以捕获此类内容(以前的版本将返回不太准确的错误消息)。

  • 实际读取的行数取决于您的隔离级别(将为指定级别返回正确的计数)和数据库上的当前活动

  • 最干净的方法,可以避免您得到的奇怪结果,是使用
    COUNT(PK_column)

  • (因为这是 CW)切勿使用任何形式的 COUNT() 进行存在性检查,因为它会物理计算行数。始终将 IF EXISTS 与正确的 WHERE 子句一起使用,因为它将仅使用索引。

  1. 如果您需要准确的计数但不想读取所有行,可以使用读取目录表 systabstats 的函数,该表包含每个表中的行数。无论表大小如何,都会立即返回。这些值的通用性取决于服务器的性能、刷新、检查点等配置方式。systabstats 通过两个命令从内存驻留表进行更新:UPDATE STATISTICS 和“flush stats”。试试这个:
    EXEC sp_flushstats
    SELECT ROW_COUNT (DB_ID(), OBJECT_ID("table_name") )

对评论的回应

本节与 Sybase 无关

  1. 我已针对您的问题提供了清晰的解释问题和主题,而不解释为什么其他两个答案不正确,或者为什么你的第二个查询返回不可预测的结果。

  2. 这里是Sybase ASE 15.0 参考/构建块手册COUNT() 位于第 121 页。请注意,icyrock 错误引用了它,而且你们俩都误解了它,当然是不经意间。您的出发点是混乱,缺乏 *DISTINCT 的区别,因此我试图给出一个明确的答案。

  3. 我将其设为社区 Wiki,因此我对主题的回答是完整的、标准化的,因此它可以独立作为关于 COUNT() 的任何问题的完整答案。

  4. 针对评论,对于那些没有听说过的人来说,SQL 是一种标准语言,由 IBM 在 1970 年代发明和发展,并在 1980 年代被接受为标准:

  • 国际标准组织

  • 国际电工委员会(书籍格式),


  • 并由美国国家标准复制稍后研究所(感谢数字设备公司免费出版)。

  • 没有开源或免费软件“SQL”符合该标准。它们提供了标准的一些组件(语言结构、设施、命令)。在他们提供的服务中,他们很少提供标准要求(例如 ACID 事务处理;安全性等)。

  • 第二个区别是架构,以及 SQL 作为真正语言的实现(一致性、可从任何代码段调用)。一个没有上下文切换的真正的多线程进程无法与使用 Unix 来完成所有“多线程”和上下文切换的数百个程序相比。

  • 因此 SAP/Sybase 和 DB2(关于标准的严格性)所做的事情,以及在较小程度上 MS(关于实现的“灵活”)所做的事情,因为 (a) 它们符合标准(不争论小的变化), (b) 他们拥有真正的服务器架构,与世界上的 MySQL 和 PostgreSQL 相差数光年。结果是硬件要求是架构服务器的 100 倍。一分钱一分货。

  • Oracle 有一个特殊的地位,因为它是商业的和通用的,但是它不兼容 SQL(在语言或 ACID 事务中,并且它扩展了定义)。此外,它没有架构,因此属于免费软件类别。

  • 然后是非标准或反标准类别。
    MySQL/MyISAM 以专门违反标准的方式提供COUNT()(这从 Lee 提供的 MySQL 手册链接;适用于非事务性应用程序。
    MySQL/InnoDB 和BDB 以符合标准的方式提供 COUNT()

  • 尽管营销力度很大,但 PostgreSQL 是最差的。除了不符合 SQL 之外,它没有 ACID 事务的概念,并且“SQL”不是作为语言实现的(科德十二条规则中的规则 4)。不幸的是,对于每个主要版本,由于它实现了一些 SQL 增量,因此您必须重新编写代码。

  • 所有标准 SQL 供应商都提供了大量标准扩展

  1. 定义中的 NOT NULL 不能立即被信任,因为该表很可能在当前表定义之前就已包含空列。定义已实施。尽管如此,Sybase 和 DB2 仍将根据标准合规性的要求对物理抛出进行计数。您可以用一系列计数来证明:SELECT COUNT(column_1) from table_name,然后比较计数。

  2. 第二个查询会让您进一步困惑,是的,因为当创建并填充内表时,计数将是准确的。既然你带着期望创造了它,它就满足了你的期望。但这并不能证明原始表格的任何内容。

This is a Sybase ASE Answer for a Sybase ASE Question

It applies to any Standard SQL Compliant DBMS; it does not apply to others.

The answer you have chosen is not correct.

  1. COUNT() is an ISO/IEC/ANSI Standard SQL expression. It is required to physically count the rows, which means it will be slow on a large table. It is required to not use the internal memory resident tables or catalogue tables ("metadata"). This is a run-time value, so if the table is active, it will keep changing for every execution
  • What you place within the brackets is very important.

  • COUNT(*) returns the number of rows including nulls

  • COUNT(column) returns the number of rows where column is Not Null

  • Yes, the placement of DISTINCTis also important. This forces the use of a work table. If you need to count a column that is not Unique, then you do not have a choice; but for unique columns, there is no need to use DISTINCT

  • DISTINCT applies to a column or an expression derived from a column;
    COUNT (DISTINCT *)
    is meaningless ("distinct all columns"), and ASE 15 has a substantially improved parser, which catches such things (previous version woul dhave returned a less accurate error message).

  • the rows actually read will depend on your ISOLATION LEVEL (the correct count will be returned for the level specified) and the current activity on the database

  • the cleanest method, that avoids the weird results you are getting, is to use
    COUNT(PK_column)

  • (Since this is a CW) Never use any form of COUNT() for an existence check, as it physically counts the rows. Always use IF EXISTS with the correct WHERE clause, because it will use the index only.

  1. If you need an accurate count but do not want to read all the rows, there is a function to read catalogue table systabstats, which has a count of rows in each table. This returns instantaneously, regardless of table size. The currency of those values depends on how your server is configured for performance, flushing, checkpointing, etc. systabstats is updated from the memory-resident tables by two commands: UPDATE STATISTICS and "flush stats". Try this:
    EXEC sp_flushstats
    SELECT ROW_COUNT (DB_ID(), OBJECT_ID("table_name") )

Response to Comments

This Section is Not Relevant to Sybase

  1. I have provided a clean explanation re your problem, and the subject, without explaining why the other two answers are incorrect, or why your second query returns unpredictable results.

  2. Here is the Sybase ASE 15.0 Reference/Building Blocks Manual, COUNT() is on page 121. Note that icyrock has misquoted it, and both of you have mis-interpreted it, inadvertently of course. Your starting point was confusion, lack of distinction re * and DISTINCT, hence my attempt at a clear answer.

  3. I made this a Community Wiki, therefore my answer is complete re the subject, normalised, so that it can stand alone as a complete answer to any question re COUNT().

  4. In response to comments, for those people who have not heard, SQL is a Standard language, invented and progressed by IBM in the 1970's and accepted as a Standard in the 1980's by:

  • International Standards Organisation,

  • International Electrotechnical Commission and (Book Format),

  • and copied by American National Standards Institute (free publication thanks to Digital Equipment Corp) somewhat later.

  • None of the Open Source or freeware "SQL" comply with the Standard. They provide some components (language structures, facilities, commands) of the Standard. And of those they provide, they seldom provide the Standard Requirement (eg. ACID Transaction handling; security; etc).

  • The second difference is Architecture, and in the implementation of SQL as a genuine language (consistency, callable from any code segment). A single genuine multi-threaded process with no context switching cannot be compared with a herd of hundreds of programs using Unix to do all its "multi-threading" and context switching.

  • Therefore what SAP/Sybase and DB2 (rigid regarding Standards) do, and to a lesser extent what MS ("flexible" about implementations) do, because (a) they are Standard-Compliant (without arguing the small variations), and (b) they have a genuine Server Architecture, is light years away from the MySQLs and the PostgreSQLs of the world. The result is the hardware requirement that is 100 times that of an architected Server. You get what you pay for.

  • Oracle has a special place because it is commercial and common, however it is not SQL compliant (in language or in ACID Transactions, and it stretches the definitions). Further, it has no architecture, which places it in the freeware category.

  • Then there is non-Standard or anti-Standard category.
    MySQL/MyISAM provides COUNT() in a manner that is specifically against the Standard (this is plainly evident from the MySQL Manual link provided by Lee; good for non-transactional apps).
    MySQL/InnoDB & BDB provide COUNT() in the Standard-compliant manner.

  • PostgreSQL is the worst, despite the heavy marketing. In addition to not being SQL compliant, it has no concept of ACID Transactions, and the "SQL" is not implemented as a language (Rule 4 in Codd's Twelve Rules). Unfortunately with each major release, as it implements some increment of SQL, you have to re-write your code.

  • All Standard-SQL vendors provide a large array of Extensions to the Standard

  1. The NOT NULL in the table definition cannot be immediately trusted, because the table may well have had null columns in it before the current definition was implemented. Nevertheless, Sybase and DB2 will count throws physical, as per requirement for Standard compliance. You can prove with with a series of counts:SELECT COUNT(column_1) from table_name, and then compare the counts.

  2. The second query will further confuse you, yes, because when the inner table is created, and populated, the count will be accurate. Since you created it with an expectation, it fulfilled your expectation. But that does not prove anything about that original table.

岁月如刀 2024-10-13 06:19:21

如果我没有记错的话,据此判断:

您应该使用select count(distinct *)。我希望 select unique count(*) 始终返回 1,因为它表示“给我不同的行,每行都是一个 count(*)”,因此总会有一行,而 select count(distinct *) 会为您提供不同行的计数。

FWIW,上面似乎适用于 v12.5(尽管我没有看到任何差异),这里是 15.0 文档:

它明确说明了以下内容:

count(*) 查找行数。 count(*) 不接受任何参数,并且不能与distinct 一起使用。无论是否存在空值,所有行都会被计数。

您可以使用 select count(distinct column_1) 左右,但不能使用 select count(distinct *)

If I'm not mistaken, judging from this:

you should use select count(distinct *). I'd expect select distinct count(*) to always return 1, as it says "give me the distinct rows, each of which is a count(*)", so there's always going to be one row, while select count(distinct *) gives you a count of distinct rows.

FWIW, the above seems to be for v12.5 (though I don't see any differences), here are the 15.0 docs:

It explicitly says the following:

count(*) finds the number of rows. count(*) does not take any arguments, and cannot be used with distinct. All rows are counted, regardless of the presence of null values.

You can use select count(distinct column_1) or so, but not select count(distinct *).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文