对大量数据选择 Count(*)

发布于 2024-08-14 06:41:47 字数 738 浏览 16 评论 0原文

我想对报告执行此操作，但我的表中有 20,000,000 条记录，这会导致我的应用程序超时。

SELECT
        T.transactionStatusID,
        TS.shortName AS TransactionStatusDefShortName,
        count(*) AS qtyTransactions
    FROM
        Transactions T 

    INNER JOIN TransactionTypesCurrencies TTC
                ON  T.id_Ent = TTC.id_Ent
                    AND T.trnTypeCurrencyID = TTC.trnTypeCurrencyID
            INNER JOIN TransactionStatusDef TS
                ON  T.id_Ent = TS.ent_Ent
                AND T.transactionStatusID = TS.ID
WHERE
    T.id_Ent = @id_Ent
GROUP BY
        T.transactionStatusID,
        TS.shortName

据我所知 COUNT(*) 会导致全表扫描，这使我的查询花费太多时间，我使用 MS SQL 2005，有什么帮助吗？

编辑：

项目负责人告诉我，查询只是一天，它可以提供帮助吗？

原文

I want to do this for a Report but i have 20,000,000 of records in my table and it causes an TimeOut in my application.

SELECT
        T.transactionStatusID,
        TS.shortName AS TransactionStatusDefShortName,
        count(*) AS qtyTransactions
    FROM
        Transactions T 

    INNER JOIN TransactionTypesCurrencies TTC
                ON  T.id_Ent = TTC.id_Ent
                    AND T.trnTypeCurrencyID = TTC.trnTypeCurrencyID
            INNER JOIN TransactionStatusDef TS
                ON  T.id_Ent = TS.ent_Ent
                AND T.transactionStatusID = TS.ID
WHERE
    T.id_Ent = @id_Ent
GROUP BY
        T.transactionStatusID,
        TS.shortName

as far as i know COUNT(*) causes a full table scan and it makes my query to take too much time, im Using MS SQL 2005, any help ?

Edit:

The project leader tells me that the Query is just for one day it could help?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

你与清晨阳光 2024-08-21 06:41:47

据我所知，COUNT(*) 会导致全表扫描，这会使我的查询花费太多时间，我正在使用 MS SQL 2005，有什么帮助吗？

COUNT(*) 可以使用任何能够给出答案的来源，其中包括索引。

在您的情况下，我将使用 trnTypeCurrencyID 在 (id_ent, transactionStatusID) 上创建一个覆盖索引：

CREATE INDEX ON Transactions (id_ent, transactionStatusID) INCLUDE (trnTypeCurrencyID)

并稍微重写查询：

SELECT  transactionStatusID, qtyTransactions, TS.shortName
FROM    (
        SELECT  T.transactionStatusID,
                COUNT(*) AS qtyTransactions
        FROM    Transactions T
        JOIN    TransactionTypesCurrencies TTC
        ON      TTC.id_Ent = T.id_Ent
                AND TTC.trnTypeCurrencyID = T.trnTypeCurrencyID
        WHERE   T.id_Ent = @id_Ent
        GROUP BY
                T.transactionStatusID
        ) TD
JOIN    TransactionStatusDef TS
ON      TS.ent_Ent = @id_Ent
        AND TS.ID = TD.transactionStatusID

索引将在 上进行过滤id_ent 并在 transactionStatusID 上并行化。由于您已经覆盖了 trnTypeCurrencyID，引擎将不必查找表中的值，它已经存在于索引中。

GROUP BY 子句还仅包含索引中的列，因此它的并行性更好。

更新：

通过添加WITH (ONLINE = ON)，您可以在创建索引期间使表保持可运行状态：

CREATE INDEX ON Transactions (id_ent, transactionStatusID) INCLUDE (trnTypeCurrencyID) WITH (ONLINE = ON)

as far as i know COUNT(*) causes a full table scan and it makes my query to take too much time, im Using MS SQL 2005, any help ?

COUNT(*) can use any source that is able to give the answer, this includes indexes.

In your very case, I'd create a covering index on (id_ent, transactionStatusID) with trnTypeCurrencyID:

CREATE INDEX ON Transactions (id_ent, transactionStatusID) INCLUDE (trnTypeCurrencyID)

and rewrite the query a little:

SELECT  transactionStatusID, qtyTransactions, TS.shortName
FROM    (
        SELECT  T.transactionStatusID,
                COUNT(*) AS qtyTransactions
        FROM    Transactions T
        JOIN    TransactionTypesCurrencies TTC
        ON      TTC.id_Ent = T.id_Ent
                AND TTC.trnTypeCurrencyID = T.trnTypeCurrencyID
        WHERE   T.id_Ent = @id_Ent
        GROUP BY
                T.transactionStatusID
        ) TD
JOIN    TransactionStatusDef TS
ON      TS.ent_Ent = @id_Ent
        AND TS.ID = TD.transactionStatusID

The index will filter on id_ent and parallelize on transactionStatusID. Since you have trnTypeCurrencyID covered, the engine will not have to lookup the value in the table, it's already present in the index.

The GROUP BY clause also includes only the columns from the index so it parallelizes much better.

Update:

By adding WITH (ONLINE = ON) you can leave the table operational for the time the index is being created:

CREATE INDEX ON Transactions (id_ent, transactionStatusID) INCLUDE (trnTypeCurrencyID) WITH (ONLINE = ON)

回复收藏 0 原文

孤独患者 2024-08-21 06:41:47

如果您查看查询的执行计划，就会突出显示表现不佳的部分。它会告诉您是否正在执行表扫描、索引扫描或索引查找。
所以这是开始寻找的最佳位置。

您现在有索引吗？ JOIN 和 WHERE 子句中涉及的字段是主要候选者 - 如果您没有索引，这将是一个主要因素。

回复收藏 0 原文

嗳卜坏 2024-08-21 06:41:47

你尝试过

COUNT(1)

吗？

另外，是否需要加入TransactionTypesCurrency，您似乎没有使用其中的任何内容？

Have you tried

COUNT(1)

Instead?

Also, is the join to TransactionTypesCurrencies required, does not seem that you use anything from it?

回复收藏 0 原文

风透绣罗衣 2024-08-21 06:41:47

事务表上的聚集索引是什么？还存在哪些其他索引？您可以尝试此查询来消除一个连接：

SELECT
    T.TransactionStatusID,
    TS.ShortName,
    qtyTransactions = COUNT(*)
FROM
    dbo.Transactions AS T
INNER JOIN
    dbo.TransactionStatusDef AS TS
    ON T.id_Ent = TS.ent_Ent
    AND T.transactionStatusID = TS.ID
WHERE EXISTS
(
    SELECT 1
        FROM do.TransactionTypeCurrencies AS TTC
        WHERE TTC.id_Ent = T.id_Ent
        AND TTC.trnTypeCurrencyID = T.trnTypeCurrencyID
)
AND T.id_Ent = @id_Ent
GROUP BY
    T.transactionStatusID,
    TS.shortName;

您还可以尝试在发出查询之前使用快照隔离，例如，

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

要执行此操作，您必须：

ALTER DATABASE dbname SET ALLOW_SNAPSHOT_ISOLATION ON;

一般来说，尽管您希望确保索引正确。如果您无法为此查询的表应用适当的索引，因为这会损害其他查询，那么您可以考虑使用索引视图来为您维护此计数（以插入/更新性能为代价），或者如果您需要表分区在企业版上，或者偶尔在后台运行此数据的汇总，以便您的应用程序不必等待它（假设实际计数有点过时是可以的）。

What is the clustered index on the transactions table? What other indexes exist? You could try this query to eliminate one join:

SELECT
    T.TransactionStatusID,
    TS.ShortName,
    qtyTransactions = COUNT(*)
FROM
    dbo.Transactions AS T
INNER JOIN
    dbo.TransactionStatusDef AS TS
    ON T.id_Ent = TS.ent_Ent
    AND T.transactionStatusID = TS.ID
WHERE EXISTS
(
    SELECT 1
        FROM do.TransactionTypeCurrencies AS TTC
        WHERE TTC.id_Ent = T.id_Ent
        AND TTC.trnTypeCurrencyID = T.trnTypeCurrencyID
)
AND T.id_Ent = @id_Ent
GROUP BY
    T.transactionStatusID,
    TS.shortName;

You could also try to use snapshot isolation before issuing the query, e.g.

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

To do this you must have:

ALTER DATABASE dbname SET ALLOW_SNAPSHOT_ISOLATION ON;

In general though you want to make sure you get the indexing right. If you can't apply proper indexing on the tables for this query because it will hurt other queries, then you can consider an indexed view to maintain this count for you (at the cost of insert/update performance), or table partitioning if you are on Enterprise Edition, or occasionally running a rollup of this data in the background so that your application doesn't have to wait for it (assuming it is ok that the actual count is a bit stale).

回复收藏 0 原文

~没有更多了~