更快地计算每月库存在巨大桌子上的速度

发布于 2025-02-13 07:15:55 字数 2738 浏览 0 评论 0 原文

我有一张桌子上有成千上万的行。每个与列出的商品以及将物品放在市场上以及出售时（如果已出售的话）有关的

每个物品（如果已出售）。给定的一个月有其他标准（其他标准有足够的排列，预先计算这些数字并不是理想的。）我正在尝试检索120个月的数据，并在网页上平均数据，因此理想情况下，整个查询都会如此快速可能的。我有一个可行的查询...但是只需要5秒钟的时间才能持续12个月的数据，我需要它的速度至少比这快10倍，因此我可以在不错的时间内显示10年的平均值。

这是我现在正在使用的示例：

SELECT Avg(inv)
FROM   (
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= @DateOfMonthEnd
    AND    @DateOfMonthEnd < @currentMonthStart
    AND    (
                  offmarketdate > @DateOfMonthEnd
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
    UNION ALL
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= Dateadd(month, 1,@DateOfMonthEnd)
    AND    Dateadd(month, 1,@DateOfMonthEnd) < @currentMonthStart
    AND    (
                  offmarketdate > Dateadd(month, 1,@DateOfMonthEnd)
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
    UNION ALL
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= Dateadd(month, 2,@DateOfMonthEnd)
    AND    Dateadd(month, 2,@DateOfMonthEnd) < @currentMonthStart
    AND    (
                  offmarketdate > Dateadd(month, 2,@DateOfMonthEnd)
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
     )

我只在此代码中包括3个月，但是如果我想运行120个月，这很快就会变得笨拙。它也很慢，但是我不确定如何将其重新使用以使用窗口函数或子句的组作为单个列表，如果花了一段时间的出售，则可能会在一月和2月计算单个列表。

案例声明是允许代码也用于计算年度最早的平均值，其中代码可能包含一个没有数据的未来一个月，并且应将其排除在平均水平之外（但是，如果没有必要可以以接受开始和结束月的格式重新编码整个内容，而不是对120个查询进行硬编码。）

从评论讨论：

执行计划： https://brentozar.com/pastetheplan/?id=rjtynali5=rjtynali5
= rjtynali5 href =“ https://pastebin.com/uxrtsdax” rel =“ nofollow noreferrer”> https://pastebin.com/uxrtsdax

此表是每天制作的实时数据的副本，而不是在中午写成，可以免费添加或删除任何索引，以加快加速查询（但是，城市价格，名单等条款的示例只是示例，我们到目前为止有索引，因为我们可以在那里搜索不同的条款。）

原文

I've got a table with hundreds of thousands of rows. Each one relating to an item listed to be sold, along with when the item was placed on the market and when it was sold, (if it has been sold yet.)

I'm looking to calculate the number of items that were active in a given month with other given criteria, (the other criteria has enough permutations that pre-calculating these numbers would not be ideal.) I'm trying to retrieve 120 months worth of data and average it on a webpage so ideally the entire query would be as fast as possible. I've got a query that works... but it takes 5 seconds to run for only 12 months of data and I need it to be at least 10x faster than that so I can display a 10 year average in a decent time.

Here is an example of what I'm using now:

SELECT Avg(inv)
FROM   (
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= @DateOfMonthEnd
    AND    @DateOfMonthEnd < @currentMonthStart
    AND    (
                  offmarketdate > @DateOfMonthEnd
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
    UNION ALL
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= Dateadd(month, 1,@DateOfMonthEnd)
    AND    Dateadd(month, 1,@DateOfMonthEnd) < @currentMonthStart
    AND    (
                  offmarketdate > Dateadd(month, 1,@DateOfMonthEnd)
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
    UNION ALL
    SELECT
           CASE
                  WHEN Count(*) =0 THEN NULL
                  ELSE Count(*)
           END AS inv
    FROM   listings
    WHERE  originalentrytimestamp <= Dateadd(month, 2,@DateOfMonthEnd)
    AND    Dateadd(month, 2,@DateOfMonthEnd) < @currentMonthStart
    AND    (
                  offmarketdate > Dateadd(month, 2,@DateOfMonthEnd)
           OR     offmarketdate IS NULL)
    AND    city='New York'
    AND    listprice >= 0
    AND    listprice <= 999999999
    AND    category1='RESI'
    AND    category2 IN('D','A','S','R','O')
     )

I've only included 3 months in this code for brevity, but this quickly gets unwieldly if I want to run 120 months. It's also slow, but I'm not sure how to re-word it to use a window function or group by clause as a single listing might be counted in both January and February if it took a while to sell.

The case statement is to allow the code to be used to calculate year-to-date averages as well where the code might include a future month with no data and that should be excluded from the average, (but it wouldn't be necessary if the entire thing can be re-coded in a format that accepts a start and end month rather than hardcoding 120 queries.)

edit from comments discussion:

execution plan: https://brentozar.com/pastetheplan/?id=rJtyNali5
table structure and indexes: https://pastebin.com/UxRTSDAx

This table is a copy of the live data made daily and not written to mid-day, any indexes can be added or removed freely to speed up the query, (but the example where clauses for city, list price, etc are only examples, we have the indexes we have so far because we might search on different clauses there.)

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

芯好空 2025-02-20 07:15:55

尝试a 覆盖索引 on （City，category1，category2，ointernEntryTimestamp）包括 offmarketdate 和 ListPrice 列。像这样的DDL可能会创建您需要的索引。

CREATE NONCLUSTERED INDEX [CityCategoriesTimestamp] ON [dbo].[listings]
(
    [city] ASC,
    [category1] ASC,
    [category2] ASC,
    [originalentrytimestamp] ASC
)
INCLUDE (
    [offmarketdate] ASC,
    [listprice] ASC
);

这是因为可以从第一个合名的顺序进行扫描行到最后一个。这比扫描整个桌子要快。

并且，考虑重新加工查询以使用 eomonth（）。摆脱大联合所有级联，而不是（OriginalTerryTimestamp）进行组。我建议重写，但我不确定我足够了解您的业务逻辑。

Try a covering index on (city, category1, category2, originalentrytimestamp) including the offmarketdate and listprice columns. DDL like this might create the index you need.

CREATE NONCLUSTERED INDEX [CityCategoriesTimestamp] ON [dbo].[listings]
(
    [city] ASC,
    [category1] ASC,
    [category2] ASC,
    [originalentrytimestamp] ASC
)
INCLUDE (
    [offmarketdate] ASC,
    [listprice] ASC
);

This works because the index can be scanned sequentially from the first eligible row to the last. That's faster than scanning the whole table.

And, consider reworking your query to use EOMONTH(). Get rid of the big UNION ALL cascade, instead doing GROUP BY(originalentrytimestamp). I'd suggest a rewrite but I'm not sure I understand your business logic well enough to do that.