获取给定日期之前的最大日期的有效方法

发布于 2024-11-09 00:24:06 字数 878 浏览 1 评论 0原文

假设我有一个名为 Transaction 的表和另一个名为 Price 的表。 Price 保存给定基金在不同日期的价格。每只基金都会在不同的日期添加价格,但不会在所有可能的日期都有价格。因此,对于基金 XYZ I 可能有 5 月 1 日、5 月 7 日和 5 月 13 日的价格,而基金 ABC 可能有 5 月 3 日、5 月 9 日和 5 月 11 日的价格。

所以现在我正在研究基金在交易之日的现行价格。该交易是针对 XYZ 基金于 5 月 10 日进行的。我想要的是当天最新的已知价格,即 5 月 7 日的价格。

这是代码:

select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
    inner join Price v
        on v.FundCode = d.FundCode
        and v.PriceDate = (
            select max(PriceDate)
            from Price
            where FundCode = v.FundCode
            /* */ and PriceDate < d.TransactionDate 
        )

它可以工作,但速度非常慢(在实际使用中需要几分钟)。如果我删除带有前导注释的行,查询速度非常快(2 秒左右),但它随后使用每个基金的最新价格,这是错误的。

不好的部分是,与我们使用的其他一些表相比,价格表很小,而且我不清楚为什么它这么慢。我怀疑有问题的行强制 SQL Server 处理笛卡尔积,但我不知道如何避免它。

我一直希望找到一种更有效的方法来做到这一点,但到目前为止我还没有想到。有什么想法吗?

Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.

So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.

Here's the code:

select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
    inner join Price v
        on v.FundCode = d.FundCode
        and v.PriceDate = (
            select max(PriceDate)
            from Price
            where FundCode = v.FundCode
            /* */ and PriceDate < d.TransactionDate 
        )

It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.

The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.

I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

分開簡單 2024-11-16 00:24:06

您没有指定您正在使用的 SQL Server 版本,但如果您使用的是支持排名函数和 CTE 查询的版本,我想您会发现这比在联接中使用相关子查询的性能要高得多陈述。

它的性能应该与 Andriy 的查询非常相似。根据表的确切索引拓扑,一种方法可能比另一种方法稍快。

我倾向于喜欢基于 CTE 的方法,因为生成的代码更具可读性(在我看来)。希望这有帮助!

;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
    SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate ASC) AS Match_val
    FROM Transaction d
        INNER JOIN Price v
            ON v.FundCode = d.FundCode
    WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
    INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1

You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.

It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.

I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!

;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
    SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate ASC) AS Match_val
    FROM Transaction d
        INNER JOIN Price v
            ON v.FundCode = d.FundCode
    WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
    INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
原来是傀儡 2024-11-16 00:24:06

有一种方法可以查找具有最大值或最小值的行,该方法涉及到自身的 LEFT JOIN,而不是更直观但可能成本更高的 INNER JOIN 到自派生的聚合列表。

基本上,该方法使用以下模式:

SELECT t.*
FROM t
  LEFT JOIN t AS t2 ON t.key = t2.key
    AND t2.Value > t.Value  /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL

或其 NOT EXISTS 对应项:

SELECT *
FROM t
WHERE NOT EXISTS (
  SELECT *
  FROM t AS t2
  WHERE t.key = t2.key
    AND t2.Value > t.Value  /* same as above applies to ">" here as well */
)

因此,结果是不存在具有相同键且值大于给定值的行的所有行。

当只有一张表时,上述方法的应用非常简单。但是,当存在另一个表时,如何应用它可能并不那么明显,特别是当像您的情况一样,另一个表不仅因为它在那里,而且还为我们提供了额外的过滤而使实际查询变得更加复杂。对于我们正在寻找的值,即日期的上限。

因此,应用该方法的 LEFT JOIN 版本时,结果查询可能如下所示:

SELECT
  d.TransactionID,
  d.FundCode,
  d.TransactionDate,
  v.OfferPrice
FROM Transaction d
  INNER JOIN Price v ON v.FundCode = d.FundCode
  LEFT JOIN Price v2 ON v2.FundCode = v.FundCode  /* this and */
    AND v2.PriceDate > v.PriceDate                /* this are where we are applying
                                                       the above method; */
    AND v2.PriceDate < d.TransactionDate          /* and this is where we are limiting
                                                       the maximum value */
WHERE v2.FundCode IS NULL

这是一个使用 NOT EXISTS 的类似解决方案:

SELECT
  d.TransactionID,
  d.FundCode,
  d.TransactionDate,
  v.OfferPrice
FROM Transaction d
  INNER JOIN Price v ON v.FundCode = d.FundCode
  WHERE NOT EXISTS (
    SELECT *
    FROM Price v2
    WHERE v2.FundCode = v.FundCode           /* this and */
      AND v2.PriceDate > v.PriceDate         /* this are where we are applying
                                                the above method; */
      AND v2.PriceDate < d.TransactionDate   /* and this is where we are limiting
                                                the maximum value */
  )

There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.

Basically, the method uses this pattern:

SELECT t.*
FROM t
  LEFT JOIN t AS t2 ON t.key = t2.key
    AND t2.Value > t.Value  /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL

or its NOT EXISTS counterpart:

SELECT *
FROM t
WHERE NOT EXISTS (
  SELECT *
  FROM t AS t2
  WHERE t.key = t2.key
    AND t2.Value > t.Value  /* same as above applies to ">" here as well */
)

So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.

When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.

So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:

SELECT
  d.TransactionID,
  d.FundCode,
  d.TransactionDate,
  v.OfferPrice
FROM Transaction d
  INNER JOIN Price v ON v.FundCode = d.FundCode
  LEFT JOIN Price v2 ON v2.FundCode = v.FundCode  /* this and */
    AND v2.PriceDate > v.PriceDate                /* this are where we are applying
                                                       the above method; */
    AND v2.PriceDate < d.TransactionDate          /* and this is where we are limiting
                                                       the maximum value */
WHERE v2.FundCode IS NULL

And here's a similar solution with NOT EXISTS:

SELECT
  d.TransactionID,
  d.FundCode,
  d.TransactionDate,
  v.OfferPrice
FROM Transaction d
  INNER JOIN Price v ON v.FundCode = d.FundCode
  WHERE NOT EXISTS (
    SELECT *
    FROM Price v2
    WHERE v2.FundCode = v.FundCode           /* this and */
      AND v2.PriceDate > v.PriceDate         /* this are where we are applying
                                                the above method; */
      AND v2.PriceDate < d.TransactionDate   /* and this is where we are limiting
                                                the maximum value */
  )
无法言说的痛 2024-11-16 00:24:06

pricedatetransactiondate 是否均已编入索引?如果不是,您正在进行表扫描,这可能是性能瓶颈的原因。

Are both pricedate and transactiondate indexed? If not you are doing table scans which is likely the cause of the performance bottleneck.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文