如何在红移中使用窗口功能?

发布于 2025-02-03 01:21:24 字数 955 浏览 3 评论 0原文

我有2张桌子: |产品| |:-----:| | product_id | | source_id |

source
source_id
优先级

有时有时有1个product_id可以包含很少的来源,而我的任务是选择具有最低优先级的数据 | product_id | source_id |优先| |:----:|:-------:| :-----:| | 10 | 2 | 9 | | 10 | 4 | 2 | | 20 | 2 | 9 | | 20 | 4 | 2 | | 30 | 2 | 9 | | 30 | 4 | 2 |

正确的结果应该是: | product_id | source_id |优先| |:----:|:-------:| :-----:| | 10 | 4 | 2 | | 20 | 4 | 2 | | 30 | 4 | 2 |

我正在使用查询:

SELECT p.product_id, p.source_id, s.priority FROM Product p
INNER JOIN Source s on s.source_id = p.source_id
WHERE s.priority = (SELECT Min(s1.priority) OVER (PARTITION BY p.product_id) FROM Source s1)

但是它返回错误“这种类型的相关子查询模式尚未支持” ,所以据我了解,我无法在RedShift中使用这种变体,应该如何解决,在那里还有其他方法吗?

I have 2 tables:
| Product |
|:----: |
| product_id |
| source_id|

Source
source_id
priority

sometimes there are cases when 1 product_id can contain few sources and my task is to select data with min priority from for example
| product_id | source_id| priority|
|:----: |:------:| :-----:|
| 10| 2| 9|
| 10| 4| 2|
| 20| 2| 9|
| 20| 4| 2|
| 30| 2| 9|
| 30| 4| 2|

correct result should be like:
| product_id | source_id| priority|
|:----: |:------:| :-----:|
| 10| 4| 2|
| 20| 4| 2|
| 30| 4| 2|

I am using query:

SELECT p.product_id, p.source_id, s.priority FROM Product p
INNER JOIN Source s on s.source_id = p.source_id
WHERE s.priority = (SELECT Min(s1.priority) OVER (PARTITION BY p.product_id) FROM Source s1)

but it returns error "this type of correlated subquery pattern is not supported yet" so as i understand i can't use such variant in Redshift, how should it be solved, are there any other ways?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

画▽骨i 2025-02-10 01:21:24

您只需要将Where子句展开到第二个数据源中,而最低优先级最简单的标志就是使用row_number()窗口函数。您是要RedShift重新运行测试中的每个JON的窗口函数,这会在集群数据库中产生许多效率低下。尝试以下(未经测试):

SELECT p.product_id, p.source_id, s.priority 
FROM Product p
INNER JOIN (
    SELECT ROW_NUMBER() OVER (PARTITION BY p.product_id, order by s1.priority) as row_num,
        source_id,
        priority
    FROM Source) s 
on s.source_id = p.source_id
WHERE row_num = 1

现在,窗口函数仅运行一次。如果可以提高完整情况的可读性,则也可以将子查询移至CTE。

You just need to unroll the where clause into the second data source and the easiest flag for min priority is to use the ROW_NUMBER() window function. You're asking Redshift to rerun the window function for each JOIN ON test which creates a lot of inefficiencies in clustered database. Try the following (untested):

SELECT p.product_id, p.source_id, s.priority 
FROM Product p
INNER JOIN (
    SELECT ROW_NUMBER() OVER (PARTITION BY p.product_id, order by s1.priority) as row_num,
        source_id,
        priority
    FROM Source) s 
on s.source_id = p.source_id
WHERE row_num = 1

Now the window function only runs once. You can also move the subquery to a CTE if that improve readability for your full case.

一生独一 2025-02-10 01:21:24

已经为该情况找到了最佳解决方案:

SELECT
  p.product_id
, p.source_id
, s.priority
, Min(s.priority) OVER (PARTITION BY p.product_id) as min_priority
FROM Product p
    INNER JOIN Source s
            ON s.source_id = p.source_id
WHERE s.priority = p.min_priority

Already found best solution for that case:

SELECT
  p.product_id
, p.source_id
, s.priority
, Min(s.priority) OVER (PARTITION BY p.product_id) as min_priority
FROM Product p
    INNER JOIN Source s
            ON s.source_id = p.source_id
WHERE s.priority = p.min_priority
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文