How to dynamically generate dates between two dates in Snowflake?

Posted on 2025-02-11 19:46:57

I've been searching for a good generate_series analog in Snowflake, but what I've found so far is a bit limited in scope. Most of the examples I've seen use ROWCOUNT, but I need something more dynamic than that.

I have these columns:
location_id, subscription_id, start_date, end_date

The DATEDIFF between the two date columns is usually one year, but there are many instances where it isn't, so I need to account for that.

How do I generate a gapless date range between my start and end dates?

Thank you!

水晶透心 2025-02-18 19:46:57

There are several ways to approach this, but here's the way I do it, using the SQL generator function Datespine_Groups.

The reason I like doing it this way is that it's flexible enough that I can use weekly, hourly, or monthly intervals between the dates and reuse the code.

The group bounds parameter changes the join in a subtle way that lets you control how the dates get filtered out:

global - every location_id, subscription_id combination starts on the same start date

local - every location_id, subscription_id combination has its own start/end dates, based on the first and last values in the date column

mixed - every location_id, subscription_id combination has its own start date, but they all share the same end date

Rather than trying to make it perfect in one query, I think it's easier to generate the spine with mixed and then filter out the rows where GROUP_START falls after the end_date in your original data.
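A minimal sketch of that mixed-then-filter step, assuming My_First_Table has DATE columns start_date and end_date. Two details are my own choices rather than part of the original answer: the lower bound compares against DATE_TRUNC('month', LOCAL_START) so a group that starts mid-month still gets its first (partial) month, and the end_date filter is applied to each group's spine directly instead of after the final join.

```sql
WITH GLOBAL_SPINE AS (
  -- Fixed monthly spine: 2018-01-01, 2018-02-01, ... (2192 rows, as in the answer)
  SELECT
    DATEADD('month', ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1, '2018-01-01'::DATE) AS SPINE_START
  FROM TABLE(GENERATOR(ROWCOUNT => 2192))
),
GROUPS AS (
  -- Per-group bounds: earliest start_date and latest end_date
  SELECT
    location_id,
    subscription_id,
    MIN(start_date) AS LOCAL_START,
    MAX(end_date) AS LOCAL_END
  FROM My_First_Table
  GROUP BY location_id, subscription_id
)
SELECT
  G.location_id,
  G.subscription_id,
  S.SPINE_START AS GROUP_START,
  DATEADD('month', 1, S.SPINE_START) AS GROUP_END
FROM GROUPS G
JOIN GLOBAL_SPINE S
  -- "mixed" lower bound: each group starts at its own first month
  ON S.SPINE_START >= DATE_TRUNC('month', G.LOCAL_START)
-- The filter step: drop intervals that begin after the group's last end_date
WHERE S.SPINE_START <= G.LOCAL_END
ORDER BY G.location_id, G.subscription_id, GROUP_START;
```

Adding the upper bound this way effectively turns mixed into local bounds, which is the end result the paragraph above describes.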

Here's the SQL. At the very beginning, you can either (1) find a way to generate the 3 parameters in GLOBAL_SPINE (the 'month' date part, the 2018-01-01 start, and the 2192 row count) dynamically, or (2) hard-code a ridiculously wide range that'll last your whole career and let the rest of the query filter it down :)
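For option (1), one way to avoid hard-coding the range is to derive it from the data and expand it with ARRAY_GENERATE_RANGE plus LATERAL FLATTEN instead of GENERATOR, since GENERATOR's ROWCOUNT expects a constant. This is a swapped-in technique, not part of the original answer, and it assumes My_First_Table has DATE columns start_date and end_date.

```sql
WITH BOUNDS AS (
  -- Derive the spine origin and its length (in months) from the data itself
  SELECT
    DATE_TRUNC('month', MIN(start_date)) AS SPINE_ORIGIN,
    DATEDIFF('month', MIN(start_date), MAX(end_date)) + 1 AS N_MONTHS
  FROM My_First_Table
)
SELECT
  -- f.value is a VARIANT, so cast it to an integer before DATEADD
  DATEADD('month', f.value::INT, b.SPINE_ORIGIN) AS SPINE_START,
  DATEADD('month', f.value::INT + 1, b.SPINE_ORIGIN) AS SPINE_END
FROM BOUNDS b,
  -- ARRAY_GENERATE_RANGE(0, n) yields [0, 1, ..., n - 1]; FLATTEN turns it into rows
  LATERAL FLATTEN(input => ARRAY_GENERATE_RANGE(0, b.N_MONTHS)) f
ORDER BY SPINE_START;
```

DATEDIFF('month', ...) counts month boundaries crossed, so adding 1 makes the spine include both the first and the last month. The output has the same SPINE_START/SPINE_END shape as GLOBAL_SPINE, so it could stand in for that CTE.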

You can change 'month' to another date part; I only assumed you were looking for monthly intervals.

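WITH GLOBAL_SPINE AS (
  -- Fixed monthly spine starting 2018-01-01; ROW_NUMBER gives gap-free IDs 1, 2, 3, ...
  SELECT
    ROW_NUMBER() OVER (
      ORDER BY
        NULL
    ) as INTERVAL_ID,
    -- SPINE_START/SPINE_END reuse the INTERVAL_ID alias from this same SELECT
    DATEADD(
      'month',
      (INTERVAL_ID - 1),
      '2018-01-01T00:00' :: timestamp_ntz
    ) as SPINE_START,
    DATEADD(
      'month', INTERVAL_ID, '2018-01-01T00:00' :: timestamp_ntz
    ) as SPINE_END
  FROM
    TABLE (
      GENERATOR(ROWCOUNT => 2192)
    )
),
GROUPS AS (
  -- First start_date and last end_date per location_id, subscription_id pair
  SELECT
    location_id,
    subscription_id,
    MIN(start_date) AS LOCAL_START,
    MAX(end_date) AS LOCAL_END -- not used below; handy for the end_date filter
  FROM
    My_First_Table
  GROUP BY
    location_id,
    subscription_id
),
GROUP_SPINE AS (
  -- "mixed" bounds: each group starts at its own LOCAL_START
  -- but runs all the way to the end of the global spine
  SELECT
    location_id,
    subscription_id,
    SPINE_START AS GROUP_START,
    SPINE_END AS GROUP_END
  FROM
    GROUPS G CROSS
    JOIN LATERAL (
      SELECT
        SPINE_START,
        SPINE_END
      FROM
        GLOBAL_SPINE S
      WHERE
        S.SPINE_START >= G.LOCAL_START
    )
)
-- Attach the original rows whose start_date falls inside each interval;
-- the LEFT JOIN keeps empty intervals, with NULLs in the T.* columns
SELECT
  G.location_id AS GROUP_BY_location_id,
  G.subscription_id AS GROUP_BY_subscription_id,
  GROUP_START,
  GROUP_END,
  T.*
FROM
  GROUP_SPINE G
  LEFT JOIN My_First_Table T ON T.start_date >= G.GROUP_START
  AND T.start_date < G.GROUP_END
  AND G.location_id = T.location_id
  AND G.subscription_id = T.subscription_id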
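If what you need is every individual day between each row's own start_date and end_date (rather than a shared monthly spine), a simpler pattern is to expand each row directly. This is an alternative to the spine approach above, not the answerer's method; it assumes start_date and end_date are DATE columns and that both ends of the range are inclusive.

```sql
SELECT
  t.location_id,
  t.subscription_id,
  t.start_date,
  t.end_date,
  -- f.value is a VARIANT, so cast it before passing it to DATEADD
  DATEADD('day', f.value::INT, t.start_date) AS calendar_date
FROM My_First_Table t,
  -- One array element per day: offsets 0 .. DATEDIFF (inclusive of end_date)
  LATERAL FLATTEN(
    input => ARRAY_GENERATE_RANGE(0, DATEDIFF('day', t.start_date, t.end_date) + 1)
  ) f
ORDER BY t.location_id, t.subscription_id, calendar_date;
```

Because the array length comes from each row's own DATEDIFF, it adapts to spans that are shorter or longer than a year without any fixed ROWCOUNT. A row whose end_date is before its start_date should yield an empty array and therefore no output rows; adding OUTER => TRUE to FLATTEN keeps such rows with a NULL calendar_date.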