根据遵循唯一值组合的日期进行分组和最小最大日期分配

发布于 2025-01-09 01:24:12 字数 8435 浏览 1 评论 0原文

我正在尝试解决另一个差距和岛屿问题。

我有以下示例数据集:

MEM_IDCLM_IDBGN_SERV_DTEND_SERV_DTADM_DTDCHG_DTPROV
111101-01-202101-31-202101-01-202101-01-20211
111201-01-202101-28-202101-01-202101-01-20211
111301-01-202101-25-202101-01-202101-01-20211
111402-01-202102-10-202101-01- 2021年01-01-20211
111502-11-202102-15-202101-01-202101-01-20211
111602-15-202102-17-202102-15-202102-15- 2021年2
11172021年2月18日15年2月25日 2021年2月2021日 2021年2月15日3
11182021年3月18日2021年3月25日 2021年2月16日 2021年2月16日3
221101-01-202101-31-202101-01-202101-01-20211
221203-01-202103-05-202103-01-202103-01-20212
331102-01- 2021年02-08-202102-01-202102-01-20211
331202-08-202102-10-202102-01-202102-01-20212

我想要完成的任务如下:

基于基于 MEM_ID、PROV 和 ADM_DT 的唯一组合,我想将事件分组为 1 并分配最小 BGN_SERV_DT 和最大 END_SERV_DT。如果 DCHG_DT 与 ADM_DT 相同或最多提前 1 天,我还需要对来自相同 MEM_ID 和 PROV 的任何其他事件进行分组。

最终结果应如下所示:

MEM_IDCLM_IDBGN_SERV_DTEND_SERV_DTADM_DTDCHG_DTPROVGROUPMIN_DTMAX_DT
111101-01-202101-31-202101-01-202101-01-20211101-01-202102-15-2021
111201-01-202101-28-202101-01-202101-01-20211101-01-202102-15-2021
111301-01-202101-25- 2021年01-01-202101-01-20211101-01-202102-15-2021
111402-01-202102-10-202101-01-202101-01-20211101-01- 2021年02-15-2021
111502-11-202102-15-202101-01-202101-01-20211101-01-202102-15-2021
111602-15-202102-17- 2021年02-15-202102-15-20212202-15-202102-17-2021
111702-18-202102-25-202102-15-202102-15-20213302-18- 2021年03-25-2021
111803-18-202103-25-202102-16-202102-16-20213302-18-202103-25-2021
221101-01-202101-31- 2021年01-01-202101-01-20211101-01-202101-31-2021
221203-01-202103-05-202103-01-202103-01-20212203-01- 2021年03-05-2021
331102-01-202102-08-202102-01-202102-01-20211102-01-202102-08-2021
331202-08-202102-10- 2021年02-01-202102-01-20212202-08-202102-10-2021

您会注意到某些 ADM_DT 小于 BGN_SERV_DT,这不是数据错误。

在预期输出中,MEM_ID 1 的前 5 条记录被分组到组 1 中,因为它满足具有相同 MEM_ID、PROV 和 ADM_DT 组合的条件。记录 6 和 7 具有相同的 MEM_ID 和 ADM_DT 组合,但不同的 PROV,因此它们分为组 2 和 3。记录 8 具有相同的 MEM_ID 和 PROV 组合,但 ADM_DT 与记录 7 不同。但是,因为 ADM_DT记录 8 中的 DCHG_DT 距记录 7 中的 1 天内,它被认为是同一事件并分配到组 3。其余数据点非常简单为什么它们会这样分组。

这是我通过以下链接提出的上一个问题的后续:

根据与上一行匹配的开始日期进行分组结束日期 SQL

我尝试了以下代码:


SELECT DISTINCT MEM_ID
    , PROV
    , CLM_ID
    , ADM_DT
    , DCHG_DT
    , sum(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT rows unbounded preceding) AS GROUP

FROM (
    SELECT DISTINCT MEM_ID
        , PROV
        , CLM_ID
        , ADM_DT
        , DCHG_DT
        , CASE 
            WHEN PROV = LAG(PROV) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                AND (
                        (
                        ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    OR (
                        ADM_DT = LAG(DCHG_DT + 1) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    )
                THEN 0
            ELSE 1
            END AS ISSTART
    
    FROM c1
    ) t



上面的代码已捕获大部分实例,但我无法弄清楚如何添加最小和最大日期。我也不确定我的案例陈述是否被正确使用。

任何帮助将不胜感激。

先感谢您!

编辑:

经过进一步调查,看来我创建岛屿的子查询可能有错误,或者我的总和语句计算不正确。

我尝试在 sum 语句中更改以下代码,看看是否可以解决问题


SELECT DISTINCT MEM_ID
    , PROV
    , CLM_ID
    , ADM_DT
    , DCHG_DT
    , sum(ISSTART) OVER (PARTITION BY MEM_ID,ADM_DT ORDER BY ADM_DT, DCHG_DT rows unbounded preceding) AS GROUP

FROM (
    SELECT DISTINCT MEM_ID
        , PROV
        , CLM_ID
        , ADM_DT
        , DCHG_DT
        , CASE 
            WHEN PROV = LAG(PROV) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                AND (
                        (
                        ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    OR (
                        ADM_DT = LAG(DCHG_DT + 1) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    )
                THEN 0
            ELSE 1
            END AS ISSTART
    
    FROM c1
    ) t



,但是我最终得到的结果如下:

MEM_IDCLM_IDBGN_SERV_DTEND_SERV_DTADM_DTDCHG_DTPROVGROUP
111101-01-202101-31-202101- 2021年1月2021年1月1日10
111201-01-202101-28-202101-01-202101-01-202111
111301-01-202101-25-202101-01-202101-01-202111
111402-01-202102-10-202101-01-202101-01-202111
111502-11-202102-15-202101-01-202101-01-202111
11162021年22021年2月15日2021年2月15日 2021年2月年2月18日
15日 2021年2月15日 2 0 1117年2月17日20212021年2月25日 2021月15日 2021年2月15日30
11182021年3月18日2021年3月25日2021年2月16日2021年2月16日31

I am having another gaps and islands problem that I am trying to solve.

I have the following sample dataset:

MEM_IDCLM_IDBGN_SERV_DTEND_SERV_DTADM_DTDCHG_DTPROV
111101-01-202101-31-202101-01-202101-01-20211
111201-01-202101-28-202101-01-202101-01-20211
111301-01-202101-25-202101-01-202101-01-20211
111402-01-202102-10-202101-01-202101-01-20211
111502-11-202102-15-202101-01-202101-01-20211
111602-15-202102-17-202102-15-202102-15-20212
111702-18-202102-25-202102-15-202102-15-20213
111803-18-202103-25-202102-16-202102-16-20213
221101-01-202101-31-202101-01-202101-01-20211
221203-01-202103-05-202103-01-202103-01-20212
331102-01-202102-08-202102-01-202102-01-20211
331202-08-202102-10-202102-01-202102-01-20212

What I am trying to accomplish is the following:

Based on the unique combination of a MEM_ID,PROV,and ADM_DT, I want to group the events into 1 and assign the min BGN_SERV_DT and max END_SERV_DT. I also need to group any additional events from the same MEM_ID and PROV if the DCHG_DT is the same as the ADM_DT or up to 1 day ahead.

The end result should look like this:

MEM_IDCLM_IDBGN_SERV_DTEND_SERV_DTADM_DTDCHG_DTPROVGROUPMIN_DTMAX_DT
111101-01-202101-31-202101-01-202101-01-20211101-01-202102-15-2021
111201-01-202101-28-202101-01-202101-01-20211101-01-202102-15-2021
111301-01-202101-25-202101-01-202101-01-20211101-01-202102-15-2021
111402-01-202102-10-202101-01-202101-01-20211101-01-202102-15-2021
111502-11-202102-15-202101-01-202101-01-20211101-01-202102-15-2021
111602-15-202102-17-202102-15-202102-15-20212202-15-202102-17-2021
111702-18-202102-25-202102-15-202102-15-20213302-18-202103-25-2021
111803-18-202103-25-202102-16-202102-16-20213302-18-202103-25-2021
221101-01-202101-31-202101-01-202101-01-20211101-01-202101-31-2021
221203-01-202103-05-202103-01-202103-01-20212203-01-202103-05-2021
331102-01-202102-08-202102-01-202102-01-20211102-01-202102-08-2021
331202-08-202102-10-202102-01-202102-01-20212202-08-202102-10-2021

You will notice that some of the ADM_DT are less than the BGN_SERV_DT, that is not a data error.

In the expected output, the first 5 records for MEM_ID 1 was grouped into group 1 because it meets the criteria of having the same MEM_ID,PROV, and ADM_DT combination. Records 6 and 7 have the same MEM_ID and ADM_DT combination but different PROV, therefore they are broken into groups 2 and 3. Record 8 has the same combination of MEM_ID and PROV but the ADM_DT is different to that of Record 7. However because the ADM_DT from record 8 is within 1 day of the DCHG_DT from record 7, it is considered the same event and assigned group 3. The rest of the data points are pretty straight forward on why they are grouped the way they are.

This is a follow up to a previous question that i asked via the below link:

Grouping based on start date matching the previous row's end date SQL

I have tried the following code:


SELECT DISTINCT MEM_ID
    , PROV
    , CLM_ID
    , ADM_DT
    , DCHG_DT
    , sum(ISSTART) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT rows unbounded preceding) AS GROUP

FROM (
    SELECT DISTINCT MEM_ID
        , PROV
        , CLM_ID
        , ADM_DT
        , DCHG_DT
        , CASE 
            WHEN PROV = LAG(PROV) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                AND (
                        (
                        ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    OR (
                        ADM_DT = LAG(DCHG_DT + 1) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    )
                THEN 0
            ELSE 1
            END AS ISSTART
    
    FROM c1
    ) t



The above code, has captured most of the instances but i am unable to figure out how to add the min and max dates. I am also unsure if my case statement is correctly being used.

Any help would be appreciated.

Thank you in advance!

EDIT:

After further investigation, it appears that either my subquery to create the islands might be faulty or my sum statement isnt being calculated correctly.

I have attempted the following code change in the sum statement to see if it fixes the issue


SELECT DISTINCT MEM_ID
    , PROV
    , CLM_ID
    , ADM_DT
    , DCHG_DT
    , sum(ISSTART) OVER (PARTITION BY MEM_ID,ADM_DT ORDER BY ADM_DT, DCHG_DT rows unbounded preceding) AS GROUP

FROM (
    SELECT DISTINCT MEM_ID
        , PROV
        , CLM_ID
        , ADM_DT
        , DCHG_DT
        , CASE 
            WHEN PROV = LAG(PROV) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                AND (
                        (
                        ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    OR (
                        ADM_DT = LAG(DCHG_DT + 1) OVER (PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT)
                        )
                    )
                THEN 0
            ELSE 1
            END AS ISSTART
    
    FROM c1
    ) t



However what I end up with is the following,

MEM_IDCLM_IDBGN_SERV_DTEND_SERV_DTADM_DTDCHG_DTPROVGROUP
111101-01-202101-31-202101-01-202101-01-202110
111201-01-202101-28-202101-01-202101-01-202111
111301-01-202101-25-202101-01-202101-01-202111
111402-01-202102-10-202101-01-202101-01-202111
111502-11-202102-15-202101-01-202101-01-202111
111602-15-202102-17-202102-15-202102-15-202120
111702-18-202102-25-202102-15-202102-15-202130
111803-18-202103-25-202102-16-202102-16-202131

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

三人与歌 2025-01-16 01:24:12

如果其他一切正常,并且您只需要两列 MIN_DTMAX_DT,那么您可以使用 maxmin窗口函数:

SELECT 
  DISTINCT mem_id, 
  prov, 
  clm_id, 
  adm_dt, 
  dchg_dt, 
  Sum(isstart) OVER(
    partition BY mem_id 
    ORDER BY 
      adm_dt, 
      dchg_dt rows UNBOUNDED PRECEDING
  ) AS GROUP, 
  Min(bgn_serv_dt) OVER(partition BY mem_id, prov, adm_dt) AS MIN_DT, 
  Max(end_serv_dt) OVER(partition BY mem_id, prov, adm_dt) AS MAX_DT 
FROM 
  (
    SELECT 
      DISTINCT mem_id, 
      prov, 
      clm_id, 
      adm_dt, 
      dchg_dt, 
      bgn_serv_dt, 
      end_serv_dt, 
      CASE WHEN prov = Lag(prov) OVER(
        partition BY mem_id 
        ORDER BY 
          adm_dt, 
          dchg_dt
      ) 
      AND (
        (
          adm_dt = Lag(dchg_dt) OVER(
            partition BY mem_id 
            ORDER BY 
              adm_dt, 
              dchg_dt
          )
        ) 
        OR (
          adm_dt = Lag(dchg_dt + 1) OVER(
            partition BY mem_id 
            ORDER BY 
              adm_dt, 
              dchg_dt
          )
        )
      ) THEN 0 ELSE 1 END AS ISSTART 
    FROM 
      c1
  ) t

If everything else works fine and you just need the two columns MIN_DT and MAX_DT, then you can use the max and min window functions:

SELECT 
  DISTINCT mem_id, 
  prov, 
  clm_id, 
  adm_dt, 
  dchg_dt, 
  Sum(isstart) OVER(
    partition BY mem_id 
    ORDER BY 
      adm_dt, 
      dchg_dt rows UNBOUNDED PRECEDING
  ) AS GROUP, 
  Min(bgn_serv_dt) OVER(partition BY mem_id, prov, adm_dt) AS MIN_DT, 
  Max(end_serv_dt) OVER(partition BY mem_id, prov, adm_dt) AS MAX_DT 
FROM 
  (
    SELECT 
      DISTINCT mem_id, 
      prov, 
      clm_id, 
      adm_dt, 
      dchg_dt, 
      bgn_serv_dt, 
      end_serv_dt, 
      CASE WHEN prov = Lag(prov) OVER(
        partition BY mem_id 
        ORDER BY 
          adm_dt, 
          dchg_dt
      ) 
      AND (
        (
          adm_dt = Lag(dchg_dt) OVER(
            partition BY mem_id 
            ORDER BY 
              adm_dt, 
              dchg_dt
          )
        ) 
        OR (
          adm_dt = Lag(dchg_dt + 1) OVER(
            partition BY mem_id 
            ORDER BY 
              adm_dt, 
              dchg_dt
          )
        )
      ) THEN 0 ELSE 1 END AS ISSTART 
    FROM 
      c1
  ) t
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文