基于SQL或Power查询中的列值的复杂行操作

发布于 2025-01-28 18:43:45 字数 503 浏览 3 评论 0 原文

我有一个电话数据集。看起来像这样

“在此处输入图像描述”

如果有关某个成员的呼叫在“原始呼叫”后的30天内出现,则该呼叫被认为是回调。我需要一些逻辑或电源查询魔法来使用此逻辑来处理此数据集。因此,最终结果应该看起来像这样

​我以为我可以做些什么,但是事实证明这很困难,当我有超过200万个独特的案例钥匙时,重复项会杀死运行时间和超载内存。有什么建议吗?我更喜欢在Power查询编辑器中进行操作,但可以在SQL中进行操作。 plz,谢谢你。

I have a call dataset. Looks like this

enter image description here

If a call about a certain member comes in within 30 days of an "original call", that call is considered a callback. I need some logic or Power Query magic to handle this dataset using this logic. So the end result should look like this

enter image description here

Right now, I have the table left joined to itself which gives me every possible combination. I thought I could do something with that but it's proven difficult and when I have over 2 million unique case keys, the duplicates kill run time and overload memory. Any suggestions? I'd prefer to do the manipulation in Power Query editor but can do it in SQL. Plz and thank you.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

杀手六號 2025-02-04 18:43:45

我认为您可以在电源查询中执行此操作,但是我不知道它会以200万张记录运行。

可以通过明智地使用 table.buffer 函数来加速它。但是,先尝试一下。

进行合理的自我文献

  • 该代码应为每个成员ID的成员ID
  • 组,从使用所述逻辑创建的记录列表中创建一个表。
  • 扩展表
  • 标记要删除的行通过将日期列列转换为一个,并将适当的逻辑应用于日期和移位列来删除的行。
  • 代码假定每个成员ID的日期按升序顺序排列。如果不是,则需要添加一个额外的排序步骤,

请尝试此M代码。 (更改源线以与您自己的数据源一致)

编辑:
编辑的代码允许从初始调用

let

//Change next line to be congruent with your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Case key", type text}, {"Member ID", Int64.Type}, {"Call Date", type date}}),

//Group by Member ID
//  then create tables with call back date using the stated logic
       #"Grouped Rows" = Table.Group(#"Changed Type", {"Member ID"}, {
        {"Call Backs",(t)=>Table.FromRecords(
            List.Generate(
            ()=>[ck=t[Case key]{0}, cd=t[Call Date]{0}, cb = null, df=null, idx=0],
            each [idx] < Table.RowCount(t),
            each [ck=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30 
                        then [ck] else t[Case key]{[idx]+1},
                cd=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30 
                        then [cd] else t[Call Date]{[idx]+1},
                    cb = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30  
                            then t[Call Date]{[idx]+1} else null,
                    df = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then Duration.Days(t[Call Date]{[idx]+1} - [cd]) else null,
                    idx = [idx]+1],
            each Record.FromList({[ck],[cd],[cb],[df]},{"Case key","Call Date","Call Back Date", "Datediff"}))
        )}
        }),
    #"Expanded Call Backs" = Table.ExpandTableColumn(#"Grouped Rows", "Call Backs", 
        {"Case key", "Call Date", "Call Back Date", "Datediff"}, 
        {"Case key", "Call Date", "Call Back Date", "Datediff"}),

    #"Shifted Datediff" = Table.FromColumns(
        Table.ToColumns(#"Expanded Call Backs") & {
        List.RemoveFirstN(#"Expanded Call Backs"[Datediff]) & {null}},
            type table[Member ID=Int64.Type, Case key=text, Call Date=date, Call Back Date=date, Datediff=Int64.Type, shifted=Int64.Type ]),

    #"Filter" = Table.SelectRows(#"Shifted Datediff", each [shifted]=null or [Datediff]<>null),
    #"Removed Columns" = Table.RemoveColumns(Filter,{"shifted"})
in
    #"Removed Columns"

”在此处输入图像说明“

示例有多个回调

I think you can do this in Power Query, but I have no idea how it will run with two million records.

It may be able to be sped up with judicious use of the Table.Buffer function. But give it a try as written first.

The code should be reasonably self-documenting

  • Group by Member ID
  • For each Member ID, create a table from a list of records which is created using the stated logic.
  • expand the tables
  • Mark the rows to be deleted by shifting up the Datediff column by one and applying appropriate logic to the Datediff and shifted columns.
  • Code assumes that the dates for each Member ID are in ascending order. If not, an extra sorting step would need to be added

Try this M code. (Change the Source line to be congruent with your own data source).

Edit:
Code edited to allow for multiple call backs from an initial call

let

//Change next line to be congruent with your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"Case key", type text}, {"Member ID", Int64.Type}, {"Call Date", type date}}),

//Group by Member ID
//  then create tables with call back date using the stated logic
       #"Grouped Rows" = Table.Group(#"Changed Type", {"Member ID"}, {
        {"Call Backs",(t)=>Table.FromRecords(
            List.Generate(
            ()=>[ck=t[Case key]{0}, cd=t[Call Date]{0}, cb = null, df=null, idx=0],
            each [idx] < Table.RowCount(t),
            each [ck=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30 
                        then [ck] else t[Case key]{[idx]+1},
                cd=if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30 
                        then [cd] else t[Call Date]{[idx]+1},
                    cb = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30  
                            then t[Call Date]{[idx]+1} else null,
                    df = if Duration.Days(t[Call Date]{[idx]+1} - [cd]) < 30
                            then Duration.Days(t[Call Date]{[idx]+1} - [cd]) else null,
                    idx = [idx]+1],
            each Record.FromList({[ck],[cd],[cb],[df]},{"Case key","Call Date","Call Back Date", "Datediff"}))
        )}
        }),
    #"Expanded Call Backs" = Table.ExpandTableColumn(#"Grouped Rows", "Call Backs", 
        {"Case key", "Call Date", "Call Back Date", "Datediff"}, 
        {"Case key", "Call Date", "Call Back Date", "Datediff"}),

    #"Shifted Datediff" = Table.FromColumns(
        Table.ToColumns(#"Expanded Call Backs") & {
        List.RemoveFirstN(#"Expanded Call Backs"[Datediff]) & {null}},
            type table[Member ID=Int64.Type, Case key=text, Call Date=date, Call Back Date=date, Datediff=Int64.Type, shifted=Int64.Type ]),

    #"Filter" = Table.SelectRows(#"Shifted Datediff", each [shifted]=null or [Datediff]<>null),
    #"Removed Columns" = Table.RemoveColumns(Filter,{"shifted"})
in
    #"Removed Columns"

enter image description here

Example with multiple callbacks
enter image description here

云归处 2025-02-04 18:43:45

认为您可以使用铅功能来执行此操作。

这是小提琴

select 
  a.*, 
  LEAD(CallDate, 1) OVER (
    Partition by memberId 
    ORDER BY 
      CallDate
  ) AS "CallbackDate", 
  LEAD(CallDate, 1) OVER (
    Partition by memberId 
    ORDER BY 
      CallDate
  ) - a.calldate AS DateDiff 
from 
  mytable a

Think you can do this with Lead function.

here is the fiddle https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=f7cabdbe4d1193e5f0da6bd6a4571b96

select 
  a.*, 
  LEAD(CallDate, 1) OVER (
    Partition by memberId 
    ORDER BY 
      CallDate
  ) AS "CallbackDate", 
  LEAD(CallDate, 1) OVER (
    Partition by memberId 
    ORDER BY 
      CallDate
  ) - a.calldate AS DateDiff 
from 
  mytable a
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文