如何使用 match_recognize 定义模式来查找不连续的有序事件?

发布于 2025-01-18 10:24:35 字数 709 浏览 0 评论 0原文

我试图找到包含 3 个特定事件的会话,它们需要排序,这意味着 event_1 首先发生,然后是 event_2,然后是 event_3,但它们不需要一个接一个地集中。相反,任意数量的其他随机事件可以介于它们之间。如何在 match_recognize 子句中定义一个模式,以允许我使用分类器语句标记这些事件,并在序列不完整的情况下标记它们,例如仅发生 event_1,或者发生 event_1 + event_2?

或者有没有其他更有效且不涉及 match_recognize 的方法?我试图避免多次联接,因为数据很大。

这是一个用于演示的虚拟查询:

select 
    session_id,
    event,
    event_dttm
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier as var
    all rows per match with unmatched rows
    pattern (???answer needed???)
    define
        event_1 as event = 'Click image',
        event_2 as event = 'Open profile',
        event_3 as event = 'Leave review');

I'm trying to find sessions that contain 3 specific events, they need to be ordered, meaning that event_1 happens first, then event_2, then event_3, but they don't need to be concentrated exactly one after another. Instead, any number of other random events can be in between them. How do I define a pattern in the match_recognize clause to allow me to mark these events with the classifier statement, and also mark them in case the sequence is incomplete, if for example only event_1 happens, or if event_1 + event_2 happens?

Or is there any other way to do this that is more efficient and doesn't involve match_recognize? I'm trying to avoid multiple joins because data is huge.

Here's a dummy query for presentation:

select 
    session_id,
    event,
    event_dttm
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier as var
    all rows per match with unmatched rows
    pattern (???answer needed???)
    define
        event_1 as event = 'Click image',
        event_2 as event = 'Open profile',
        event_3 as event = 'Leave review');

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

套路撩心 2025-01-25 10:24:35

continue

You can put another event in that is the "but not those others" and then have 0-many matches on that between:

with this data:

with events(session_id, event, event_dttm) as (
    SELECT * FROM VALUES
    (99, 0, 10)
    ,(99, 1, 11)
    ,(99, 2, 12)
    ,(99, 3, 13)
    
    ,(98, 1, 10)
    ,(98, 2, 11)
    ,(98, 3, 12)
    ,(98, 0, 13)
    
    ,(100, 1, 10)
    ,(100, 2, 11)
    ,(100, 3, 12)
    
    ,(101, 1, 10)
    ,(101, 0, 11)
    ,(101, 2, 12)
    ,(101, 3, 13)

    ,(102, 1, 10)
    ,(102, 0, 11)
    ,(102, 0, 12)
    ,(102, 2, 13)
    ,(102, 3, 14)

    ,(103, 1, 10)
    ,(103, 0, 11)
    ,(103, 2, 12)
    ,(103, 0, 13)
    ,(103, 3, 14)
    
    ,(104, 1, 10)
    ,(104, 0, 11)
    ,(104, 2, 12)
    ,(104, 0, 13)
    /* incomplete ,(104, 3, 14) */
)
select 
    *
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier as var
    all rows per match with unmatched rows
    pattern (e1 ex* e2 ex* e3)
    define
        e1 as event = 1,
        e2 as event = 2,
        e3 as event = 3,
        ex as event not in (1,2,3))
ORDER BY 1,3;

gives:

SESSION_IDEVENTEVENT_DTTMVAR
98110E1
98211E2
98312E3
98013
99010
99111E1
99212E2
99313E3
100110E1
100211E2
100312E3
101110E1
101011EX
101212E2
101313E3
102110E1
102011EX
102012EX
102213E2
102314E3
103110E1
103011EX
103212E2
103013EX
103314E3
104110
104011
104212
104013
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文