Using MySQL window functions to increment a section number when a value changes

Posted on 2025-02-11 10:38:48

I am looking for a way to split a dataset into sections whenever a value changes inside a partition. Note that each switch of value starts a new section, even if the value switches back to one that appeared in an earlier section.

Trying to illustrate using some dummy data:

Schema (MySQL v8.0)

CREATE TABLE test (
  id INT AUTO_INCREMENT PRIMARY KEY,
  group_id INT,
  date DATE,
  value VARCHAR(255)
);
INSERT INTO test (group_id, date, value) VALUES
    (2, '2022-03-12', 'D'),
    (2, '2022-03-15', 'C'),
    (1, '2022-03-27', 'B'),
    (1, '2022-02-01', 'A'),
    (2, '2022-03-10', 'D'),
    (1, '2022-05-01', 'C'),
    (2, '2022-01-19', 'C'),
    (1, '2022-01-18', 'A'),
    (2, '2022-04-12', 'D'),
    (1, '2022-04-08', 'A');

Query #1

SELECT
    id,
    group_id,
    date,
    value,
    '' as section # This is the one that I don't know how to calculate
FROM test ORDER BY group_id, date;

Results I want (the section is not calculated correctly):

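| id | group_id | date       | value | section |
|----|----------|------------|-------|---------|
| 8  | 1        | 2022-01-18 | A     | 1       |
| 4  | 1        | 2022-02-01 | A     | 1       |
| 3  | 1        | 2022-03-27 | B     | 2       |
| 10 | 1        | 2022-04-08 | A     | 3       |
| 6  | 1        | 2022-05-01 | C     | 4       |
| 7  | 2        | 2022-01-19 | C     | 5       |
| 5  | 2        | 2022-03-10 | D     | 6       |
| 1  | 2        | 2022-03-12 | D     | 6       |
| 2  | 2        | 2022-03-15 | C     | 7       |
| 9  | 2        | 2022-04-12 | D     | 8       |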

View on DB Fiddle

Notice that my main problem is that I cannot partition by (group_id, value), since a new section starts whenever the value changes, regardless of whether it changes back to an earlier value (see value A in group 1, for example).
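To make the problem concrete, here is a minimal sketch of what goes wrong when sections are derived from the value alone. The column alias naive_section is only illustrative and is not part of the original question.

```sql
-- Ranking by value inside each group cannot tell two separate runs of 'A' apart.
SELECT id, group_id, date, value,
    DENSE_RANK() OVER (PARTITION BY group_id ORDER BY value) AS naive_section
FROM test
ORDER BY group_id, date;

-- For group_id = 1 this yields naive_section 1, 1, 2, 1, 3:
-- the row with id 10 (value 'A' on 2022-04-08) falls back to 1,
-- while the desired output puts it in its own section 3.
```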

Comments (1)

橙味迷妹 2025-02-18 10:38:48

You can get a number that identifies which run of a value a row belongs to: take the number of rows seen so far for the group_id and subtract the number of rows seen so far with that same value for the group_id. A little thought will show that this difference stays the same within a series of consecutive rows with the same value, and is always different when that same value appears again at a later time.
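As a worked illustration of that subtraction, the sketch below runs only the innermost step against the question's test table and exposes both running counts. The aliases rows_so_far and same_value_so_far are names chosen here for readability. It assumes dates are unique within a group_id, as they are in the sample data, because the default window frame counts rows with equal dates together.

```sql
-- Innermost step only: both running counts and their difference, per row.
SELECT id, group_id, date, value,
    COUNT(*) OVER (PARTITION BY group_id ORDER BY date) AS rows_so_far,
    COUNT(*) OVER (PARTITION BY group_id, value ORDER BY date) AS same_value_so_far,
    COUNT(*) OVER (PARTITION BY group_id ORDER BY date)
      - COUNT(*) OVER (PARTITION BY group_id, value ORDER BY date) AS group_value_incidence
FROM test
ORDER BY group_id, date;

-- id | group_id | date       | value | rows_so_far | same_value_so_far | group_value_incidence
--  8 | 1        | 2022-01-18 | A     | 1           | 1                 | 0
--  4 | 1        | 2022-02-01 | A     | 2           | 2                 | 0
--  3 | 1        | 2022-03-27 | B     | 3           | 1                 | 2
-- 10 | 1        | 2022-04-08 | A     | 4           | 3                 | 1
--  6 | 1        | 2022-05-01 | C     | 5           | 1                 | 4
--  7 | 2        | 2022-01-19 | C     | 1           | 1                 | 0
--  5 | 2        | 2022-03-10 | D     | 2           | 1                 | 1
--  1 | 2        | 2022-03-12 | D     | 3           | 2                 | 1
--  2 | 2        | 2022-03-15 | C     | 4           | 2                 | 2
--  9 | 2        | 2022-04-12 | D     | 5           | 3                 | 2
```

The difference is not unique on its own: in group 2 the last C row and the last D row both get 2. It identifies a run only in combination with group_id and value, which is why the next step partitions by all three.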

From that number, you can calculate your sequential section number. There may be a way to do that directly (with one fewer subquery), but I had to use an intermediate step of getting the date that a particular run of values for a group_id started.

SELECT id, group_id, date, value,
    -- number the runs within each group_id by the date on which they start
    dense_rank() over (partition by group_id order by group_value_incidence_start) section
FROM (
    SELECT id, group_id, date, value,
        -- first date of the run this row belongs to
        min(date) over (partition by group_id, value, group_value_incidence) group_value_incidence_start
    FROM (
        SELECT id, group_id, date, value,
            -- rows so far in the group minus rows so far with this value:
            -- constant within a run, different for a later run of the same value
            count(1) over (partition by group_id order by date) -
                count(1) over (partition by group_id, value order by date) group_value_incidence
        FROM test
    ) group_value_incidences
) group_value_incidence_starts
ORDER BY group_id, section

fiddle
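The answer leaves open whether the same result can be had with one fewer subquery. One possible way, which is not the answer's method, is to flag each row whose value differs from the previous row in its group using LAG, then take a running sum of the flags. The alias is_new_section and the derived-table name flagged are hypothetical names introduced for this sketch.

```sql
SELECT id, group_id, date, value,
    -- running total of the flags = section number within the group
    SUM(is_new_section) OVER (PARTITION BY group_id ORDER BY date) AS section
FROM (
    SELECT id, group_id, date, value,
        -- 1 on the first row of a group and whenever value differs from the
        -- previous row; <=> is MySQL's NULL-safe equality, so the NULL that
        -- LAG returns on the first row compares as "different"
        CASE WHEN value <=> LAG(value) OVER (PARTITION BY group_id ORDER BY date)
             THEN 0 ELSE 1 END AS is_new_section
    FROM test
) flagged
ORDER BY group_id, date;
```

On the sample data this gives sections 1, 1, 2, 3, 4 for group 1 and 1, 2, 2, 3, 4 for group 2, the same per-group numbering as the query in the answer. The question's desired output instead continues counting across groups (1 through 8); under the same assumptions, changing the outer window to OVER (ORDER BY group_id, date) produces that numbering.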
