选择在特定日期之后没有活动的客户

发布于 2025-01-19 17:47:50 字数 4497 浏览 3 评论 0原文

我有两个表涉及这种情况:

tools_events

COLUMN_NAMEDATA_TYPE
event_idint
event_typevarchar
event_valuevarchar
event_clientint
event_timestampdatetime

tools_clients

COLUMN_NAMEDATA_TYPE
client_idint
client_namevarchar
client_developerint

这是使用 MySQL 5.6 。

事件表显然存储一天中发生的事件,而客户端包含客户端列表。

我正在寻找缺乏的特定事件。对于此特定事件,event_type 将为 taskUpdatedevent_code 将为需要批准 (dev)

目前,我有一个声明放在一起,将拉出所有在指定日期之后已获得记录事件的客户端,该声明似乎运行良好,尽管速度非常慢:

SELECT 
    t1.event_id, t1.event_user, t1.event_client, t1.event_timestamp
FROM 
    tools_events AS t1 
WHERE 
    t1.event_id = (SELECT t2.event_id 
                   FROM tools_events AS t2 
                   WHERE t2.event_type = 'taskUpdated' 
                     AND t2.event_value = 'needs approval (dev)' 
                     AND t2.event_client = t1.event_client 
                     AND t2.event_timestamp >= '2022-04-04 00:00:00' 
                   ORDER BY t2.event_timestamp DESC LIMIT 1)

我正在寻找的是一种方法来自 tools_clientsclient_idclient_nameclient_developer(适用于在指定日期之后尚未记录此事件的客户端) 。由于无法将此列表拉入 PHP 并交叉引用完整的客户端列表来查看哪些客户端不存在,所以我不确定如何在 SQL 查询中解决此问题。

另一方面,似乎只要获取与初始条件匹配的事件列表就可以立即运行,因此也许直接提取这些事件并在 PHP 中执行所有逻辑会更有效?随着我对 SQL 的学习越来越深入,我的理解是,如果你能够正确地优化查询,那么 SQL 通常会更高效,但我认为我做不到。

以下是 tools_clients 表的一些示例数据:

client_idclient_nameclient_developer
1foo12
2fooBar15
3Bar21
4raBoof37

以下是 tools_events 表的一些示例数据:

event_idevent_typeevent_valueevent_clientevent_timestamp
23taskUpdated需要批准(开发)12022-04-02 13:08:22
25taskUpdated需要批准 (dev)22022-04-02 14:22:07
28taskUpdated需要批准 (dev)32022-04-02 15:09:13
29taskUpdated需要批准(开发)42022-04-02 15:36:17
32taskUpdated需要批准(开发)32022-04-05 16:42:35
38taskUpdated需要批准(开发)42022-04-05 19:01:25

鉴于限制:

WHERE event_type = 'taskUpdated' 
  AND event_value = 'needs approval (dev)' 
  AND event_timestamp >= '2022-04-04 00:00:00' 

我会寻找结果如下,因为这是唯一两个在指定日期之后没有发生匹配事件的结果:

client_idclient_nameclient_developer
1foo12
2fooBar15

对此的任何帮助,即使它是正确方向的一点,我们也非常感激。


编辑:

使用 Paul Maxwell 在他的答案中提供的原始查询的简化版本,作为一个子查询和更多的谷歌搜索,我想我能够得到一些东西。我的理解是子查询比 JOIN 效率低,所以我不确定这是否是一个可靠的解决方案,但它似乎运行良好:

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_status > 0
  AND NOT EXISTS (SELECT event_client
                  FROM tools_events AS events
                  WHERE event_type = 'taskUpdated'
                    AND event_value = 'needs approval (dev)'
                    AND event_timestamp >= '2022-04-04 00:00:00'
                    AND clients.client_id = events.event_client
                  GROUP BY event_client)

旁注:我知道 client_status 中没有提到原来的问题。目前我并不担心这部分,因为我知道一旦我得到了一些可以正常工作的东西,我就可以轻松地添加这种情况。

I have two tables involved in this situation:

tools_events

COLUMN_NAMEDATA_TYPE
event_idint
event_typevarchar
event_valuevarchar
event_clientint
event_timestampdatetime

tools_clients

COLUMN_NAMEDATA_TYPE
client_idint
client_namevarchar
client_developerint

This is using MySQL 5.6.

The events table obviously stores events that are occurring throughout a day, and clients contains a list of clients.

There is a particular event I am looking for the lack of. For this particular event the event_type would be taskUpdated and event_code would be needs approval (dev).

Currently I have a statement put together that will pull all clients that have gotten an event logged after the specified date which seems to be working well, albeit extremely slow:

SELECT 
    t1.event_id, t1.event_user, t1.event_client, t1.event_timestamp
FROM 
    tools_events AS t1 
WHERE 
    t1.event_id = (SELECT t2.event_id 
                   FROM tools_events AS t2 
                   WHERE t2.event_type = 'taskUpdated' 
                     AND t2.event_value = 'needs approval (dev)' 
                     AND t2.event_client = t1.event_client 
                     AND t2.event_timestamp >= '2022-04-04 00:00:00' 
                   ORDER BY t2.event_timestamp DESC LIMIT 1)

What I'm looking for is a way to client_id, client_name, and client_developer from tools_clients for the clients who have not had this event logged past the specified date. Short of pulling this list into PHP and cross referencing a full list of the clients to see which ones aren't there, I am unsure of how to tackle this in a SQL query.

On the other end, it seems that just getting a list of events that match the initial criteria seems to run pretty much instantly so perhaps it would be more efficient to just pull those and do all of the logic in PHP? My understanding as I am learning more in depth SQL is that oftentimes SQL is more efficient if you can optimize the query correctly which I don't think I am.

Here is some sample data for the tools_clients table:

client_idclient_nameclient_developer
1foo12
2fooBar15
3Bar21
4raBoof37

And here is some sample data for the tools_events table:

event_idevent_typeevent_valueevent_clientevent_timestamp
23taskUpdatedneeds approval (dev)12022-04-02 13:08:22
25taskUpdatedneeds approval (dev)22022-04-02 14:22:07
28taskUpdatedneeds approval (dev)32022-04-02 15:09:13
29taskUpdatedneeds approval (dev)42022-04-02 15:36:17
32taskUpdatedneeds approval (dev)32022-04-05 16:42:35
38taskUpdatedneeds approval (dev)42022-04-05 19:01:25

Given the constraints:

WHERE event_type = 'taskUpdated' 
  AND event_value = 'needs approval (dev)' 
  AND event_timestamp >= '2022-04-04 00:00:00' 

I would be looking for the following result because these are the only two that did not have a matching event occurring after the specified date:

client_idclient_nameclient_developer
1foo12
2fooBar15

Any assistance in this even if it's a point in the right direction is greatly appreciated.


EDIT:

With the simplified version of my original query, provided by Paul Maxwell in his answer, as a subquery and a bit more Googling I think I was able to get something working. My understanding is that subqueries are less efficient than JOINs so I'm not sure if this is a solid solution or not but it seems to be running well:

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_status > 0
  AND NOT EXISTS (SELECT event_client
                  FROM tools_events AS events
                  WHERE event_type = 'taskUpdated'
                    AND event_value = 'needs approval (dev)'
                    AND event_timestamp >= '2022-04-04 00:00:00'
                    AND clients.client_id = events.event_client
                  GROUP BY event_client)

Side note: I know client_status is not mentioned in the original question. At the moment I wasn't worried about that part as I knew I could easily add in that condition once I got something that was otherwise working.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

余生一个溪 2025-01-26 17:47:50

现在提供的示例数据确实有帮助(很多),但是它不包含引用的某些列,也没有包含将为您寻求查询的结果提供结果的数据。因此,我添加了一些列和数据,并修改了一些数据以适合。

INSERT INTO tools_clients(client_id,client_name,client_developer)
VALUES (1,'foo',12),
       (2,'fooBar',15),
       (3,'Bar',21),
       (4,'raBoof',37),
       (5,'2answer-or-not',67); --<< added

INSERT INTO tools_events(event_id,event_type,event_value,event_client,event_timestamp)
VALUES
 (23,'taskUpdated','needs approval (dev)',1,'2022-04-04 13:08:22'), -- date change
 (25,'taskUpdated','needs approval (dev)',2,'2022-04-04 14:22:07'), -- date change
 (28,'taskUpdated','needs approval (dev)',3,'2022-04-04 15:09:13'), -- date change
 (29,'taskUpdated','needs approval (dev)',4,'2022-04-04 15:36:17'), -- date change
 (32,'taskUpdated','needs approval (dev)',3,'2022-04-05 16:42:35'),
 (38,'taskUpdated','needs approval (dev)',4,'2022-04-05 19:01:25');

INSERT INTO tools_events(event_id,event_type,event_value,event_client,event_timestamp) 
VALUES (138,'not-taskUpdated','not-needs approval (dev)',4,'2022-04-05 19:01:25'); -- added

因此,我们可以运行一个查询(基于所讨论的原始子查询):

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE  (
        t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        )
    AND t1.event_timestamp >= '2022-04-04 00:00:00'

+----------+------------+--------------+-----------------+
| event_id | event_user | event_client | event_timestamp |
+----------+------------+--------------+-----------------+
|       23 |            |            1 | 2022-04-04      |
|       25 |            |            2 | 2022-04-04      |
|       28 |            |            3 | 2022-04-04      |
|       29 |            |            4 | 2022-04-04      |
|       32 |            |            3 | 2022-04-05      |
|       38 |            |            4 | 2022-04-05      |
+----------+------------+--------------+-----------------+

该查询的否定

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE  NOT (
        t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        )
    AND t1.event_timestamp >= '2022-04-04 00:00:00'

+----------+------------+--------------+-----------------+
| event_id | event_user | event_client | event_timestamp |
+----------+------------+--------------+-----------------+
|      138 |            |            4 | 2022-04-05      |
+----------+------------+--------------+-----------------+

并通过使用not(...):注意event_client 4存在 。在这两个结果中,但没有提及event_client 5,

如果您正在寻找具有t1.event_type ='taskupdated'和t1的事件的客户端, 则稍后很重要.event_value ='需要批准(dev)' plus not(t1.event_type ='taskupedupded'和t1.event_value ='需求批准(dev)'然后这将产生结果:

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_id IN (
    SELECT
          t1.event_client
    FROM tools_events AS t1
    WHERE  NOT (
            t1.event_type = 'taskUpdated'
            AND t1.event_value = 'needs approval (dev)'
            )
    AND t1.event_timestamp >= '2022-04-04 00:00:00')


+-----------+-------------+------------------+
| client_id | client_name | client_developer |
+-----------+-------------+------------------+
|         4 | raBoof      |               37 |
+-----------+-------------+------------------+

但是,如果您只希望没有符合这些条件的事件的客户代码>然后有2个选项

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
left join tools_events AS events ON clients.client_id = events.event_client
        AND events.event_type = 'taskUpdated'
        AND events.event_value = 'needs approval (dev)'
        AND events.event_timestamp >= '2022-04-04 00:00:00'
WHERE client_status > 0
  AND events.event_client IS NULL

。一个不存在(也称为左半结合);

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_status > 0
  AND NOT EXISTS
    (SELECT NULL
      FROM tools_events AS events
      WHERE event_type = 'taskUpdated'
        AND event_value = 'needs approval (dev)'
        AND event_timestamp >= '2022-04-04 00:00:00'
        AND clients.client_id = events.event_client
      )

这两个查询结果均为:

+-----------+----------------+------------------+
| client_id |  client_name   | client_developer |
+-----------+----------------+------------------+
|         5 | 2answer-or-not |               67 |
+-----------+----------------+------------------+

注意使用时,>或不存在子查询的选择子句不必“返回”任何内容,因此您可以使用select null选择1选择 *。这是因为存在只是测试是否存在/where子句的标准。另外,由于Select子句并没有真正“返回”任何内容,因此不需要在不存在子查询中使用(或选择不同的)组。以这种形式的子查询浪费了努力,以这种形式的子查询将该组纳入该组也不太效率。

NB:这与在Select子句“返回”数据“返回”数据的情况下使用非常不同。

因此,希望您能看到有多种否定条件的方法 - 但是如何否定它们可以产生不同的总体结果。要查看所有运行使用的查询: db&lt; /a>

The sample data provided now does help (a lot) but it does not contain some columns that are referenced nor does it contain data that will present a result for the query you are seeking. So, I have added some columns and data and amended some data as well to suit.

INSERT INTO tools_clients(client_id,client_name,client_developer)
VALUES (1,'foo',12),
       (2,'fooBar',15),
       (3,'Bar',21),
       (4,'raBoof',37),
       (5,'2answer-or-not',67); --<< added

INSERT INTO tools_events(event_id,event_type,event_value,event_client,event_timestamp)
VALUES
 (23,'taskUpdated','needs approval (dev)',1,'2022-04-04 13:08:22'), -- date change
 (25,'taskUpdated','needs approval (dev)',2,'2022-04-04 14:22:07'), -- date change
 (28,'taskUpdated','needs approval (dev)',3,'2022-04-04 15:09:13'), -- date change
 (29,'taskUpdated','needs approval (dev)',4,'2022-04-04 15:36:17'), -- date change
 (32,'taskUpdated','needs approval (dev)',3,'2022-04-05 16:42:35'),
 (38,'taskUpdated','needs approval (dev)',4,'2022-04-05 19:01:25');

INSERT INTO tools_events(event_id,event_type,event_value,event_client,event_timestamp) 
VALUES (138,'not-taskUpdated','not-needs approval (dev)',4,'2022-04-05 19:01:25'); -- added

So we can run a query (based on original subquery in question):

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE  (
        t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        )
    AND t1.event_timestamp >= '2022-04-04 00:00:00'

+----------+------------+--------------+-----------------+
| event_id | event_user | event_client | event_timestamp |
+----------+------------+--------------+-----------------+
|       23 |            |            1 | 2022-04-04      |
|       25 |            |            2 | 2022-04-04      |
|       28 |            |            3 | 2022-04-04      |
|       29 |            |            4 | 2022-04-04      |
|       32 |            |            3 | 2022-04-05      |
|       38 |            |            4 | 2022-04-05      |
+----------+------------+--------------+-----------------+

and get the negative of that query by using NOT (...):

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE  NOT (
        t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        )
    AND t1.event_timestamp >= '2022-04-04 00:00:00'

+----------+------------+--------------+-----------------+
| event_id | event_user | event_client | event_timestamp |
+----------+------------+--------------+-----------------+
|      138 |            |            4 | 2022-04-05      |
+----------+------------+--------------+-----------------+

Note how event_client 4 exists in both those results, but there is no reference to event_client 5, this is important later

If you are looking for clients that have had an event that is t1.event_type = 'taskUpdated' AND t1.event_value = 'needs approval (dev)' PLUS an event that is NOT(t1.event_type = 'taskUpdated' AND t1.event_value = 'needs approval (dev)' then this will produce that result:

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_id IN (
    SELECT
          t1.event_client
    FROM tools_events AS t1
    WHERE  NOT (
            t1.event_type = 'taskUpdated'
            AND t1.event_value = 'needs approval (dev)'
            )
    AND t1.event_timestamp >= '2022-04-04 00:00:00')


+-----------+-------------+------------------+
| client_id | client_name | client_developer |
+-----------+-------------+------------------+
|         4 | raBoof      |               37 |
+-----------+-------------+------------------+

However if you only want clients that have absolutely no events that meet these criteria t1.event_type = 'taskUpdated' AND t1.event_value = 'needs approval (dev)' then there are 2 options. Using a left join, here we join the rows to DO match the criteria, but then look for those records where the join produces NULL (this is therefore the "negative" of the criteria):

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
left join tools_events AS events ON clients.client_id = events.event_client
        AND events.event_type = 'taskUpdated'
        AND events.event_value = 'needs approval (dev)'
        AND events.event_timestamp >= '2022-04-04 00:00:00'
WHERE client_status > 0
  AND events.event_client IS NULL

or, a NOT EXISTS (also known as a left semi-join);

SELECT client_id, client_name, client_developer
FROM `tools_clients` AS clients
WHERE client_status > 0
  AND NOT EXISTS
    (SELECT NULL
      FROM tools_events AS events
      WHERE event_type = 'taskUpdated'
        AND event_value = 'needs approval (dev)'
        AND event_timestamp >= '2022-04-04 00:00:00'
        AND clients.client_id = events.event_client
      )

both of those queries result is:

+-----------+----------------+------------------+
| client_id |  client_name   | client_developer |
+-----------+----------------+------------------+
|         5 | 2answer-or-not |               67 |
+-----------+----------------+------------------+

Note when using EXISTS or NOT EXISTS the select clause of the subquery does not have to "return" anything so you can use select null or select 1 or select *. This is because EXISTS is just testing if the criteria of the from/where clause exist or not. Also, because that select clause does not really "return" anything, it is NOT necessary to use GROUP BY (or select distinct) in the NOT EXISTS subquery. It is also less efficient to include that GROUP BY in this form of subquery as it just wasted effort.

nb: This is very different to using IN() where the data IS "returned" by the select clause.

So hopefully you can see that there are various ways to negate your conditions - but how you negate them can produce different overall results. To see all of these queries running use: db<>fiddle here

梦年海沫深 2025-01-26 17:47:50

A simple way to negate a set of where clause predicates is to use NOT(...)

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE NOT (
        t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        )
    AND t1.event_timestamp >= '2022-04-04 00:00:00'
ORDER BY t1.event_timestamp DESC LIMIT 1

and your existing query could be simplified too:

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE t1.event_type = 'taskUpdated'
    AND t1.event_value = 'needs approval (dev)'
    AND t1.event_timestamp >= '2022-04-04 00:00:00'
ORDER BY t1.event_timestamp DESC LIMIT 1

You do not specify which database or version, but if your db/ version support window functions such as row_number() then you might be able to get both records in one query like this:

SELECT
      event_id
    , event_user
    , event_client
    , event_timestamp
FROM (
    SELECT 
          event_id
        , event_user
        , event_client
        , event_timestamp
        , row_number() OVER (
            PARTITION BY CASE WHEN event_type = 'taskUpdated'
                    AND event_value = 'needs approval (dev)' THEN 1 ELSE 0 END
            ORDER BY event_timestamp DESC
            ) AS rn
    FROM tools_events
    WHERE event_timestamp >= '2022-04-04 00:00:00'
    ) AS d
WHERE rn = 1

this divdes the data into 2 "partitions" and for each of those the row with the latest event timestamp will get a row number 1。因此,两个想要的行都在一个查询中返回。

A simple way to negate a set of where clause predicates is to use NOT(...)

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE NOT (
        t1.event_type = 'taskUpdated'
        AND t1.event_value = 'needs approval (dev)'
        )
    AND t1.event_timestamp >= '2022-04-04 00:00:00'
ORDER BY t1.event_timestamp DESC LIMIT 1

and your existing query could be simplified too:

SELECT
      t1.event_id
    , t1.event_user
    , t1.event_client
    , t1.event_timestamp
FROM tools_events AS t1
WHERE t1.event_type = 'taskUpdated'
    AND t1.event_value = 'needs approval (dev)'
    AND t1.event_timestamp >= '2022-04-04 00:00:00'
ORDER BY t1.event_timestamp DESC LIMIT 1

You do not specify which database or version, but if your db/version support window functions such as row_number() then you might be able to get both records in one query like this:

SELECT
      event_id
    , event_user
    , event_client
    , event_timestamp
FROM (
    SELECT 
          event_id
        , event_user
        , event_client
        , event_timestamp
        , row_number() OVER (
            PARTITION BY CASE WHEN event_type = 'taskUpdated'
                    AND event_value = 'needs approval (dev)' THEN 1 ELSE 0 END
            ORDER BY event_timestamp DESC
            ) AS rn
    FROM tools_events
    WHERE event_timestamp >= '2022-04-04 00:00:00'
    ) AS d
WHERE rn = 1

this divdes the data into 2 "partitions" and for each of those the row with the latest event timestamp will get a row number of 1. So both wanted rows are returned in one query.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文