SQL 查询帮助进行 GROUPing 和嵌套查询

发布于 2024-10-05 09:15:29 字数 1861 浏览 1 评论 0原文

给定两个表,

  • A 包含 customeridlastchangeinternallink

  • Table B 包含 内部链接营业额

(I我在这里简单地将其更改为通用示例,目前实际的结构更为复杂,SQL 方言是 mySQL。)

唯一独特的东西(每个表)是内部链接。 A 中有多条记录具有相同的 customerID、lastchange 的不同日期以及不同的内部链接值。 还有其他与此相关的项目;我无法更改表格。

我需要来自 A 的记录 ID,该记录 ID 是客户的最新记录 ID(具有相同客户 ID 的所有记录中的最高最后更改值),B 中匹配特定值条件的条目链接到该记录 ID .

我认为这

SELECT `internallink` FROM `B` WHERE (`turnover` > 10000) 

部分不是问题。

我到目前为止:

SELECT `customerID`, MAX(`lastchange`)
  FROM `A` 
 WHERE `lastchange` IN (SELECT `internallink` FROM `B` 
                         WHERE `turnover` > 10000)
 GROUP BY `customerID`;

唉,该语句给出了错误的结果,因为上面将返回给我的客户 ID,其中最新的值不满足条件,但一些较旧的值满足条件 - 它选择最旧的值,并返回它。但如果最近的条目低于阈值,则 customerID 根本不应该出现。

我哪里出错了,正确的方法是什么?

样本数据 表A

customerid   lastchange   internallink
         3   2010-02-11   11
         3   2010-09-04   12
         3   2010-10-22   13
         3   2010-11-23   14
         4   2010-05-05   15
         4   2010-12-01   16
         5   2010-11-28   17
         5   2010-11-29   18

表B

internallink  turnover
          11     47000
          12     11000
          13      8000
          14     15000
          15     17000
          16     23000
          17     50000
          18     10000

我测试的实际阈值是12000。 您可以看到 customerID 不应出现在结果集中,因为最近的条目低于阈值。

结果集应该是 (3,2010-11-23)(4,2010-12-01) - 但目前它还包含 (5,2010-11-28),这是错误的。


更接近一点(在您的帮助下,谢谢!),这两个语句都有效:

SELECT customerID, MAX(lastchange), externallink FROM A GROUP BY customerID; SELECT 内部链接 FROM B WHERE (成交额 > 12000);

现在我需要的是两者的交集......具有正确的逻辑!

Given are two tables,

  • Table A containing customerid, lastchange, internallink

  • Table B containing internallink, turnover

(I'm simplyfying here and changing it to a generic example, the actual structure is more complex. SQL dialect for now is mySQL.)

The only unique thing (per table) is the internallink.
There are several records in A with the same customerID, different dates in lastchange, and different internallink values.
There are other items linked with this; I cannot change the tables.

I need the recordIDs from A which are the most recent for a customer (highest lastchange value of all with the same customerID) and to which the entries in B which match a certain value condition are linked.

I think the

SELECT `internallink` FROM `B` WHERE (`turnover` > 10000) 

part is not the issue.

I got this far:

SELECT `customerID`, MAX(`lastchange`)
  FROM `A` 
 WHERE `lastchange` IN (SELECT `internallink` FROM `B` 
                         WHERE `turnover` > 10000)
 GROUP BY `customerID`;

Alas, that statement gives wrong results, because above will return me customerIDs for which the most recent value does not fulfill the criteria, but some older did - it selects the oldest that did, and returns this. But in case the most recent entry is below threshold, the customerID should not turn up at all.

Where did I go wrong, and what is the correct approach to this?

Sample Data
Table A

customerid   lastchange   internallink
         3   2010-02-11   11
         3   2010-09-04   12
         3   2010-10-22   13
         3   2010-11-23   14
         4   2010-05-05   15
         4   2010-12-01   16
         5   2010-11-28   17
         5   2010-11-29   18

Table B

internallink  turnover
          11     47000
          12     11000
          13      8000
          14     15000
          15     17000
          16     23000
          17     50000
          18     10000

The actual threshold in my tests is 12000.
You can see customerID should not be in the result set, since the most recent entry is below the threshold.

The result set should be (3,2010-11-23)(4,2010-12-01) - but currently it also contains (5,2010-11-28), which is wrong.


Getting a bit closer (with your help, thanks!), these two statements both work:

SELECT customerID, MAX(lastchange), internallink FROM A GROUP BY customerID;
SELECT internallink FROM B WHERE (turnover > 12000);

Now all I need is the intersection of both... with the correct logic!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

很糊涂小朋友 2024-10-12 09:15:29

以下查询应该执行您想要的操作。它不是编写此类查询的最高效的方式。但它使用标准 SQL 并且可以在任何数据库中执行。

工作原理如下:内部子查询查找所有 customerid 以及最新的更改。对于每个这样的对(customerid、lastchange),我们在表 A 中找到原始行。在表 A 中找到行后,我们使用内部链接在 B 中查找匹配记录,但前提是关联的营业额大于 10000。

drop table a;
drop table b;

create table a(
   customerid   int  not null
  ,lastchange   date not null
  ,internallink int  not null
);

create table b(
   internallink int not null
  ,turnover     int not null
);

insert into a values(3, date '2010-02-11', 11);
insert into a values(3, date '2010-09-04', 12);
insert into a values(3, date '2010-10-22', 13);
insert into a values(3, date '2010-11-23', 14);
insert into a values(4, date '2010-05-05', 15);
insert into a values(4, date '2010-12-01', 16);
insert into a values(5, date '2010-11-28', 17);
insert into a values(5, date '2010-11-29', 18);

insert into b values(11, 47000);
insert into b values(12, 11000);
insert into b values(13,  8000);
insert into b values(14, 15000);
insert into b values(15, 17000);
insert into b values(16, 23000);
insert into b values(17, 50000);
insert into b values(18, 10000);

select a.customerid
      ,a.lastchange
      ,a.internallink
      ,b.turnover
  from a
  join b on (a.internallink = b.internallink)
 where b.turnover > 10000
   and (a.customerid, a.lastchange) in(select customerid,max(lastchange)
                                         from a
                                     group by customerid);

The following query should do what you want. It is not the most performant way to write this kind of query. But it's using standard SQL and it executes in any database.

The works like this: The inner subquery finds all customerids along with the latest changedate. For each such pair (customerid, lastchange), we find the original row in table A. Having found a row in table A, we use the internallink to find a matching record in B, but only if the associated turnover is greater than 10000.

drop table a;
drop table b;

create table a(
   customerid   int  not null
  ,lastchange   date not null
  ,internallink int  not null
);

create table b(
   internallink int not null
  ,turnover     int not null
);

insert into a values(3, date '2010-02-11', 11);
insert into a values(3, date '2010-09-04', 12);
insert into a values(3, date '2010-10-22', 13);
insert into a values(3, date '2010-11-23', 14);
insert into a values(4, date '2010-05-05', 15);
insert into a values(4, date '2010-12-01', 16);
insert into a values(5, date '2010-11-28', 17);
insert into a values(5, date '2010-11-29', 18);

insert into b values(11, 47000);
insert into b values(12, 11000);
insert into b values(13,  8000);
insert into b values(14, 15000);
insert into b values(15, 17000);
insert into b values(16, 23000);
insert into b values(17, 50000);
insert into b values(18, 10000);

select a.customerid
      ,a.lastchange
      ,a.internallink
      ,b.turnover
  from a
  join b on (a.internallink = b.internallink)
 where b.turnover > 10000
   and (a.customerid, a.lastchange) in(select customerid,max(lastchange)
                                         from a
                                     group by customerid);
迷爱 2024-10-12 09:15:29

这适用于 sql server - 我不确定 mySql 是否有类似的排名功能。

select a.id, a.lastchange, b.turnover, a.rownumber from B b inner join 
(SELECT id, lastchange, internallink, ROW_NUMBER() OVER(PARTITION BY id ORDER BY lastchange DESC) AS 'rownumber'
FROM A) a on b.internallink = a.internallink
where a.rownumber = 1 and b.turnover > 5000

“ROW_NUMBER() OVER(PARTITION BY id ORDER BY lastchange DESC) AS 'rownumber'”的意思是...

我想将所有相同的 id 分组在一起,并在计算每行之后按 lastchange by desc 对它们进行排序。哦,并将该列命名为 rownumber。

id  lastchange    internallink  rownumber
1   2010-01-03    2           1
1   2010-01-02    1           2
1   2010-01-01    1           3
2   2010-01-04    2           1

选择任何 rownumber 为 1 的记录将返回 id 的最后修改记录。

This works in sql server - I'm not sure if mySql has a similar ranking functions.

select a.id, a.lastchange, b.turnover, a.rownumber from B b inner join 
(SELECT id, lastchange, internallink, ROW_NUMBER() OVER(PARTITION BY id ORDER BY lastchange DESC) AS 'rownumber'
FROM A) a on b.internallink = a.internallink
where a.rownumber = 1 and b.turnover > 5000

The "ROW_NUMBER() OVER(PARTITION BY id ORDER BY lastchange DESC) AS 'rownumber'" means...

I want to group all the same ids together and order them by lastchange by desc after that count each row. Oh and name that column rownumber.

id  lastchange    internallink  rownumber
1   2010-01-03    2           1
1   2010-01-02    1           2
1   2010-01-01    1           3
2   2010-01-04    2           1

Selecting any record with a rownumber of 1 will return the last modified record of an id.

你的往事 2024-10-12 09:15:29

经过大量测试和一些研究后,我找到了这个解决方案,并发布了
以防其他人遇到类似的问题。

额外的表“缓存”保留表 A 中最新条目的副本,从而大大降低了复杂性。它通过使用如下触发器保持最新:

CREATE TRIGGER sync_a_insert AFTER INSERT ON a FOR EACH ROW 
    INSERT INTO cache (`customerID`, `internallink`) VALUES (NEW.`customerID`,NEW.`internallink`);
CREATE TRIGGER sync_a_update AFTER UPDATE ON a FOR EACH ROW 
    UPDATE cache SET `internallink` = NEW.`internallink` WHERE (`customerID` = NEW.`customerID`);
CREATE TRIGGER sync_a_delete BEFORE DELETE ON a FOR EACH ROW 
    DELETE FROM cache WHERE `customerID` = OLD.`customerID`;

对于 INSERT 和 UPDATE,这些触发器在事后触发,因此表 a 中的条目在缓存更新之前已完成。对于 DELETE,需要在原始条目消失之前更新缓存。

一旦到位,其他一切都变得简单:

SELECT `customerID` FROM cache WHERE `internallink` IN 
    (SELECT `internallink` FROM b WHERE (`turnover` > 10000));

对我来说,这是一个可行的解决方案,它甚至可以加快查找速度。当然,数据库大小是有代价的,但我认为总体性能要好得多——只要读访问比写访问至少多一次,就有改进。

不过,您给出的答案对我很有帮助。我从他们那里学到了很多东西,并从尝试遵循你的建议中学到了很多东西(甚至已经将其中一些用于其他地方)。感谢所有回答我问题的人!

After lots of testing and some research, I found this solution, and I post
this in case anybody else should face a similar problem.

An additional table "cache" keeps copies of the most recent entries in table A, reducing the complexity a lot. It is kept current by using triggers like this:

CREATE TRIGGER sync_a_insert AFTER INSERT ON a FOR EACH ROW 
    INSERT INTO cache (`customerID`, `internallink`) VALUES (NEW.`customerID`,NEW.`internallink`);
CREATE TRIGGER sync_a_update AFTER UPDATE ON a FOR EACH ROW 
    UPDATE cache SET `internallink` = NEW.`internallink` WHERE (`customerID` = NEW.`customerID`);
CREATE TRIGGER sync_a_delete BEFORE DELETE ON a FOR EACH ROW 
    DELETE FROM cache WHERE `customerID` = OLD.`customerID`;

For INSERT and UPDATE, those triggers go off after the fact, so the entries in Table a are complete before the cache updates. For DELETE, the cache needs to be updated before the original entry vanishes.

Once this is in place, everything else becomes simple:

SELECT `customerID` FROM cache WHERE `internallink` IN 
    (SELECT `internallink` FROM b WHERE (`turnover` > 10000));

For me, this is a viable solution, and it even speeds up the lookups. Of course there is a cost in DB size, but I think the performance is much better overall - as long as there is at least one more read access than write access, there is an improvement.

The answers you gave were, however, very helpful to me. I learned a bunch of things from them and from trying to follow your advice (even putting some of it to use in other places already). Thanks to all who replied to my question!

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文