SQL 查询帮助进行 GROUPing 和嵌套查询
给定两个表,
表
A
包含customerid
、lastchange
、internallink
Table
B
包含内部链接
,营业额
(I我在这里简单地将其更改为通用示例,目前实际的结构更为复杂,SQL 方言是 mySQL。)
唯一独特的东西(每个表)是内部链接。 A 中有多条记录具有相同的 customerID、lastchange 的不同日期以及不同的内部链接值。 还有其他与此相关的项目;我无法更改表格。
我需要来自 A 的记录 ID,该记录 ID 是客户的最新记录 ID(具有相同客户 ID 的所有记录中的最高最后更改值)和,B 中匹配特定值条件的条目链接到该记录 ID .
我认为这
SELECT `internallink` FROM `B` WHERE (`turnover` > 10000)
部分不是问题。
我到目前为止:
SELECT `customerID`, MAX(`lastchange`)
FROM `A`
WHERE `lastchange` IN (SELECT `internallink` FROM `B`
WHERE `turnover` > 10000)
GROUP BY `customerID`;
唉,该语句给出了错误的结果,因为上面将返回给我的客户 ID,其中最新的值不满足条件,但一些较旧的值满足条件 - 它选择最旧的值,并返回它。但如果最近的条目低于阈值,则 customerID 根本不应该出现。
我哪里出错了,正确的方法是什么?
样本数据 表A
customerid lastchange internallink 3 2010-02-11 11 3 2010-09-04 12 3 2010-10-22 13 3 2010-11-23 14 4 2010-05-05 15 4 2010-12-01 16 5 2010-11-28 17 5 2010-11-29 18
表B
internallink turnover 11 47000 12 11000 13 8000 14 15000 15 17000 16 23000 17 50000 18 10000
我测试的实际阈值是12000。 您可以看到 customerID 不应出现在结果集中,因为最近的条目低于阈值。
结果集应该是 (3,2010-11-23)(4,2010-12-01) - 但目前它还包含 (5,2010-11-28),这是错误的。
更接近一点(在您的帮助下,谢谢!),这两个语句都有效:
SELECT customerID, MAX(lastchange), externallink FROM A GROUP BY customerID; SELECT 内部链接 FROM B WHERE (成交额 > 12000);
现在我需要的是两者的交集......具有正确的逻辑!
Given are two tables,
Table
A
containingcustomerid
,lastchange
,internallink
Table
B
containinginternallink
,turnover
(I'm simplyfying here and changing it to a generic example, the actual structure is more complex. SQL dialect for now is mySQL.)
The only unique thing (per table) is the internallink.
There are several records in A with the same customerID, different dates in lastchange, and different internallink values.
There are other items linked with this; I cannot change the tables.
I need the recordIDs from A which are the most recent for a customer (highest lastchange value of all with the same customerID) and to which the entries in B which match a certain value condition are linked.
I think the
SELECT `internallink` FROM `B` WHERE (`turnover` > 10000)
part is not the issue.
I got this far:
SELECT `customerID`, MAX(`lastchange`)
FROM `A`
WHERE `lastchange` IN (SELECT `internallink` FROM `B`
WHERE `turnover` > 10000)
GROUP BY `customerID`;
Alas, that statement gives wrong results, because above will return me customerIDs for which the most recent value does not fulfill the criteria, but some older did - it selects the oldest that did, and returns this. But in case the most recent entry is below threshold, the customerID should not turn up at all.
Where did I go wrong, and what is the correct approach to this?
Sample Data
Table A
customerid lastchange internallink 3 2010-02-11 11 3 2010-09-04 12 3 2010-10-22 13 3 2010-11-23 14 4 2010-05-05 15 4 2010-12-01 16 5 2010-11-28 17 5 2010-11-29 18
Table B
internallink turnover 11 47000 12 11000 13 8000 14 15000 15 17000 16 23000 17 50000 18 10000
The actual threshold in my tests is 12000.
You can see customerID should not be in the result set, since the most recent entry is below the threshold.
The result set should be (3,2010-11-23)(4,2010-12-01) - but currently it also contains (5,2010-11-28), which is wrong.
Getting a bit closer (with your help, thanks!), these two statements both work:
SELECT customerID, MAX(lastchange), internallink FROM A GROUP BY customerID;
SELECT internallink FROM B WHERE (turnover > 12000);
Now all I need is the intersection of both... with the correct logic!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
发布评论
评论(3)
这适用于 sql server - 我不确定 mySql 是否有类似的排名功能。
select a.id, a.lastchange, b.turnover, a.rownumber from B b inner join
(SELECT id, lastchange, internallink, ROW_NUMBER() OVER(PARTITION BY id ORDER BY lastchange DESC) AS 'rownumber'
FROM A) a on b.internallink = a.internallink
where a.rownumber = 1 and b.turnover > 5000
“ROW_NUMBER() OVER(PARTITION BY id ORDER BY lastchange DESC) AS 'rownumber'”的意思是...
我想将所有相同的 id 分组在一起,并在计算每行之后按 lastchange by desc 对它们进行排序。哦,并将该列命名为 rownumber。
id lastchange internallink rownumber
1 2010-01-03 2 1
1 2010-01-02 1 2
1 2010-01-01 1 3
2 2010-01-04 2 1
选择任何 rownumber 为 1 的记录将返回 id 的最后修改记录。
经过大量测试和一些研究后,我找到了这个解决方案,并发布了
以防其他人遇到类似的问题。
额外的表“缓存”保留表 A 中最新条目的副本,从而大大降低了复杂性。它通过使用如下触发器保持最新:
CREATE TRIGGER sync_a_insert AFTER INSERT ON a FOR EACH ROW INSERT INTO cache (`customerID`, `internallink`) VALUES (NEW.`customerID`,NEW.`internallink`); CREATE TRIGGER sync_a_update AFTER UPDATE ON a FOR EACH ROW UPDATE cache SET `internallink` = NEW.`internallink` WHERE (`customerID` = NEW.`customerID`); CREATE TRIGGER sync_a_delete BEFORE DELETE ON a FOR EACH ROW DELETE FROM cache WHERE `customerID` = OLD.`customerID`;
对于 INSERT 和 UPDATE,这些触发器在事后触发,因此表 a 中的条目在缓存更新之前已完成。对于 DELETE,需要在原始条目消失之前更新缓存。
一旦到位,其他一切都变得简单:
SELECT `customerID` FROM cache WHERE `internallink` IN (SELECT `internallink` FROM b WHERE (`turnover` > 10000));
对我来说,这是一个可行的解决方案,它甚至可以加快查找速度。当然,数据库大小是有代价的,但我认为总体性能要好得多——只要读访问比写访问至少多一次,就有改进。
不过,您给出的答案对我很有帮助。我从他们那里学到了很多东西,并从尝试遵循你的建议中学到了很多东西(甚至已经将其中一些用于其他地方)。感谢所有回答我问题的人!
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
以下查询应该执行您想要的操作。它不是编写此类查询的最高效的方式。但它使用标准 SQL 并且可以在任何数据库中执行。
工作原理如下:内部子查询查找所有 customerid 以及最新的更改。对于每个这样的对(customerid、lastchange),我们在表 A 中找到原始行。在表 A 中找到行后,我们使用内部链接在 B 中查找匹配记录,但前提是关联的营业额大于 10000。
The following query should do what you want. It is not the most performant way to write this kind of query. But it's using standard SQL and it executes in any database.
The works like this: The inner subquery finds all customerids along with the latest changedate. For each such pair (customerid, lastchange), we find the original row in table A. Having found a row in table A, we use the internallink to find a matching record in B, but only if the associated turnover is greater than 10000.