Selecting distinct records without using a temporary table
I have a third-party table that is populated with some cluttered data, and I need to get the most recent distinct records out of it. The table gets a new row every year, or every time the "Person" changes, and it treats the row with the most recent ActiveDate as the correct one for that person. I've created a mock table and data to show this.
CREATE TABLE `Persons` (
`PersonId` varchar(200) NOT NULL,
`Name` varchar(200) NOT NULL DEFAULT '',
`ActiveDate` varchar(25) NOT NULL,
`ExpireDate` varchar(25) DEFAULT NULL,
`Job` varchar(200) NOT NULL DEFAULT '',
`Position` varchar(200) NOT NULL DEFAULT ''
);
And some mock data:
Id    | Name        | ActiveDate          | ExpireDate          | Job    | Position
------+-------------+---------------------+---------------------+--------+---------
J1234 | Doe, John   | 2010-08-15 00:00:00 | 2011-08-15 00:00:00 | Worker | Janitor
J1234 | Doe, John   | 2011-08-15 00:00:00 | 0000-00-00 00:00:00 | Worker | Janitor
777   | Doe, Jane   | 2010-06-04 00:00:00 | 0000-00-00 00:00:00 | Boss   | Janitor
777   | Doe, Jane   | 2011-04-30 00:00:00 | 0000-00-00 00:00:00 | Boss   | Janitor
654G  | Smith, Jane | 2011-01-20 00:00:00 | 0000-00-00 00:00:00 | Worker | Janitor
The table also has an ExpireDate column. The end user sets this column, and, much to my dismay, it is not always set. Currently I pull the distinct records into a dummy table and store them there for the day. I would use a temporary table, but I'm not 100% sure how to do that in MySQL, and I dislike them anyway. My current approach is only a stopgap until I find better SQL.
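Here is a minimal sketch of how the dummy table could instead be a MySQL TEMPORARY table, created directly from the DISTINCT query. It assumes the Persons schema and mock data above. PersonId is aliased to Id so the result matches the column names used in the queries that follow.

```sql
-- Sketch only: reproduces the question's Persons table and mock data so it runs on its own.
CREATE TABLE Persons (
  PersonId   VARCHAR(200) NOT NULL,
  `Name`     VARCHAR(200) NOT NULL DEFAULT '',
  ActiveDate VARCHAR(25)  NOT NULL,
  ExpireDate VARCHAR(25)  DEFAULT NULL,
  Job        VARCHAR(200) NOT NULL DEFAULT '',
  `Position` VARCHAR(200) NOT NULL DEFAULT ''
);

INSERT INTO Persons (PersonId, `Name`, ActiveDate, ExpireDate, Job, `Position`) VALUES
  ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
  ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
  ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
  ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
  ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');

-- A TEMPORARY table is visible only to the connection that created it
-- and is dropped automatically when that connection ends.
CREATE TEMPORARY TABLE tmp_Person AS
SELECT DISTINCT PersonId AS Id, `Name`, Job, `Position`
FROM Persons;

-- tmp_Person can now be read or joined like any other table in this session.
SELECT Id, `Name`, Job, `Position` FROM tmp_Person;
```

The table disappears at the end of the session, so there is no daily cleanup. You can also remove it early with `DROP TEMPORARY TABLE tmp_Person;`. It still materialises the distinct rows, however, so it does not eliminate the extra step the question is trying to avoid.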
The data then has to be joined with a multitude of other tables to get the finished product, but I still need to deal with the initial set of distinct data first. Joining the other table in right from the start just won't work.
So here is how I'm pulling my data, storing it, and then pulling it again later and joining it to other tables:
INSERT INTO tmp_Person (Id, `Name`, Job, `Position`)
SELECT DISTINCT Id, `Name`, Job, `Position`
FROM Person;
SELECT tmp_Person.Id,
tmp_Person.`Name`,
tmp_Person.Job,
tmp_Person.`Position`,
Pricing.Cost,
Pricing.Benefit
FROM tmp_Person
LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id
AND CL.PriceScredule = 'Major-Client'
AND CL.ExpireDate = '0000-00-00 00:00:00'
LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id
AND Inter.PriceScredule = 'Internal-Client'
AND Inter.ExpireDate = '0000-00-00 00:00:00';
How can I write this so that I avoid the cost of processing out the duplicate rows with a temporary table (in any form)? Hopefully I've made this clear enough; if not, I'll gladly add to it or clarify.
Comments (2)
Replace tmp_Person with the code you have for the temp table, using that SELECT as a derived table in the FROM clause.
As @Andriy spotted, using Pricing.Cost or Pricing.Benefit in the SELECT list would raise an error. I guess you forgot to change it when you posted.
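Here is a minimal sketch of that substitution. It assumes the Persons schema and mock data from the question, with PersonId aliased to Id. It also assumes a hypothetical Pricing table whose columns are guessed from the join conditions; the column types and prices are made up. Besides inlining the DISTINCT query, the SELECT list reads Cost and Benefit through the CL and Inter aliases, because Pricing itself is no longer in scope once it has been aliased. The output column names (ClientCost and so on) are illustrative.

```sql
-- Sketch only. Persons mirrors the question; Pricing is a HYPOTHETICAL table
-- whose columns are inferred from the join conditions (types and prices made up).
CREATE TABLE Persons (
  PersonId   VARCHAR(200) NOT NULL,
  `Name`     VARCHAR(200) NOT NULL DEFAULT '',
  ActiveDate VARCHAR(25)  NOT NULL,
  ExpireDate VARCHAR(25)  DEFAULT NULL,
  Job        VARCHAR(200) NOT NULL DEFAULT '',
  `Position` VARCHAR(200) NOT NULL DEFAULT ''
);
INSERT INTO Persons (PersonId, `Name`, ActiveDate, ExpireDate, Job, `Position`) VALUES
  ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
  ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
  ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
  ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
  ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');

CREATE TABLE Pricing (
  PersonId      VARCHAR(200) NOT NULL,
  PriceScredule VARCHAR(50)  NOT NULL,
  ExpireDate    VARCHAR(25)  DEFAULT NULL,
  Cost          DECIMAL(10,2),
  Benefit       DECIMAL(10,2)
);
INSERT INTO Pricing (PersonId, PriceScredule, ExpireDate, Cost, Benefit) VALUES
  ('J1234', 'Major-Client',    '0000-00-00 00:00:00', 100.00, 10.00),
  ('J1234', 'Internal-Client', '0000-00-00 00:00:00',  80.00,  8.00),
  ('777',   'Major-Client',    '0000-00-00 00:00:00', 150.00, 15.00),
  ('654G',  'Major-Client',    '2011-01-01 00:00:00',  90.00,  9.00); -- expired: must not match

-- The former tmp_Person is now a derived table. Keeping the alias means the
-- joins are unchanged; only the SELECT list had to switch to CL/Inter.
SELECT tmp_Person.Id,
       tmp_Person.`Name`,
       tmp_Person.Job,
       tmp_Person.`Position`,
       CL.Cost       AS ClientCost,
       CL.Benefit    AS ClientBenefit,
       Inter.Cost    AS InternalCost,
       Inter.Benefit AS InternalBenefit
FROM (
    SELECT DISTINCT PersonId AS Id, `Name`, Job, `Position`
    FROM Persons
) AS tmp_Person
LEFT OUTER JOIN Pricing AS CL
       ON CL.PersonId = tmp_Person.Id
      AND CL.PriceScredule = 'Major-Client'
      AND CL.ExpireDate = '0000-00-00 00:00:00'
LEFT OUTER JOIN Pricing AS Inter
       ON Inter.PersonId = tmp_Person.Id
      AND Inter.PriceScredule = 'Internal-Client'
      AND Inter.ExpireDate = '0000-00-00 00:00:00';
```

The derived table is evaluated as part of the single statement. The optimizer may merge or materialise it internally, so no separate table has to be created, filled, or cleaned up.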
I put this together before I realised the question was for MySQL, but the principle should be the same: this will get you the record for each PersonID with the most recent ActiveDate from the Person table.
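Below is a MySQL sketch of that idea, assuming the Persons schema and mock data from the question. It uses a GROUP BY/MAX() self-join. This is a swapped-in technique, not necessarily the one the answer used, and it also works on MySQL versions without window functions.

```sql
-- Sketch only: reproduces the question's Persons table and mock data.
CREATE TABLE Persons (
  PersonId   VARCHAR(200) NOT NULL,
  `Name`     VARCHAR(200) NOT NULL DEFAULT '',
  ActiveDate VARCHAR(25)  NOT NULL,
  ExpireDate VARCHAR(25)  DEFAULT NULL,
  Job        VARCHAR(200) NOT NULL DEFAULT '',
  `Position` VARCHAR(200) NOT NULL DEFAULT ''
);
INSERT INTO Persons (PersonId, `Name`, ActiveDate, ExpireDate, Job, `Position`) VALUES
  ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
  ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
  ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
  ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
  ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');

-- Find the latest ActiveDate per PersonId, then join back to fetch that row.
-- ActiveDate is a VARCHAR, but fixed-format 'YYYY-MM-DD HH:MM:SS' strings
-- sort chronologically, so MAX() on the string picks the latest date.
SELECT p.PersonId, p.`Name`, p.ActiveDate, p.ExpireDate, p.Job, p.`Position`
FROM Persons AS p
INNER JOIN (
    SELECT PersonId, MAX(ActiveDate) AS MaxActiveDate
    FROM Persons
    GROUP BY PersonId
) AS latest
        ON latest.PersonId = p.PersonId
       AND latest.MaxActiveDate = p.ActiveDate;
```

If two rows for the same PersonId share the latest ActiveDate, both are returned. On MySQL 8.0 and later, ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY ActiveDate DESC) is an alternative. Either query can replace tmp_Person as a derived table in the question's join, with PersonId aliased to Id, so that the Pricing joins see only the latest row per person.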