Selecting distinct records without using a temporary table

Posted on 2024-11-30 00:15:18

I have a third-party table populated with some cluttered data, and I need to get the most recent distinct records out of it. The table is fed a new row every year, or every time the "Person" changes. The table works on the basis that the row with the most recent ActiveDate is the correct person. I've created a mock table and data to show this.

CREATE TABLE `Persons` (
  `PersonId` varchar(200) NOT NULL,
  `Name` varchar(200) NOT NULL DEFAULT '',
  `ActiveDate` varchar(25) NOT NULL,
  `ExpireDate` varchar(25) DEFAULT NULL,
  `Job` varchar(200) NOT NULL DEFAULT '',
  `Position` varchar(200) NOT NULL DEFAULT ''
)

And some mock data:

Id       |`Name`        |ActiveDate              |ExpireDate             |Job       |`Position`
---------------------------------------------------------------------------------------------------
J1234    |Doe, John     |2010-08-15 00:00:00     |2011-08-15 00:00:00    |Worker    |Janitor
J1234    |Doe, John     |2011-08-15 00:00:00     |0000-00-00 00:00:00    |Worker    |Janitor
777      |Doe, Jane     |2010-06-04 00:00:00     |0000-00-00 00:00:00    |Boss      |Janitor
777      |Doe, Jane     |2011-04-30 00:00:00     |0000-00-00 00:00:00    |Boss      |Janitor
654G     |Smith, Jane   |2011-01-20 00:00:00     |0000-00-00 00:00:00    |Worker    |Janitor

The table also has an ExpireDate column, which is actually set by the end user and, much to my dismay, is not always set. Currently I'm using a dummy table to pull the distinct records into and store them for the day. I would use a temporary table, but I'm not 100% sure how to in MySQL, plus I dislike them. The way I'm doing it is just temporary, in hope of better SQL.

The data then has to be joined with a multitude of other tables to get the finished product. But I still need to deal with the initial set of distinct data first, and joining in the other tables right from the start just won't work.

So here is how I'm pulling my data, storing it, and then pulling it again later and joining it to other tables:

INSERT INTO tmp_Person (Id, `Name`, Job, `Position`) 
    SELECT DISTINCT Id, `Name`, Job, `Position`
    FROM Person 

SELECT  tmp_Person.Id, 
    tmp_Person.`Name`, 
    tmp_Person.Job, 
    tmp_Person.`Position`,
    Pricing.Cost, 
    Pricing.Benefit

    FROM tmp_Person
    LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id 
        AND CL.PriceScredule = 'Major-Client' 
        AND CL.ExpireDate = '0000-00-00 00:00:00'
    LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id 
        AND Inter.PriceScredule = 'Internal-Client' 
        AND Inter.ExpireDate = '0000-00-00 00:00:00'

How can I write this to avoid the cost of processing out the duplicate rows using a temporary table (in any form)? Hopefully I've made this clear enough; if not, I can gladly add or clarify.

Comments (2)

惯饮孤独 2024-12-07 00:15:19

Replace tmp_Person with the code you have for the temp table:

SELECT  tmp_Person.Id, 
    tmp_Person.`Name`, 
    tmp_Person.Job, 
    tmp_Person.`Position`,
    CL.Cost     AS MajorCost,              
    CL.Benefit  AS MajorBenefit,   
    Inter.Cost    AS InternalCost,
    Inter.Benefit AS InternalBenefit

    FROM 
      ( SELECT DISTINCT Id, `Name`, Job, `Position`
        FROM Person 
      )
      AS tmp_Person
    LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id 
        AND CL.PriceScredule = 'Major-Client' 
        AND CL.ExpireDate = '0000-00-00 00:00:00'
    LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id 
        AND Inter.PriceScredule = 'Internal-Client' 
        AND Inter.ExpireDate = '0000-00-00 00:00:00'

As @Andriy spotted, using Pricing.Cost or Pricing.Benefit in the SELECT list would raise an error, because the Pricing table is only referenced through the aliases CL and Inter. I guess you forgot to change it when you posted.
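For anyone who wants to try the derived-table form without a MySQL server at hand, here is a minimal sketch that runs the same query shape through Python's built-in sqlite3 module as a stand-in. The Person rows are the mock data from the question; the Pricing rows and the column types are invented purely for illustration, and only a subset of the SELECT list is shown. SQLite is not MySQL, so treat this as a sanity check of the query structure rather than a statement about MySQL's behaviour or performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    Id TEXT, Name TEXT, ActiveDate TEXT, ExpireDate TEXT, Job TEXT, Position TEXT
);
CREATE TABLE Pricing (
    PersonId TEXT, PriceScredule TEXT, ExpireDate TEXT, Cost REAL, Benefit REAL
);

-- Mock data from the question.
INSERT INTO Person VALUES
    ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
    ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
    ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');

-- Hypothetical pricing rows, invented for this sketch only.
INSERT INTO Pricing VALUES
    ('J1234', 'Major-Client',    '0000-00-00 00:00:00', 10, 1),
    ('J1234', 'Internal-Client', '0000-00-00 00:00:00', 5,  2),
    ('777',   'Major-Client',    '0000-00-00 00:00:00', 20, 3);
""")

# The derived table (subquery in FROM) takes the place of tmp_Person.
query = """
SELECT tmp_Person.Id,
       tmp_Person.Name,
       CL.Cost    AS MajorCost,
       Inter.Cost AS InternalCost
FROM (SELECT DISTINCT Id, Name, Job, Position FROM Person) AS tmp_Person
LEFT OUTER JOIN Pricing AS CL ON CL.PersonId = tmp_Person.Id
    AND CL.PriceScredule = 'Major-Client'
    AND CL.ExpireDate = '0000-00-00 00:00:00'
LEFT OUTER JOIN Pricing AS Inter ON Inter.PersonId = tmp_Person.Id
    AND Inter.PriceScredule = 'Internal-Client'
    AND Inter.ExpireDate = '0000-00-00 00:00:00'
ORDER BY tmp_Person.Id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that SELECT DISTINCT only collapses the duplicates here because each person's Name, Job and Position are identical across their rows in the mock data. If any of those columns changed between years, the person would appear more than once.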

榕城若虚 2024-12-07 00:15:19

I put this together before I realised the question was for MySQL, but the principle should be the same: this will get you the record for each PersonID with the most recent ActiveDate from the Persons table.

select *
from
(
 select persons.*, ROW_NUMBER() over(partition by personid order by personid, activedate desc) as rn 
 from persons
) basedata
where basedata.rn=1
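
Where ROW_NUMBER() is not available, the same "latest row per person" idea can be sketched with a derived table that finds MAX(ActiveDate) per PersonId and joins back to the base table. The example below is only an illustration under a few assumptions: it runs through Python's built-in sqlite3 module as a stand-in for MySQL, it uses the Persons definition and mock data from the question, and it relies on ActiveDate being stored as text in 'YYYY-MM-DD HH:MM:SS' form so that string comparison matches date order. The aliases p, latest and MaxActive are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (
    PersonId   varchar(200) NOT NULL,
    Name       varchar(200) NOT NULL DEFAULT '',
    ActiveDate varchar(25)  NOT NULL,
    ExpireDate varchar(25)  DEFAULT NULL,
    Job        varchar(200) NOT NULL DEFAULT '',
    Position   varchar(200) NOT NULL DEFAULT ''
);

-- Mock data from the question.
INSERT INTO Persons VALUES
    ('J1234', 'Doe, John',   '2010-08-15 00:00:00', '2011-08-15 00:00:00', 'Worker', 'Janitor'),
    ('J1234', 'Doe, John',   '2011-08-15 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor'),
    ('777',   'Doe, Jane',   '2010-06-04 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('777',   'Doe, Jane',   '2011-04-30 00:00:00', '0000-00-00 00:00:00', 'Boss',   'Janitor'),
    ('654G',  'Smith, Jane', '2011-01-20 00:00:00', '0000-00-00 00:00:00', 'Worker', 'Janitor');
""")

# Inner query: the newest ActiveDate for each person.
# Outer join-back: fetch the full row that carries that date.
latest_per_person = """
SELECT p.PersonId, p.Name, p.ActiveDate, p.Job, p.Position
FROM Persons AS p
JOIN (
    SELECT PersonId, MAX(ActiveDate) AS MaxActive
    FROM Persons
    GROUP BY PersonId
) AS latest
  ON latest.PersonId  = p.PersonId
 AND latest.MaxActive = p.ActiveDate
ORDER BY p.PersonId
"""
rows = conn.execute(latest_per_person).fetchall()
for row in rows:
    print(row)
```

One caveat with this form: if two rows for the same person share the same ActiveDate, both are returned, whereas ROW_NUMBER() would pick exactly one.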