从非规范化表中选择最新记录

发布于 2024-12-13 06:30:53 字数 866 浏览 0 评论 0原文

我设置了一个包含以下列的表：

fname
lname
地址
city
state
zip
客户 ID
date_modified

数据基本上是非规范化的，因此看起来

Ben -- Smith--***123first*** st -- NY -- NY -- 12101 -- 123 --  1-1-2011
Ben -- Smith--***123 1st st*** -- NY -- NY -- 12101 -- 123 -- 1-1-2011
Sara -- Smith -- BLAH BLAH BLAh

我正在尝试将这些记录复制到新表中，但我只想要每个客户 ID 一条记录。

我尝试做类似的事情，

 insert into new_table(fname,lname,address,city,state,zip,Customer_ID,
                      date_modified)

 select fname,lname,address,city,state,zip,Customer_ID,date_modified
 group by (fname,lname,address,city,state,zip,Customer_ID,date_modified)

问题是有太多地址和其他列具有相同的含义但不同的文本（第一个与第一个）。因此分组依据将这两条记录保留在新表中。我如何通过选择最大值（修改日期）来为每个客户 ID 获取一条记录。基本上我只想按 customer_id 进行分组，而不是按其余列进行分组，但这在 Oracle 中是不允许的。

原文

I have a table set up with columns:

fname
lname
address
city
state
zip
Customer ID
date_modified

the data is basically denormalized so looks like

Ben -- Smith--***123first*** st -- NY -- NY -- 12101 -- 123 --  1-1-2011
Ben -- Smith--***123 1st st*** -- NY -- NY -- 12101 -- 123 -- 1-1-2011
Sara -- Smith -- BLAH BLAH BLAh

I am trying to copy these records to a new table but I only want one record per Customer ID.

I tried doing something like

 insert into new_table(fname,lname,address,city,state,zip,Customer_ID,
                      date_modified)

 select fname,lname,address,city,state,zip,Customer_ID,date_modified
 group by (fname,lname,address,city,state,zip,Customer_ID,date_modified)

the issue is there are too many addresses and other columns that have the same meaning but different text (first vs 1st). so the group by leaves both of these records in the new table. how do i esentialy get one record for each customer id by choosing the max(date modified). Basically I would want to group by just the customer_id and not the rest of the columns but that isn't allowed in oracle.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

葬心 2024-12-20 06:30:53

以下是获得您想要的东西的几种方法之一。请记住，这不会标准化您的数据库。您仍然有一个包含名字、姓氏和地址的 customer_id。我可能会将其转换为多个插入，一个用于获取所有唯一的客户 ID 以及“客户”表的最新名称信息，然后另一个用于获取地址的插入。如果您想要更改的历史信息，那么您需要进行适当的调整。

此外，如果您有两行具有完全相同的客户 ID 和相同的确切date_modified，则下面的代码将无法正常工作。如果您遇到这种情况，您将需要想出适当的业务逻辑来处理该情况。

INSERT INTO New_Table (
    fname,
    lname,
    address,
    city,
    state,
    zip,
    Customer_ID,
    date_modified )
SELECT
    OT1.fname,
    OT1.lname,
    OT1.address,
    OT1.city,
    OT1.state,
    OT1.zip,
    OT1.customer_id,
    OT1.date_modified
FROM
    (
    SELECT
        customer_id,
        MAX(date_modified) AS latest_date_modified
    FROM
        Old_Table
    GROUP BY customer_id) SQ
INNER JOIN Old_Table OT1 ON
    OT1.customer_id = OT1.customer_id AND
    OT1.date_modified = SQ.latest_date_modified

The following is one of several ways to get what you seem to want. Keep in mind though that this does not normalize your database. You still have a customer_id in with a first and last name and an address. I'd probably turn this into several inserts, one to get all of the unique customer IDs along with the latest name information for the Customers table, then another insert for the addresses. If you want historical information of changes then you would need to adjust appropriately.

Also, the code below will not work properly if you have two rows that have the same exact customer ID and the same exact date_modified. If you run into that case you'll need to come up with the appropriate business logic to handle that.

INSERT INTO New_Table (
    fname,
    lname,
    address,
    city,
    state,
    zip,
    Customer_ID,
    date_modified )
SELECT
    OT1.fname,
    OT1.lname,
    OT1.address,
    OT1.city,
    OT1.state,
    OT1.zip,
    OT1.customer_id,
    OT1.date_modified
FROM
    (
    SELECT
        customer_id,
        MAX(date_modified) AS latest_date_modified
    FROM
        Old_Table
    GROUP BY customer_id) SQ
INNER JOIN Old_Table OT1 ON
    OT1.customer_id = OT1.customer_id AND
    OT1.date_modified = SQ.latest_date_modified

回复收藏 0 原文

别低头，皇冠会掉 2024-12-20 06:30:53

通过使用分析（又称窗口）函数为每个客户选择第一行，这非常容易。如果两行具有相同的 date_modified，则未定义采用哪一行。

INSERT INTO new_table (fname,lname,address,city,state,zip,Customer_ID,date_modified)
SELECT fname,
       lname,
       address,
       city,
       state,
       zip,
       Customer_ID,
       date_modified
FROM (
   SELECT fname,
          lname,
          address,
          city,
          state,
          zip,
          Customer_ID,
          date_modified,
          row_number() over (partition by customer_id order by date_modified desc) as rn
) 
WHERE rn = 1

This is quite easy by using analytical (aka windowing) functions to select the first row for each customer. In case two rows have the same date_modified, it is not defined which one is taken.

INSERT INTO new_table (fname,lname,address,city,state,zip,Customer_ID,date_modified)
SELECT fname,
       lname,
       address,
       city,
       state,
       zip,
       Customer_ID,
       date_modified
FROM (
   SELECT fname,
          lname,
          address,
          city,
          state,
          zip,
          Customer_ID,
          date_modified,
          row_number() over (partition by customer_id order by date_modified desc) as rn
) 
WHERE rn = 1

回复收藏 0 原文

~没有更多了~