更改 ID 号以消除表中的重复项

发布于 2024-11-19 03:30:22 字数 471 浏览 1 评论 0原文

我遇到了这个我正在尝试解决的问题:每天我都会将新记录导入到具有 ID 号的表中。

其中大多数是新的(以前从未在系统中见过),但有些又再次出现。我需要做的是,如果在档案中找到该号码,则在 ID 号码的末尾附加一个 alpha,但前提是该行中的数据与档案中的数据不同,并且这需要按顺序完成,IE,如果第二次看到 12345 的数据不同,我将其更改为 12345A,如果再次看到 12345,并且再次不同,我需要将其更改为 12345B, 删除

最初我尝试使用 where 循环,它将所有“再次看到”记录放入临时表中,然后第一次分配 A,然后删除它们,将 B 分配给剩下的记录, 等等,直到临时表为空,但这还没有解决。

或者,我一直在考虑尝试子查询,如:

update table
set IDNO= (select max idno from archive) plus 1

有什么建议吗?

I have run into this problem that I'm trying to solve: Every day I import new records into a table that have an ID number.

Most of them are new (have never been seen in the system before) but some are coming in again. What I need to do is to append an alpha to the end of the ID number if the number is found in the archive, but only if the data in the row is different from the data in the archive, and this needs to be done sequentially, IE, if 12345 is seen a 2nd time with different data, I change it to 12345A, and if 12345 is seen again, and is again different, I need to change it to 12345B, etc.

Originally I tried using a where loop where it would put all the 'seen again' records in a temp table, and then assign A first time, then delete those, assign B to what's left, delete those, etc., till the temp table was empty, but that hasn't worked out.

Alternately, I've been thinking of trying subqueries as in:

update table
set IDNO= (select max idno from archive) plus 1

Any suggestions?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

心房的律动 2024-11-26 03:30:22

作为一个想法怎么样?请注意,这基本上是伪代码,因此请根据需要进行调整。

将“src”作为所有数据最终将插入到的表,将“TMP”作为临时表......并且假设 TMP 中的 ID 列是双精度型。

do
    update tmp set id = id + 0.01 where id in (select id from src);
until no_rows_changed;

alter table TMP change id into id varchar(255);

update TMP set id = concat(int(id), chr((id - int(id)) * 100 + 64);

insert into SRC select * from tmp;

How about this as an idea? Mind you, this is basically pseudocode so adjust as you see fit.

With "src" as the table that all the data will ultimately be inserted into, and "TMP" as your temporary table.. and this is presuming that the ID column in TMP is a double.

do
    update tmp set id = id + 0.01 where id in (select id from src);
until no_rows_changed;

alter table TMP change id into id varchar(255);

update TMP set id = concat(int(id), chr((id - int(id)) * 100 + 64);

insert into SRC select * from tmp;
吲‖鸣 2024-11-26 03:30:22

当你到达 12345Z 时会发生什么?

不管怎样,稍微改变一下表结构,秘诀如下:

  1. 删除 ID 上的任何索引。

  2. ID(显然是varchar)拆分为ID_Num(long int)和ID_Alpha(varchar,非空)。将 ID_Alpha 的默认值设置为空字符串 ('')。
    因此,12345B (varchar) 变为 12345 (long int) 和 'B' (varchar) 等。

  3. 创建一个唯一的、理想集群,在列 ID_NumID_Alpha 上建立索引。
    将此作为主键。或者,如果必须,请使用自动递增整数作为伪主键。

  4. 现在,当添加新数据时,查找重复的 ID 号是微不足道的,并且可以通过简单的 max() 操作获得最后一个 ID_Alpha。

  5. 现在,使用 while 循环或游标(如果必须),解决重复 ID 应该是一项更容易的任务。
    但是,也应该可以避免“Row by agonizing row”(RBAR),并使用基于集合的方法。阅读几天 Jeff Moden 文章,应该会给您这方面的想法.

What happens when you get to 12345Z?

Anyway, change the table structure slightly, here's the recipe:

  1. Drop any indices on ID.

  2. Split ID (apparently varchar) into ID_Num (long int) and ID_Alpha (varchar, not null). Make the default value for ID_Alpha an empty string ('').
    So, 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.

  3. Create a unique, ideally clustered, index on columns ID_Num and ID_Alpha.
    Make this the primary key. Or, if you must, use an auto-incrementing integer as a pseudo primary key.

  4. Now, when adding new data, finding duplicate ID number's is trivial and the last ID_Alpha can be obtained with a simple max() operation.

  5. Resolving duplicate ID's should now be an easier task, using either a while loop or a cursor (if you must).
    But, it should also be possible to avoid the "Row by agonizing row" (RBAR), and use a set-based approach. A few days of reading Jeff Moden articles, should give you ideas in that regard.

夜未央樱花落 2024-11-26 03:30:22

这是我的最终解决方案:

update a
set IDnum=b.IDnum
from tempimiportable A inner join 
    (select * from archivetable
     where IDnum in 
     (select max(IDnum) from archivetable
      where IDnum in 
       (select IDnum from tempimporttable)
      group by left(IDnum,7) 
      )
     ) b
on b.IDnum like a.IDnum + '%'
WHERE 
*row from tempimport table = row from archive table*

将传入行设置为与旧行相同的 IDnum,然后

update a
set patient_account_number = case 
    when len((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)))= 7 then a.IDnum + 'A'
    else left(a.IDnum,7) + char(ascii(right((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)),1))+1)
    end
from tempimporttable a
where not exists ( *select rows from archive table* )

我不知道是否有人想深入研究这个问题,但我很欣赏建设性的批评......

Here is my final solution:

update a
set IDnum=b.IDnum
from tempimiportable A inner join 
    (select * from archivetable
     where IDnum in 
     (select max(IDnum) from archivetable
      where IDnum in 
       (select IDnum from tempimporttable)
      group by left(IDnum,7) 
      )
     ) b
on b.IDnum like a.IDnum + '%'
WHERE 
*row from tempimport table = row from archive table*

to set incoming rows to the same IDnum as old rows, and then

update a
set patient_account_number = case 
    when len((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)))= 7 then a.IDnum + 'A'
    else left(a.IDnum,7) + char(ascii(right((select max(IDnum) from archive where left(IDnum,7) = left(a.IDnum,7)),1))+1)
    end
from tempimporttable a
where not exists ( *select rows from archive table* )

I don't know if anyone wants to delve too far into this, but I appreciate contructive criticism...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文