更改 ID 号以消除表中的重复项
我遇到了这个我正在尝试解决的问题:每天我都会将新记录导入到具有 ID 号的表中。
其中大多数是新的(以前从未在系统中见过),但有些又再次出现。我需要做的是,如果在档案中找到该号码,则在 ID 号码的末尾附加一个 alpha,但前提是该行中的数据与档案中的数据不同,并且这需要按顺序完成,IE,如果第二次看到 12345 的数据不同,我将其更改为 12345A,如果再次看到 12345,并且再次不同,我需要将其更改为 12345B, 删除
最初我尝试使用 where
循环,它将所有“再次看到”记录放入临时表中,然后第一次分配 A,然后删除它们,将 B 分配给剩下的记录, 等等,直到临时表为空,但这还没有解决。
或者,我一直在考虑尝试子查询,如:
update table
set IDNO= (select max idno from archive) plus 1
有什么建议吗?
I have run into this problem that I'm trying to solve: Every day I import new records into a table that have an ID number.
Most of them are new (have never been seen in the system before) but some are coming in again. What I need to do is to append an alpha to the end of the ID number if the number is found in the archive, but only if the data in the row is different from the data in the archive, and this needs to be done sequentially, IE, if 12345 is seen a 2nd time with different data, I change it to 12345A, and if 12345 is seen again, and is again different, I need to change it to 12345B, etc.
Originally I tried using a where
loop where it would put all the 'seen again' records in a temp table, and then assign A first time, then delete those, assign B to what's left, delete those, etc., till the temp table was empty, but that hasn't worked out.
Alternately, I've been thinking of trying subqueries as in:
update table
set IDNO= (select max idno from archive) plus 1
Any suggestions?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
作为一个想法怎么样?请注意,这基本上是伪代码,因此请根据需要进行调整。
将“src”作为所有数据最终将插入到的表,将“TMP”作为临时表......并且假设 TMP 中的 ID 列是双精度型。
How about this as an idea? Mind you, this is basically pseudocode so adjust as you see fit.
With "src" as the table that all the data will ultimately be inserted into, and "TMP" as your temporary table.. and this is presuming that the ID column in TMP is a double.
当你到达 12345Z 时会发生什么?
不管怎样,稍微改变一下表结构,秘诀如下:
删除
ID
上的任何索引。将
ID
(显然是varchar)拆分为ID_Num
(long int)和ID_Alpha
(varchar,非空)。将ID_Alpha
的默认值设置为空字符串 (''
)。因此,12345B (varchar) 变为 12345 (long int) 和 'B' (varchar) 等。
创建一个唯一的、理想集群,在列
ID_Num
和ID_Alpha
上建立索引。将此作为主键。或者,如果必须,请使用自动递增整数作为伪主键。
现在,当添加新数据时,查找重复的 ID 号是微不足道的,并且可以通过简单的
max()
操作获得最后一个 ID_Alpha。现在,使用 while 循环或游标(如果必须),解决重复 ID 应该是一项更容易的任务。
但是,也应该可以避免“Row by agonizing row”(RBAR),并使用基于集合的方法。阅读几天 Jeff Moden 文章,应该会给您这方面的想法.
What happens when you get to 12345Z?
Anyway, change the table structure slightly, here's the recipe:
Drop any indices on
ID
.Split
ID
(apparently varchar) intoID_Num
(long int) andID_Alpha
(varchar, not null). Make the default value forID_Alpha
an empty string (''
).So, 12345B (varchar) becomes 12345 (long int) and 'B' (varchar), etc.
Create a unique, ideally clustered, index on columns
ID_Num
andID_Alpha
.Make this the primary key. Or, if you must, use an auto-incrementing integer as a pseudo primary key.
Now, when adding new data, finding duplicate ID number's is trivial and the last ID_Alpha can be obtained with a simple
max()
operation.Resolving duplicate ID's should now be an easier task, using either a while loop or a cursor (if you must).
But, it should also be possible to avoid the "Row by agonizing row" (RBAR), and use a set-based approach. A few days of reading Jeff Moden articles, should give you ideas in that regard.
这是我的最终解决方案:
将传入行设置为与旧行相同的 IDnum,然后
我不知道是否有人想深入研究这个问题,但我很欣赏建设性的批评......
Here is my final solution:
to set incoming rows to the same IDnum as old rows, and then
I don't know if anyone wants to delve too far into this, but I appreciate contructive criticism...