How can I "subtract" one table from another?
I have a master table A with ~9 million rows. Another table B (same structure) holds ~28K rows that also appear in table A. What would be the best way to remove all contents of B from table A?
The combination of all columns (~10) is unique. There is nothing else that could serve as a unique key.
Comments (6)
If you have sufficient rights, you can create a new table and rename it to A. To create the new table, you can use the following script:
This should perform pretty well.
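The script itself does not appear in this copy of the answer. One possible reading of the idea, sketched with Python's built-in sqlite3 module so it can be run as-is: copy the rows of A that have no match in B into a new table, then swap the names. The three-column schema, the name A_new, and the sample rows are assumptions for illustration; the real tables have about ten columns, and the rename syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical three-column schema standing in for the ~10 real columns.
cur.execute("CREATE TABLE A (c1 INTEGER, c2 TEXT, c3 TEXT)")
cur.execute("CREATE TABLE B (c1 INTEGER, c2 TEXT, c3 TEXT)")
cur.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(1, "x", "p"), (2, "y", "q"), (3, "z", "r")],
)
cur.execute("INSERT INTO B VALUES (2, 'y', 'q')")

# Build the replacement table from the rows of A that have no match in B.
# Note: '=' never matches NULL, so this assumes the columns are NOT NULL.
cur.execute("""
    CREATE TABLE A_new AS
    SELECT a.*
    FROM A AS a
    WHERE NOT EXISTS (
        SELECT 1
        FROM B AS b
        WHERE b.c1 = a.c1 AND b.c2 = a.c2 AND b.c3 = a.c3
    )
""")

# Swap the tables: drop the old A and give the new table its name.
cur.execute("DROP TABLE A")
cur.execute("ALTER TABLE A_new RENAME TO A")
conn.commit()

remaining = cur.execute("SELECT c1, c2, c3 FROM A ORDER BY c1").fetchall()
print(remaining)
```

With ~28K rows to drop out of ~9 million, this rewrites nearly the whole table, so any indexes, constraints, and grants on A would have to be recreated on the new table afterwards.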
Should work. Might take a while, though.
One way: just list out all the columns.
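The statement for this answer is not preserved here. A minimal sketch of what "list out all the columns" could look like, again using sqlite3 so it runs as-is; the column names c1 to c3 and the sample rows are hypothetical, and the real statement would repeat the comparison for each of the ~10 columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema; the real tables have roughly ten columns.
cur.execute("CREATE TABLE A (c1 INTEGER, c2 TEXT, c3 TEXT)")
cur.execute("CREATE TABLE B (c1 INTEGER, c2 TEXT, c3 TEXT)")
cur.executemany(
    "INSERT INTO A VALUES (?, ?, ?)",
    [(1, "x", "p"), (2, "y", "q"), (3, "z", "r")],
)
cur.execute("INSERT INTO B VALUES (2, 'y', 'q')")

# Delete every row of A for which B holds a row equal in all columns.
# '=' never matches NULL, so nullable columns would need extra handling.
cur.execute("""
    DELETE FROM A
    WHERE EXISTS (
        SELECT 1
        FROM B
        WHERE B.c1 = A.c1 AND B.c2 = A.c2 AND B.c3 = A.c3
    )
""")
deleted = cur.rowcount
conn.commit()

remaining = cur.execute("SELECT c1, c2, c3 FROM A ORDER BY c1").fetchall()
print(deleted, remaining)
```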
This is likely to be very slow, since you would need every column indexed, which is highly unlikely in an environment where a table this size has no primary key. So do it during off-peak hours. What possessed you to have a table with 9 million records and no primary key?
If this is something you'll have to do on a regular basis, the first choice should be to try to improve the database design (looking for primary keys, trying to get the "join" condition onto as few columns as possible).
If that is not possible, the second option is to figure out the "selectivity" of each column, i.e. how many different values each column has: 'name' would be more selective than 'address country', which in turn is more selective than 'male/female'.
The general type of statement I'd suggest would be like this:
The idea is to list the columns in order of selectivity and build an index on colx1, colx2, etc. on tableB. The exact number of columns to include in that index would be a matter of trial and measurement. (Weigh the time spent building the index on tableB against the time saved in the delete statement.)
If this is just a one-time operation, I'd simply pick one of the slow methods outlined above. It's probably not worth thinking too hard about this when you can just start the statement before going home ...
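The suggested statement is missing from this copy of the answer, so the following is only a sketch of the selectivity idea, not the author's code. It counts distinct values per column, indexes tableB on the two most selective columns, and then deletes with the comparisons listed in that order. The column names, the sample rows, the index name idx_b_selective, and the choice of two index columns are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical columns with clearly different selectivity.
cur.execute("CREATE TABLE tableA (name TEXT, country TEXT, gender TEXT)")
cur.execute("CREATE TABLE tableB (name TEXT, country TEXT, gender TEXT)")
cur.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [("Ann", "US", "F"), ("Bob", "US", "M"), ("Cid", "DE", "M"), ("Dee", "FR", "F")],
)
cur.execute("INSERT INTO tableB VALUES ('Bob', 'US', 'M')")

# Selectivity here means the number of distinct values in each column.
# The column names come from this fixed list, never from user input.
columns = ["gender", "country", "name"]
selectivity = {
    col: cur.execute(f"SELECT COUNT(DISTINCT {col}) FROM tableA").fetchone()[0]
    for col in columns
}
ordered = sorted(columns, key=lambda c: selectivity[c], reverse=True)

# Index tableB on the most selective columns (two is an arbitrary choice;
# the right number would come from measuring).
index_cols = ordered[:2]
cur.execute(f"CREATE INDEX idx_b_selective ON tableB ({', '.join(index_cols)})")

# Match on all columns, most selective first, so the index can narrow the lookup.
match = " AND ".join(f"tableB.{c} = tableA.{c}" for c in ordered)
cur.execute(f"DELETE FROM tableA WHERE EXISTS (SELECT 1 FROM tableB WHERE {match})")
conn.commit()

remaining = [r[0] for r in cur.execute("SELECT name FROM tableA ORDER BY name")]
print(ordered, remaining)
```

Whether the index pays off depends on the database's query planner, so comparing timings with and without it on real data is the only reliable guide.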
Is there a key value (or values) that can be used?
Something like:
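The example that followed is not preserved. If such a key existed, the delete could be restricted to it, roughly as sketched below. The id column and the sample rows are purely hypothetical, since the question states that only the combination of all columns is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables with a single identifying column 'id'.
cur.execute("CREATE TABLE A (id INTEGER, payload TEXT)")
cur.execute("CREATE TABLE B (id INTEGER, payload TEXT)")
cur.executemany("INSERT INTO A VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
cur.execute("INSERT INTO B VALUES (2, 'y')")

# With a usable key, only that column needs to be compared.
cur.execute("DELETE FROM A WHERE id IN (SELECT id FROM B)")
conn.commit()

remaining_ids = [r[0] for r in cur.execute("SELECT id FROM A ORDER BY id")]
print(remaining_ids)
```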