Maintaining a large table of unique values in MySQL

Published on 2024-10-01 22:18:25

This is probably a common situation, but I couldn't find a specific answer on SO or Google.

I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no duplicate rows. The table stores the user's uids. The SQL for the table is:

CREATE TABLE possiblefriends(
 id INT NOT NULL AUTO_INCREMENT,
 user INT,
 possiblefriend INT,
 PRIMARY KEY (id));

The way the table works is that each user has around 1,000 "possible friends" that are discovered and need to be stored, but duplicate "possible friends" need to be avoided.

The problem is, due to the design of the program, over the course of a day I need to add a million or more rows to the table, any of which may or may not be duplicates of existing entries. The simple answer would seem to be to check each row to see whether it is a duplicate and, if not, insert it into the table. But this technique will probably get very slow as the table grows to 100 million rows, 1 billion rows, or more (which I expect it to reach soon).
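Concretely, the check-then-insert technique I mean looks roughly like this (the uid values 42 and 101 are just placeholders), one round trip per candidate row:

SELECT COUNT(*) FROM possiblefriends
 WHERE user = 42 AND possiblefriend = 101;

-- only if the count is 0:
INSERT INTO possiblefriends (user, possiblefriend) VALUES (42, 101);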

What is the best (i.e. fastest) way to maintain this unique table?

I don't need a table containing only unique values on hand at all times; I only need it once a day, for batch jobs. In that case, should I create a separate table into which I simply insert all incoming rows (duplicates and all), and then, at the end of the day, build a second table containing only the unique rows from the first?
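Something like the following is what I have in mind (possiblefriends_raw and possiblefriends_unique are placeholder names):

-- during the day: append everything, duplicates and all
INSERT INTO possiblefriends_raw (user, possiblefriend) VALUES (42, 101);

-- end-of-day batch job: build a deduplicated copy
CREATE TABLE possiblefriends_unique AS
SELECT DISTINCT user, possiblefriend
FROM possiblefriends_raw;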

If not, what is the best way for this table long-term?

(If indexes are the best long-term solution, please tell me which indexes to use)

2 Answers

紙鸢 2024-10-08 22:18:25

Add a unique index on (user, possiblefriend), then use one of:

INSERT ... IGNORE
INSERT ... ON DUPLICATE KEY UPDATE

to ensure that you don't get errors when you try to insert a duplicate row.

You might also want to consider if you can drop your auto-incrementing primary key and use (user, possiblefriend) as the primary key. This will decrease the size of your table and also the primary key will function as the index, saving you from having to create an extra index.
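A minimal sketch of that suggestion, assuming the two columns can be made NOT NULL (the values are placeholders):

CREATE TABLE possiblefriends(
 user INT NOT NULL,
 possiblefriend INT NOT NULL,
 PRIMARY KEY (user, possiblefriend));  -- doubles as the uniqueness constraint

-- duplicates are silently skipped instead of raising an error
INSERT IGNORE INTO possiblefriends (user, possiblefriend)
VALUES (42, 101), (42, 102);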

安静 2024-10-08 22:18:25

A unique index will let you be sure that the (user, possiblefriend) pair is indeed unique. You can add one like so:

CREATE TABLE possiblefriends(
 id INT NOT NULL AUTO_INCREMENT,
 user INT,
 possiblefriend INT,
 PRIMARY KEY (id),
 UNIQUE INDEX DefUserID_UNIQUE (user ASC, possiblefriend ASC));

This will also speed up your table access significantly.

Your other issue, the mass insert, is a little more tricky. You can use the built-in ON DUPLICATE KEY UPDATE clause:

INSERT INTO `table` (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;

If column a is declared unique and a row with a=1 already exists, this has the same effect as:

UPDATE `table` SET c=c+1 WHERE a=1;
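Applied to the possiblefriends table above, where there is no counter column to update, a no-op assignment (an assumption, not part of the example above) avoids errors on duplicates:

INSERT INTO possiblefriends (user, possiblefriend)
VALUES (42, 101)
ON DUPLICATE KEY UPDATE possiblefriend = possiblefriend;  -- no-op when the row already exists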