Adding an index to a table with 80 million records takes more than 18 hours (or forever)! Now what?

Posted on 2024-09-18 18:49:11

A short recap of what happened: I am working with 71 million records (not much compared to the billions of records processed by others). On a different thread, someone suggested that the current setup of my cluster is not suitable for my needs. My table structure is:

CREATE TABLE `IPAddresses` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `ipaddress` bigint(20) unsigned default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM;

And I added the 71 million records and then did a:

ALTER TABLE IPAddresses ADD INDEX(ipaddress);

It's been 14 hours and the operation is still not completed. Upon Googling, I found that there is a well-known approach for solving this problem - Partitioning. I understand that I need to partition my table now based on the ipaddress but can I do this without recreating the entire table? I mean, through an ALTER statement? If yes, there was one requirement saying that the column to be partitioned on should be a primary key. I will be using the id of this ipaddress in constructing a different table so ipaddress is not my primary key. How do I partition my table given this scenario?

Comments (5)

初与友歌 2024-09-25 18:49:11

Ok turns out that this problem was more than just a simple create a table, index it and forget problem :) Here's what I did just in case someone else faces the same problem (I have used an example of IP Address but it works for other data types too):

Problem: Your table has millions of entries and you need to add an index really fast

Usecase: Consider storing millions of IP addresses in a lookup table. Adding the IP addresses should not be a big problem but creating an index on them takes more than 14 hours.

Solution: Partition your table using MySQL's Partitioning strategy

Case #1: When the table you want is not yet created

CREATE TABLE IPADDRESSES(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;

Case #2: When the table you want is already created.
There seems to be a way to use ALTER TABLE to do this but I have not yet figured out a proper solution for this. Instead, there is a slightly inefficient solution:

CREATE TABLE IPADDRESSES_TEMP(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id)
) ENGINE=MYISAM;

Insert your IP addresses into this table. And then create the actual table with partitions:

CREATE TABLE IPADDRESSES(
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ipaddress BIGINT UNSIGNED,
  PRIMARY KEY(id, ipaddress)
) ENGINE=MYISAM
PARTITION BY HASH(ipaddress)
PARTITIONS 20;

And then finally

INSERT INTO IPADDRESSES(ipaddress) SELECT ipaddress FROM IPADDRESSES_TEMP;
DROP TABLE IPADDRESSES_TEMP;
ALTER TABLE IPADDRESSES ADD INDEX(ipaddress);

And there you go... indexing on the new table took me about 2 hours on a 3.2GHz machine with 1GB RAM :) Hope this helps.
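
One thing to watch in the steps above: the INSERT ... SELECT copies only ipaddress, so the new table gets freshly generated id values. If other tables already reference the old ids, a variant along these lines may be safer. This is only a sketch using the same table names as above; it would replace the INSERT in the final step and has to run before IPADDRESSES_TEMP is dropped. It also assumes the temp table holds no NULL ipaddress values, since primary key columns are NOT NULL.

```sql
-- Copy both columns so that existing id values are preserved
-- (use this in place of the ipaddress-only INSERT).
INSERT INTO IPADDRESSES (id, ipaddress)
SELECT id, ipaddress
FROM IPADDRESSES_TEMP;

-- Optional check: how the rows were spread across the 20 hash partitions.
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'IPADDRESSES';
```

If the row counts per partition are very uneven, the partitioning expression may not suit the data and is worth revisiting before building the index.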

攒一口袋星星 2024-09-25 18:49:11

Creating indexes with MySQL is slow, but not that slow. With 71 million records, it should take a couple of minutes, not 14 hours. Possible problems are:

  • you have not configured sort buffer sizes and other configuration options

See: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_myisam_sort_buffer_size

If you try to generate a 1GB index with an 8MB sort buffer, it is going to take lots of passes. But if the buffer is larger than your CPU cache, it will get slower. So you have to test and see what works best.

  • someone has a lock on the table
  • your IO system sucks
  • your server is swapping
  • etc

As usual, check iostat, vmstat, logs, etc. Issue a LOCK TABLES statement on your table to check whether someone else holds a lock on it (the statement will block until any conflicting lock is released).

FYI on my 64-bit desktop creating an index on 10M random BIGINTs takes 17s...
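
As a rough illustration of the sort buffer tuning mentioned above, the session below inspects the relevant MyISAM settings and raises the buffer before building the index. The table name comes from the question; the 128 MB figure is an assumed, illustrative value rather than a recommendation, and on a machine with 1GB of RAM a smaller value may be more appropriate.

```sql
-- Inspect the current MyISAM sort settings.
SHOW VARIABLES LIKE 'myisam_sort_buffer_size';
SHOW VARIABLES LIKE 'myisam_max_sort_file_size';

-- Raise the sort buffer for this session only (128 MB is an illustrative
-- value; test a few sizes and compare timings).
SET SESSION myisam_sort_buffer_size = 128 * 1024 * 1024;

-- Build the index in the same session so the new setting applies.
ALTER TABLE IPAddresses ADD INDEX (ipaddress);
```

While the ALTER runs, the State column of SHOW PROCESSLIST in another session typically shows whether MySQL is building the index by sorting or has fallen back to the slower key cache method, which can happen when the temporary sort file would exceed myisam_max_sort_file_size.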

吾性傲以野 2024-09-25 18:49:11

I had a problem where I wanted to speed up my query by adding an index. The table only had about 300,000 records, but it also took way too long. When I checked the MySQL server processes, it turned out that the query I was trying to optimize was still running in the background. 4 times! After I killed those queries, indexing was done in a jiffy. Perhaps the same problem applies to your situation.
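
For anyone wanting to check for this, here is a minimal sketch of how stale queries can be found and stopped. The connection id 12345 is a placeholder, not a value from this thread; use the Id column reported by the process list.

```sql
-- List every connection and the statement it is currently running.
SHOW FULL PROCESSLIST;

-- List tables that are currently in use or locked by some session.
SHOW OPEN TABLES WHERE In_use > 0;

-- Terminate a stale query by its Id from the process list.
-- 12345 is a placeholder, not a real connection id.
KILL QUERY 12345;
```

KILL QUERY stops the running statement but leaves the connection open; plain KILL closes the connection as well.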

对岸观火 2024-09-25 18:49:11

You are using MyISAM, which is being deprecated soon. An alternative would be InnoDB.

"InnoDB is a transaction-safe (ACID compliant) storage engine for MySQL that has commit, rollback, and crash-recovery capabilities to protect user data. InnoDB row-level locking (without escalation to coarser granularity locks) and Oracle-style consistent nonlocking reads increase multi-user concurrency and performance. InnoDB stores user data in clustered indexes to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints. You can freely mix InnoDB tables with tables from other MySQL storage engines, even within the same statement."

http://dev.mysql.com/doc/refman/5.0/en/innodb.html

According to http://dev.mysql.com/tech-resources/articles/storage-engine/part_1.html you should be able to switch between storage engines with a simple ALTER command, which gives you some flexibility. It also states that each table in your database can be configured independently.
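
The ALTER command in question would look roughly like this for the table from the question. It is a sketch only: the conversion rebuilds the whole table, so on 71 million rows it is unlikely to be quick, and it is worth trying on a copy first.

```sql
-- Check which engine the table currently uses (see the Engine column).
SHOW TABLE STATUS LIKE 'IPAddresses';

-- Convert the table to InnoDB; this rewrites all rows and indexes.
ALTER TABLE IPAddresses ENGINE = InnoDB;
```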

仅此而已 2024-09-25 18:49:11

You have already inserted 71 million records into your table. If you now want to partition the table on its primary key column, you can use ALTER TABLE. An example is given for your reference.

CREATE TABLE t1 (
    id INT,
    year_col INT
);

ALTER TABLE t1
    PARTITION BY HASH(id)
    PARTITIONS 8;
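
The t1 example has no primary or unique key, so it can be partitioned on any column. The table in the question has PRIMARY KEY (id), and MySQL requires every unique key, including the primary key, to contain all columns used in the partitioning expression. A sketch of how that might be handled for partitioning on ipaddress follows. It assumes the table and column names from the question, a MySQL version in which MyISAM tables can be partitioned (5.1 through 5.7; in 8.0 partitioning is limited to InnoDB and NDB), and that ipaddress contains no NULLs. The partition count of 20 is borrowed from the first answer.

```sql
-- Step 1: make ipaddress NOT NULL and part of the primary key.
-- id stays the first column of the key, as AUTO_INCREMENT needs on MyISAM
-- to keep generating table-wide ids.
ALTER TABLE IPAddresses
    MODIFY ipaddress BIGINT UNSIGNED NOT NULL,
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, ipaddress);

-- Step 2: partition the existing table in place.
ALTER TABLE IPAddresses
    PARTITION BY HASH(ipaddress)
    PARTITIONS 20;
```

Each statement rebuilds the table, so this is not instant on 71 million rows, but it avoids creating and loading a second table by hand.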