有没有办法限制 Mysql 表的索引,使整体性能不受影响?

发布于 2024-11-06 13:19:03 字数 432 浏览 0 评论 0原文

我需要将大型数据集加载到生产数据库中。

需要分别上传 15 个文件并插入到表中。 每个大约500 Mb

我有两个需要索引的 ID 列。 如果我加载带有适当索引的文件,则上传大约需要 3 小时。 如果我删除索引,加载数据本地infile,然后重新添加索引,整个操作大约需要30分钟。

问题是,在对新导入的数据建立索引时,数据库响应能力会受到很大影响。有没有办法让索引以“低优先级”运行,以便其他查询仍然获得 95-100% 的速度,并且索引在后台运行?

我使用的是 Amazon RDS,因此我无法选择仅加载到不同的服务器上然后复制表文件。

对此添加赏金,因为我仍然想看看是否有一种方法可以在对特定框进行索引时获得良好的性能。

I need to load a large data set onto a production database.

15 files need to each be uploaded and inserted into a table.
Each is about 500 Mb.

I have two ID columns that need to be indexed.
If I load the files with indexes in place, the upload takes around 3 hours.
If I drop indexes, load data local infile, then re-add the indexes, the whole operation takes about 30 minutes.

The problem is, database responsiveness takes a big hit while indexing the freshly imported data. Is there a way to make the indexing run at a "low priority" so that other queries still get 95-100% speed and the indexing kind of chugs along in the background?

I'm using Amazon RDS, so I don't have the option of just loading on a different server then copying over the table files.

Adding a bounty to this as I still want to see if there is a way to get good performance while indexing on a specific box.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

爱本泡沫多脆弱 2024-11-13 13:19:03

好吧,我从来没有找到一种方法来节流,但我确实找到了一种方法来缓解我的问题。
该解决方案对于我的问题来说是独一无二的,但我会发布它以防其他人发现它有用。

我编写了一个名为 CautiousIndexer 的类。

  1. 首先,我存储了创建表语句来重新创建没有索引的表结构。我存储了一个读取从属数据库的数组,循环遍历它们,将包含未索引数据的表重命名为 prevent_indexing_($name)
  2. 然后我只在从属设备上运行创建表语句。这有效地将数据移出了主服务器上发生的索引语句。
  3. 然后我对主服务器运行索引查询。当主服务器建立索引时,读从服务器对性能没有影响,因为新创建的表是空的。
  4. 当主服务器完成索引后,我将其中 1 个从服务器退出生产轮换,删除空表,将完整的表移回原位,然后在停止生产的从服务器上对该表建立索引。
  5. 完成后,我将其重新投入生产,并对其余从站重复从站索引过程。
  6. 当所有奴隶都被索引后,我将表投入生产。

就效率而言,这仍然很好,但在主服务器上建立索引期间,写入性能下降得令人无法接受。仍在寻找一种通过节流进行索引的方法。

Well, I never found a way to throttle, but I did figure out a way to alleviate my problem.
The solution was unique to my problem, but I'll post it in case someone else finds it useful.

I wrote a class named CautiousIndexer.

  1. First I stored the create table statement to recreate the table structure without indexes. I stored an array of read slave databases, looped through them renaming the table with the unindexed data to prevent_indexing_($name).
  2. Then I ran the create table statement on the slaves only. This effectively moved the data out of the way of indexing statements that would happen on the master.
  3. Then I ran the index query against the master. Read slaves had no performance impact while the master was indexing because the newly created tables were empty.
  4. When the master finished indexing, I took 1 of the slaves out of production rotation, dropped the empty table, moved the full table back in place, then indexed the table on the out of production slave.
  5. When that finished I put it back in production and repeated the slave indexing procedure on the remaining slaves.
  6. When all slaves were indexed, I put the table into production.

This was still fine in terms of efficiency, but during the indexing on the master server write performance was unacceptably slowed. Still looking for a way to index with throttling.

兔姬 2024-11-13 13:19:03

一个好的解决方案是使用执行滚动更新的脚本。您可以以非复制方式将索引应用于每个从站。粗略说明:

for host in $hosts
do
    mysql -h $host -e "STOP SLAVE;\
      SET sql_log_bin=0;\
      FLUSH TABLE t;\
      ALTER TABLE t ADD INDEX a (b,c);\
      SET sql_log_bin=1;\
      START SLAVE;"
done

通过关闭复制,应该减少磁盘活动量并提高索引操作的速度。如果您对从属设备有数据库延迟要求,您可能希望完全取消从属设备池化,并包含在从属设备恢复零秒延迟时重新池化从属设备的逻辑。

A good solution to this is a script that performs a rolling update. You would apply the index to each slave in a non-replicating manner. A rough illustration:

for host in $hosts
do
    mysql -h $host -e "STOP SLAVE;\
      SET sql_log_bin=0;\
      FLUSH TABLE t;\
      ALTER TABLE t ADD INDEX a (b,c);\
      SET sql_log_bin=1;\
      START SLAVE;"
done

By turning off replication, the amount of disk activity should be reduced and increase the speed of the indexing operation. If you have database lag requirements for your slaves, you might want to entirely de-pool the slave and include logic to re-pool the slave when it resumes zero seconds lag.

空‖城人不在 2024-11-13 13:19:03

这不是您正在寻找的确切解决方案,但您可以 bing up 第二个 mysqld 实例作为此单个框上的从属,并根据需要将 SELECT 查询重定向到它。有 MySQL Proxy 可以帮助您完成此任务无需重写客户端应用程序。

您还可以从 MySQL 的 FriendFeed 使用中收集一些想法。它们将实际索引存储在其他表中并使用它们进行搜索。如果您将数据的副本存储在其他表中,甚至在其他服务器上,并在那里运行索引,您将能够尽快全速访问主数据,并在以后使用其他服务器获得更快的查询。

就像您在从属服务器上添加索引以进行搜索类型查询并在主服务器上仅运行主键查找一样。

This is not an exact solution you're looking for, but you can bing up a second mysqld instance as a slave on this single box and redirect SELECT queries to it as needed. There is MySQL Proxy which can help you to accomplish this without rewriting client apps.

You can also gather some ideas from FriendFeed usage of MySQL. They store actual indexes in other tables and use them for search. If you store a copy of you data in other table even on other server and run indexes there, you'll be able to access master data ASAP at full speed and get speedier queries later using other server.

It's like if you add indexes on a slave for search-type queries and run only primary key lookups on the master.

伴梦长久 2024-11-13 13:19:03

以前没有尝试过的想法也不是关于索引限制,如果您创建一个备份表并用您提到的方式更新它,那么时间跨度更短,并且比转换/重命名表要好。我鼓励写下我的想法,因为你需要知道一种方法。

An idea not tried before also not about index throttling , what if you make a backup table and update it with the way you mentioned has shorter time span and than convert/rename the tables. I encourage to write my thoughts b/c you need to know a way.

旧人哭 2024-11-13 13:19:03

您可以在插入时禁用任何非唯一索引,并在完成后重新启用它们。看一下禁用键/启用键。但它仅适用于非唯一索引。

如果使用多值插入语句,也可以加快插入速度(插入表(...)值(...),(...),(...)...

顺便说一下,加载data infile 似乎是在 mysql 中插入大量数据的最快方法。

You can disable any non-unique indexes while inserting, and re-enable them after you finish. Take a look at disable keys / enable keys. But it works only for non-unique indexes.

You can speed up inserts as well if you use multi-values insert statements (insert into table(...) values(...),(...),(...)...

By the way, load data infile seems to be the fastest way to insert a lot of data in mysql.

草莓酥 2024-11-13 13:19:03

您是否尝试过提高导入的索引设置?这可以显着提高进口绩效。 sort_buffer_size适用于任何表类型,myisam_sort_buffer_size适用于MyISAM表。 innodb_buffer_pool_size 是 Innodb 的“密钥缓存”。根据您的表类型,将其添加到导入中。您想要做的是避免在索引创建期间进行文件排序。

您也许可以将导入/索引时间缩短至 10-15 分钟或更短。它不是节流,但会显着缩短影响期。

或者,如果您使用 MyISAM 表,也许 MERGE 表是一个选项?创建一个新表,执行导入,然后将新表添加到 MERGE 表中。导入过程中不会对数据库产生任何影响。除了执行任务的服务器之外。

Have you tried bumping up your index settings for the import? That can increase import performance significantly. sort_buffer_size is for any table type, myisam_sort_buffer_size is for MyISAM tables. innodb_buffer_pool_size is sort of your "key cache" for Innodb. Bump those up for the import depending on your table type. What you are trying to do is avoid file sorting during index creation.

You may be able to get your import/index time down to 10-15 minutes or less. It's not throttling, but it will significantly shorten the impact period.

Or, if you are using MyISAM tables, maybe a MERGE table is an option? Create a new table, perform the import, than add the new table to the MERGE table. There will be no impact on the database during import. Aside from the server performing a task.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文