定期将数据加载到表中的最佳方法是什么
我有一个包含静态表的数据库,需要每周从 CSV 进行更新。 表是 Mysql MyISAM,静态是指它们用于只读(显然,从 CVS 更新时除外)。
大约有 50 个表,每周需要重新加载总共大约 200MB 的数据。
我可以考虑 3 种方法:
- 截断表
- 从文件中加载数据
或者
- 为每个表创建一个临时表
- 在那里加载数据
- 截断(或删除行?)原始表
- 插入到原始表中 select * from 临时表。
或者
- 创建table_new并在那里加载数据
- 将原始表重命名为table_old(或完全删除表)
- 将table_new重命名为原始表
您认为最有效的方法是什么?
I have a database with static tables which require to be updated from CSV weekly.
Tables are Mysql MyISAM and by static i mean they are used for read only (except when updated from CVS, obviously).
There're about 50 tables and in total about 200mb of data to be reloaded weekly.
I can think about 3 ways:
- Truncate table
- Load data from files
Or
- For each table create a temporary table
- Load data there
- Truncate (or delete rows?) original table
- Insert into original table select * from temporary table.
Or
- Create table_new and load data there
- Rename original table to table_old (or drop table altogether)
- Rename table_new into original table
What do you reckon is the most efficient way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
您考虑过使用 mysqlimport 吗?您可以在这里阅读:
http://dev.mysql.com/doc/refman/5.1/ en/mysqlimport.html
我可能不会对删除原始表做任何事情,因为那样你就必须重新创建所有外键、索引、约束等,这是一团糟,也是维护的噩梦。重命名表也会导致问题(比如如果你有表的同义词,但我不确定 mysql 是否有同义词)。
然而,我要做的是在加载数据之前禁用这些键。
换句话说,在加载数据时,您不希望它尝试更新索引,因为这会减慢加载速度。您希望加载完成后更新索引。
所以我认为通过将 mysqlimport 与上面的技巧结合起来,您应该能够获得真正有效的加载。
Have you considered using mysqlimport? You can read about it here:
http://dev.mysql.com/doc/refman/5.1/en/mysqlimport.html
I probably wouldn't do anything with deleting the original tables, because then you have to re-create all your foreign keys, indexes, constraints, etc. which is a mess and a maintenance nightmare. Renaming tables can also cause problems (like if you have synonyms for the tables, I'm not sure if mysql has synonyms though).
What I would do, however, is disable the keys before loading the data.
In other words, when loading the data you don't want it to be trying to update indexes because that will slow down the load. You want the indexes updated once the load is completed.
So I think by combining mysqlimport with the tip above, you should be able to get a really efficient load.
您始终可以执行
INSERT INTO ...关于重复密钥更新...
或替换为...
。您不应该有任何停机时间(在 TRUNCATE 和 INSERT 之间),并且损坏的可能性很小。请小心
REPLACE
,因为它实际上会删除每条记录并重新插入它,触发您可能拥有的任何触发器(在本例中不太可能),但如果您有自动记录,还会为您提供一个新 ID -增量字段。You could always do
INSERT INTO ... ON DUPLICATE KEY UPDATE ...
orREPLACE INTO ...
. You shouldn't get any down time (between a TRUNCATE and INSERT), and there's very little chance of corruption.Be careful with
REPLACE
, since it will actually delete each record and re-insert it, firing any triggers you may have (unlikely in this case), but also giving you a new ID if you have an auto-increment field.第三个选项是最好的,您可以在导入时锁定和禁用 _new 表上的键,并且速度会特别快。您甚至可以将所有新表“批量原子重命名”为“当前表”,如果它们之间存在关系,则停机时间为零。
我假设整个表都包含在每周的 cvs 更新中(即它们不是增量的)。
Your third option is the best, you can LOCK and DISABLE KEYS on the _new table while importing, and it'll be extra quick. You can even do a "batch atomic rename" of all your new tables to the "current ones", with zero downtime if they have relations between them.
I'm assuming the whole tables are contained in the weekly cvs updates (i.e. they're not incremental).
我更喜欢第三种方法并保留旧表。
这种方法的优点是快速、安全,对读者影响较小。新表的创建不会影响现有表的读取。重命名操作速度更快(对于 myisam,只需重命名文件),因此停机时间不会太长。所以客户不会受到太大影响。您还必须保留旧数据,以防新数据出现问题。
由于您不打算在线更新它,所以我认为如果您使用 myisampack 会很好。
I would prefer the 3rd method and also keep the old table.
The advantage of this method is that it fast and safe with less effect on the readers. The creation of new table does not affect reads on existing table. The rename operation is faster (just a file rename in case of myisam) so the downtime is not that much. So the clients will not be affected by this that much. You also got to keep the old data in case something is wrong with the new data.
As you are not going to update it online I think it will be good if you do myisampack.