将数据从一个 MySQL 数据库导入到另一个数据库的最有效方法

发布于 2024-08-11 16:18:03 字数 1033 浏览 4 评论 0原文

我正在编写一个 PHP 脚本,它将两个不同数据库中的表中的数据导入到另一个数据库中。我已经让它可以正常处理示例数据,但现在我已经转向使用更接近其最终用途的数据:每个表超过 25 百万条记录,并且每天都在增长。显然,效率是一个令人担忧的问题。

以下是它目前的工作原理。我复制表结构,添加几个额外字段以保持密钥完整性:

other1.someTable (field1, field2, field3) Pk = [field1, field2]
other2.someTable (field1, field2, field3) Pk = [field1, field2]
mydb.someTable   (id, source, field1, field2, field3)
    Pk = id, Unique key = [source, field1, field2]

这是 SQL。它有一个 ON DUPLICATE KEY UPDATE 语句,因为需要定期执行此导入,更新“mydb”中的数据。值得庆幸的是,记录不会从“其他”数据库中删除(我认为!)。

INSERT INTO mydb.someTable (source, field1, field2, field3)
SELECT 1, field1, field2, field3 FROM other1.someTable
ON DUPLICATE KEY UPDATE field1 = field1, field2 = field2, field3 = field3;

INSERT INTO mydb.someTable (source, field1, field2, field3)
SELECT 2, field1, field2, field3 FROM other2.someTable;
ON DUPLICATE KEY UPDATE field1 = field1, field2 = field2, field3 = field3;

我的问题是:这是执行此操作的最佳方法吗?考虑到将有数百万条记录,每个记录总共有许多 GB 的数据,是否还有其他可能更快的方法?桌子?

I'm writing a PHP script which imports data from tables in two different databases into another one. I've got it working ok with sample data, except now I've moved to using data closer resembling its final use: 25+ million records per table, and growing daily. Obviously, efficiency is a bit of a concern.

Here's how it current works. I copy the table structure, adding a couple of extra fields to maintain key integrity:

other1.someTable (field1, field2, field3) Pk = [field1, field2]
other2.someTable (field1, field2, field3) Pk = [field1, field2]
mydb.someTable   (id, source, field1, field2, field3)
    Pk = id, Unique key = [source, field1, field2]

And here's the SQL. It has a ON DUPLICATE KEY UPDATE statement because this import needs to be done regularly, updating the data in "mydb". Thankfully, records won't be deleted from the "other" database (i think!).

INSERT INTO mydb.someTable (source, field1, field2, field3)
SELECT 1, field1, field2, field3 FROM other1.someTable
ON DUPLICATE KEY UPDATE field1 = field1, field2 = field2, field3 = field3;

INSERT INTO mydb.someTable (source, field1, field2, field3)
SELECT 2, field1, field2, field3 FROM other2.someTable;
ON DUPLICATE KEY UPDATE field1 = field1, field2 = field2, field3 = field3;

My question is this: Is this the best possible way to do this? Are there any other methods which might be faster, considering there are going to be millions and millions of records, totaling many gigabytes of data per table?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

旧伤还要旧人安 2024-08-18 16:18:03

你确定没有重复的ID吗?或者,如果有,您是否总是会用第二个数据库中的数据覆盖它们?

此外,在将数据插入/更新到第三个数据库之前,您是否对从 DB1 / DB2 获取的数据进行任何处理?

如果第一个问题的答案是“是”,第三个问题的答案是“否”,那么使用 加载数据文件。从DB1和DB2中选择数据并按顺序加载。

Are you sure there are no duplicate IDs? Or, if there are, are you always going to overwrite them with data from the second database?

Additionally, do you do any processing on the data you obtain from DB1 / DB2 prior to inserting / updating it into 3rd database?

If the answers are "yes" to the first question and "no" to the third, it will likely be a lot faster to use LOAD DATA INFILE. Select data from DB1 and DB2 and load them in sequence.

允世 2024-08-18 16:18:03

在您的“重复密钥更新”中,无需更新 field1 和 field2,因为它们是密钥并且已匹配。

另一个问题是:您是否关心 1 将 field3 设置为一个值,然后 2 将其设置为另一个值 - 明天和后天再次 - 这是需要知道的事情吗?

Well on your On Duplicate Key Update, there is not need to update field1 and field2 as they are the key and have been matched.

The other question is: do you care if 1 sets field3 to one value and then 2 sets it to another -- and again tomorrow, and the day after -- is this something to know happened?

葮薆情 2024-08-18 16:18:03

您是否考虑过使用联合表< /a>?

Have you considered using federated tables?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文