How to translate and migrate data

I am building a PHP web application that lets a user upload an MS Access database (CSV export), which is then translated and migrated into a MySQL database.

The MS Access database consists of a single table called t_product with 100k rows. The table is not well designed. For example, the following query:

SELECT part_number, model_number FROM t_product

will return:

part_number    model_number
100            AX1000, AX1001, AX1002
101            CZ10, CZ220, MB100

As you can see, the model numbers are listed as comma-separated values instead of as individual records in a separate table. There are many more issues of this nature. I'm writing a script to clean this data before importing it into the MySQL database. The script will also map the existing Access columns onto a properly designed relational schema.
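
For the model-number example, a normalized target might look something like the following (an assumed schema; the post doesn't spell one out):

CREATE TABLE t_product (
    product_id  INT AUTO_INCREMENT PRIMARY KEY,
    part_number VARCHAR(32) NOT NULL
    -- ... remaining single-valued Access columns ...
);

CREATE TABLE t_model (
    model_id     INT AUTO_INCREMENT PRIMARY KEY,
    product_id   INT NOT NULL,
    model_number VARCHAR(64) NOT NULL,
    FOREIGN KEY (product_id) REFERENCES t_product (product_id)
);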

My issue is that my script takes too long to complete. Here's simplified code to explain what I'm doing:

$handle = fopen("MSAccess.csv", "r");

// get each row from the csv
while ($data = fgetcsv($handle, 1000, ","))
{
    mysql_query("INSERT INTO t_product (col1, col2, ...) VALUES ('$data[0]', '$data[1]', ...)");
    $prodId = mysql_insert_id();

    // using model as an example; there are other columns
    // with csv values that need to be broken up
    $arrModel = explode(',', $data[2]);
    foreach ($arrModel as $modelNumber)
        mysql_query("INSERT INTO t_model (product_id, col1, ...) VALUES ($prodId, '$modelNumber', ...)");
}

The problem here is that each while-loop iteration makes a tremendous number of calls to the database. For every product record, I have to insert N model numbers, Y part numbers, X serial numbers etc...
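
One mitigation worth noting before restructuring anything, assuming the target tables are InnoDB: each mysql_query() above runs as its own implicit transaction, so every INSERT pays full commit overhead. Wrapping the loop in a single explicit transaction removes most of that cost without changing the inserts themselves; a minimal sketch:

mysql_query("START TRANSACTION");

while ($data = fgetcsv($handle, 1000, ","))
{
    // ... the same per-row INSERTs as above ...
}

// one commit for the entire import instead of one per statement
mysql_query("COMMIT");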

I started another approach where I stored the whole CSV in an array, then wrote one batch query like:

$sql = "INSERT INTO t_product (col1, col2, etc...) values ";
foreach($arrParam as $val)
 $sql .= " ($val[0], $val[1], $val[2]), "

But I ran into excessive memory errors with this approach. I increased the max memory limit to 64M and I'm still running out of memory.

What is the best way to tackle this problem?

Maybe I should write all my queries to a *.sql file first, then import the *.sql file into the mysql database?

Comments (3)

人心善变 2024-08-12 13:02:50

This may not be the direction you want to go at all, but you can generate the MySQL creation script directly from MS Access with the free MySQL Migration Toolkit.

Perhaps you could allow the user to upload the Access db, and then have your PHP script call the Migration toolkit?

迟月 2024-08-12 13:02:50

If you're going to try optimizing the code you already have, I would try aggregating the INSERTs and see if that helps. This should be easy to add to your code. Something like this (C# pseudocode):

int flushCount = 0;

while (!done)
{
    // Build next query, concatenate to last set of queries

    if (++flushCount == 5)
    {
        // Flush queries to database

        // Reset query string to empty

        flushCount = 0;
    }
}

// Flush remaining queries to the database
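
In PHP the same idea needs one adjustment: mysql_query() executes only a single statement per call, so the aggregation has to use MySQL's multi-row INSERT ... VALUES syntax rather than concatenated semicolon-separated queries. A sketch of the flush-every-N loop, with placeholder column names and a larger batch size, which also keeps memory bounded:

$values = array();

while ($data = fgetcsv($handle, 1000, ","))
{
    $values[] = "('" . mysql_real_escape_string($data[0]) . "', '"
              . mysql_real_escape_string($data[1]) . "')";

    if (count($values) == 500)
    {
        // flush the accumulated rows in one round trip
        mysql_query("INSERT INTO t_product (col1, col2) VALUES " . implode(", ", $values));
        $values = array(); // reset the buffer so memory use stays bounded
    }
}

// flush any remaining rows
if (count($values) > 0)
    mysql_query("INSERT INTO t_product (col1, col2) VALUES " . implode(", ", $values));
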
旧话新听 2024-08-12 13:02:50

I decided to write all my queries into a .sql file. This gave me the opportunity to normalize the CSV file into a proper relational structure. Afterwards, my PHP script called exec("mysql -h dbserver.com -u myuser -pmypass dbname < db.sql");

This solved my memory problems, and it was much faster than issuing the queries one at a time from PHP.
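
A sketch of that pipeline (table and column names are placeholders; addslashes() stands in for proper escaping, since no database connection is open while the file is being written):

$handle = fopen("MSAccess.csv", "r");
$out    = fopen("db.sql", "w");

while ($data = fgetcsv($handle, 1000, ","))
{
    // write the normalized INSERTs to a file instead of executing them
    fwrite($out, "INSERT INTO t_product (col1, col2) VALUES ('"
        . addslashes($data[0]) . "', '" . addslashes($data[1]) . "');\n");
}

fclose($out);
fclose($handle);

// bulk-load the generated file in a single mysql client invocation
exec("mysql -h dbserver.com -u myuser -pmypass dbname < db.sql");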
