Processing a very large CSV file without timeouts or memory errors
At the moment I'm writing an import script for a very big CSV file. The problem is that most of the time it stops after a while because of a timeout, or it throws a memory error.

My idea now was to parse the CSV file in steps of 100 lines and, after every 100 lines, call the script again automatically. I tried to achieve this with header("Location: ...") and passing the current line via GET, but it didn't work out the way I wanted.

Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
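A rough sketch of the chunked redirect approach described above (the import.php script name and the offset GET parameter are placeholders, not from the original):

    <?php
    // Process a fixed chunk of rows per request, then redirect to the same
    // script with the next offset.
    $offset    = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
    $chunkSize = 100;

    $handle = fopen('big.csv', 'r');

    // Skip the rows that were already handled in earlier requests.
    for ($i = 0; $i < $offset; $i++) {
        if (fgetcsv($handle) === false) {
            break;
        }
    }

    $processed = 0;
    while ($processed < $chunkSize && ($row = fgetcsv($handle)) !== false) {
        // ... import $row into the database ...
        $processed++;
    }

    fclose($handle);

    if ($processed === $chunkSize) {
        // More rows may remain: hand off to the next request.
        header('Location: import.php?offset=' . ($offset + $processed));
        exit;
    }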
Comments (5)
I've used fgetcsv to read a 120 MB CSV in a stream-wise manner (is that correct English?). It reads line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run. Maybe I'll try Python next time... Don't try to load a huge CSV file into an array, that really would consume a lot of memory.
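A minimal sketch of that streaming approach, assuming PDO and a made-up three-column table:

    <?php
    // Read the CSV one row at a time and insert each row right away,
    // so only the current row is held in memory.
    $handle = fopen('big.csv', 'r');

    $pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO import_table (col1, col2, col3) VALUES (?, ?, ?)');

    while (($row = fgetcsv($handle)) !== false) {
        $stmt->execute($row); // assumes every CSV row has exactly three fields
    }

    fclose($handle);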
I find uploading the file and inserting it using MySQL's LOAD DATA LOCAL query a fast solution, e.g.:
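Something along these lines (table name, columns and file path are placeholders, and local_infile has to be enabled on both the server and the client):

    <?php
    // Let MySQL parse and load the file in one statement instead of
    // inserting row by row from PHP.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
        PDO::MYSQL_ATTR_LOCAL_INFILE => true,
    ]);

    $pdo->exec("
        LOAD DATA LOCAL INFILE '/tmp/big.csv'
        INTO TABLE import_table
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
    ");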
If you don't care about how long it takes and how much memory it needs, you can simply increase the values for this script. Just add the following lines to the top of your script:
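A sketch of the kind of settings meant here; the exact values are just examples:

    <?php
    // Raise (or remove) the limits for this one script only.
    ini_set('memory_limit', '512M');   // or '-1' for no memory limit at all
    set_time_limit(0);                 // 0 = no execution time limit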
With the function memory_get_usage() you can find out how much memory your script needs to find a good value for the memory_limit.
You might also want to have a look at fgets() which allows you to read a file line by line. I am not sure if that takes less memory, but I really think this will work. But even in this case you have to increase the max_execution_time to a higher value.
There seems to be an enormous difference between fgetcsv() and fgets() when it comes to memory consumption.

A simple CSV with only one column exceeded my 512M memory limit at just 50,000 records with fgetcsv(), and it took 8 minutes to report that.

With fgets() it took only 3 minutes to successfully process 649,175 records, and my local server wasn't even gasping for additional air.

So my advice is to use fgets() if the number of columns in your CSV is limited. In my case fgets() directly returned the string inside column 1. For more than one column, you might use explode() into a disposable array which you unset() after each record operation.

Thumbed up answer 3. @ndkauboy
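A rough sketch of that fgets()/explode() variant; note that, unlike fgetcsv(), it does not handle quoted fields containing the delimiter:

    <?php
    // Read raw lines with fgets() and split them manually; throw the
    // temporary array away after each record to keep memory usage flat.
    $handle = fopen('big.csv', 'r');

    while (($line = fgets($handle)) !== false) {
        $fields = explode(',', rtrim($line, "\r\n"));
        // ... process / insert $fields here ...
        unset($fields);
    }

    fclose($handle);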
Oh. Just run this script from the CLI instead of via the silly web interface. That way, no execution time limit will affect it.

And do not keep the parsed results around forever; write them out immediately. That way you won't be affected by the memory limit either.
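A small sketch of that, with a hypothetical script name, run as php import.php big.csv; the CLI SAPI has no execution time limit by default:

    <?php
    // Refuse to run through the web server; expect the CSV path as a CLI argument.
    if (PHP_SAPI !== 'cli') {
        exit("Run this from the command line, e.g.: php import.php big.csv\n");
    }

    if ($argc < 2 || !is_readable($argv[1])) {
        fwrite(STDERR, "Usage: php import.php <csv-file>\n");
        exit(1);
    }

    $handle = fopen($argv[1], 'r');
    while (($row = fgetcsv($handle)) !== false) {
        // Write each row out (e.g. INSERT it) immediately instead of
        // collecting everything in an array first.
    }
    fclose($handle);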