Batch processing PHP's fgetcsv

Posted 2024-10-10 11:06:03

I have a fairly large csv file (at least for the web) that I don't have control of. It has about 100k rows in it, and will only grow larger.

I'm using the Drupal Module Feeds to create nodes based on this data, and their parser batches the parsing in groups of 50 lines. However, their parser doesn't handle quotation marks properly, and fails to parse about 60% of the csv file. fgetcsv works but doesn't batch things as far as I can tell.

While trying to read the entire file with fgetcsv, PHP eventually runs out of memory. Therefore I would like to be able to break things up into smaller chunks. Is this possible?

Comments (3)

可爱暴击 2024-10-17 11:06:03

fgetcsv() works by reading one line at a time from a given file pointer. If PHP is running out of memory, perhaps you are trying to parse the whole file at once, putting it all into a giant array. The solution would be to process it line by line without storing it in a big array.

To answer the batching question more directly, read n lines from the file, then use ftell() to find the location in the file where you ended. Make a note of this point, and then you can return to it at some point in the future by calling fseek() before fgetcsv().
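
A minimal sketch of that idea. The file name, batch size, and where the offset gets persisted between runs (e.g. a Drupal variable) are placeholders, not part of this answer:

function parseBatch($filename, $offset, $batchSize) {
    $handle = fopen($filename, 'r');
    if (!$handle) die('could not open file');

    // Jump back to where the previous batch stopped.
    fseek($handle, $offset);

    $rows = array();
    for ($i = 0; $i < $batchSize; $i++) {
        $row = fgetcsv($handle);
        if ($row === false) break; // end of file reached
        $rows[] = $row;
    }

    // Remember where this batch ended so the next run can resume here.
    $newOffset = ftell($handle);
    fclose($handle);

    return array($rows, $newOffset);
}

// First run starts at offset 0; persist the returned offset for later runs.
list($rows, $offset) = parseBatch('big.csv', 0, 50);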

怪我入戏太深 2024-10-17 11:06:03

Well, create a function to parse a bunch of lines:

function parseLines(array $lines) {
    foreach ($lines as $line) {
        //insert line into new node
    }
}

Then, just batch it up:

$numberOfLinesToBatch = 50;
$f = fopen($file, 'r');
if (!$f) die('implement better error checking');

$buffer = array();
while ($row = fgetcsv($f)) {
    $buffer[] = $row;
    if (count($buffer) >= $numberOfLinesToBatch) {
        parseLines($buffer);
        $buffer = array();
    }
}
if (!empty($buffer)) {
    parseLines($buffer);
}

fclose($f);

It streams the data in, and you can tune how many rows it buffers by tweaking the variable...

国产ˉ祖宗 2024-10-17 11:06:03

I suspect the problem is the fact that you're storing too much information in memory rather than how you're reading the CSV file off disk. (i.e.: fgetcsv will only read a line at a time, so if a single line's worth of data is causing you to run out of memory you're in trouble.)

As such, you simply need to use an approach where you (see the sketch after this list):

  1. Read 'x' lines into an array.
  2. Process this information.
  3. Clear any temporary variables/arrays.
  4. Repeat until FEOF.
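
A rough sketch of that loop; the file name and the processBatch() helper for step 2 are placeholders, not part of the answer:

$handle = fopen('data.csv', 'r');
if (!$handle) die('could not open file');

$batchSize = 50;
while (!feof($handle)) {
    // 1. Read 'x' lines into an array.
    $batch = array();
    while (count($batch) < $batchSize && ($row = fgetcsv($handle)) !== false) {
        $batch[] = $row;
    }

    // 2. Process this information (placeholder function).
    if (!empty($batch)) {
        processBatch($batch);
    }

    // 3. Clear any temporary variables/arrays.
    unset($batch);
}
// 4. The loop above repeats until feof().

fclose($handle);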

Alternatively, you could execute the CSV processing via the command line version of PHP and use a custom php.ini that has a much larger memory limit.
