PHP cURL “Fatal error: Allowed memory size” for large data sets

Posted on 2024-09-15 20:05:31

I know about the option to raise PHP's memory limit:

ini_set("memory_limit","30M");

But I wanted to know: is there a better approach to querying the data?

I have a WHILE loop that checks whether I need to query for another 1000 records. Using the offset as the starting record number and the limit as the number of records returned, I search for all records matching my data request. I get to about 100K records before I hit the error.
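The offset/limit paging just described can be written as a PHP generator, so that each page is handed to the caller and released before the next one is fetched. This is a minimal sketch under stated assumptions: `fetchAllRecords()` and the `$fetchPage` callback are hypothetical names (the callback stands in for the cURL request plus XML parsing), and the loop assumes the API signals the last page by returning fewer than `$limit` records.

```php
<?php
/**
 * Hypothetical helper: walks an offset/limit API one page at a time.
 * $fetchPage(int $offset, int $limit): array is an assumed callback that
 * performs the request and returns that page's records as an array.
 */
function fetchAllRecords(callable $fetchPage, int $limit = 1000): Generator
{
    $offset = 0;
    while (true) {
        $page = $fetchPage($offset, $limit);
        foreach ($page as $record) {
            yield $record; // hand each record to the caller one at a time
        }
        // Assumption: a short page means there is nothing left to fetch.
        if (count($page) < $limit) {
            return;
        }
        $offset += $limit;
        unset($page); // drop this page before requesting the next one
    }
}

// Usage with a fake data source instead of a real HTTP call.
$fake = function (int $offset, int $limit): array {
    return array_slice(range(1, 2500), $offset, $limit);
};
foreach (fetchAllRecords($fake, 1000) as $record) {
    // process $record here and keep only what is needed
}
```

Only one page is held by the generator at a time; anything the caller chooses to keep (such as `$resultsArr`) still accumulates, so this limits the transient cost per request rather than the size of the final result.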

During testing I found that I get the 'Fatal error: Allowed memory size...' error. I've read that setting the above ini_set() raises the memory limit, but I wanted to know whether I could just code it better.

Each time I execute the code below in the WHILE loop, the memory usage grows very large, even if I unset($curl). I think it could be reduced if I could unset the $result and $curl variables after parsing out the results and before the next cURL query.

function getRequest($url, $user, $pwd) {

    $curl = curl_init();

    curl_setopt($curl, CURLOPT_VERBOSE, 1);            // debug output goes to STDERR
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 2);     // give up connecting after 2 seconds
    curl_setopt($curl, CURLOPT_HEADER, 0);             // body only, no response headers
    curl_setopt($curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);     // return the entire body as one string
    curl_setopt($curl, CURLOPT_USERPWD, "$user:$pwd");
    curl_setopt($curl, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
    curl_setopt($curl, CURLOPT_ENCODING, '');          // accept every encoding cURL supports
    curl_setopt($curl, CURLOPT_URL, $url);

    $result = curl_exec($curl);                        // response body, or false on failure

    $httpResponseCode = (int)curl_getinfo($curl, CURLINFO_HTTP_CODE);

    switch ($httpResponseCode) {
        case 500:
            // Send problem email
            break;
        case 200:
            // GET was good
            break;
        default:
            // Send problem email
            break;
    }
    curl_close($curl);
    return $result;
}

WHILE loop (slimmed-down version)

while ($queryFlag) { // $queryFlag is TRUE

    // Check if we have more records to query; if not, set $queryFlag to FALSE

    // Build cURL URL

    echo "Before Call Memory Usage: " . memory_get_usage() . "\n";
    $resultXML = getRequest($query, $user, $pass);
    echo "After Call Memory Usage: " . memory_get_usage() . "\n";

    $results = new ParseXMLConfig((string)$resultXML); // This is basically a class for $this->xml = simplexml_load_string($xml);

    // Loop through results and keep what I'm looking for
    foreach ($results as $resultsKey => $resultsData) {
        if (preg_match('|^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$|i', $resultsData)) {
            $resultsArr["$resultsData"] = $resultsData;
        }
    }

}

Some memory numbers (in bytes, as reported by memory_get_usage()):

  • Before Call Memory Usage: 1819736
  • After Call Memory Usage: 2285344
  • Keep the data I need
  • Dump the data I don't need
  • Next loop iteration
  • Before Call Memory Usage: 2084128
  • After Call Memory Usage: 2574952
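The figures above can be turned into per-iteration deltas. This short sketch copies the four values from the list and separates the transient cost of each request from the memory that is still held when the next iteration starts. The 100-iteration projection is an assumption (about 100K records at 1000 per request, with growth staying linear), not a measurement.

```php
<?php
// Figures copied from the list above (bytes, from memory_get_usage()).
$before1 = 1819736;
$after1  = 2285344;
$before2 = 2084128;
$after2  = 2574952;

$callCost1 = $after1 - $before1;  // growth during the first request
$callCost2 = $after2 - $before2;  // growth during the second request
$retained  = $before2 - $before1; // still held when iteration 2 begins

// Assumption: ~100K records / 1000 per request = 100 iterations, linear growth.
$projected = $before1 + 100 * $retained;

printf("call 1: %d, call 2: %d, retained per iteration: %d\n", $callCost1, $callCost2, $retained);
printf("projected after 100 iterations: %.1f MiB\n", $projected / 1048576);
```

With these numbers each request temporarily adds roughly 466 kB to 491 kB, and about 264 kB of that is still in use at the start of the next iteration. That retained portion is what accumulates: if it stayed linear, 100 iterations would add roughly 25 MiB on top of the starting ~1.8 MB.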

Comments (3)

So尛奶瓶 2024-09-22 20:05:31

I suspect you're using the wrong key for $resultsArr: you're using the same string as both key and value.

Try changing

$resultsArr["$resultsData"] = $resultsData;

to

$resultsArr[$resultsKey] = $resultsData;
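To see what the different keying choices actually do, here is a small sketch with made-up addresses. Using the value as its own key (the question's form) keeps one entry per distinct string, so duplicates collapse; appending with `[]` keeps every match. Note also that if `$results` iterates like a SimpleXMLElement, `$resultsKey` is the child element's name (assumed here to be `email`) on every pass, so keying by it would keep only the last match.

```php
<?php
// Made-up sample data: one address appears twice.
$matches = ['a@example.com', 'b@example.com', 'a@example.com'];

// Question's form: the value is also the key, so duplicates collapse.
$byValue = [];
foreach ($matches as $email) {
    $byValue[$email] = $email;
}

// Keyed by element name: with SimpleXML-style iteration every key is the
// same (assumed 'email'), so each match overwrites the previous one.
$byKey = [];
foreach ($matches as $email) {
    $resultsKey = 'email';
    $byKey[$resultsKey] = $email;
}

// Indexed form: every match is kept, duplicates included.
$indexed = [];
foreach ($matches as $email) {
    $indexed[] = $email;
}

print_r($byValue);
print_r($byKey);
print_r($indexed);
```

Which form is appropriate depends on whether duplicate addresses should be merged; the value-keyed array can never hold more entries than there are distinct addresses.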
掩于岁月 2024-09-22 20:05:31

Settled for:

ini_set("memory_limit","30M");
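If you go this route, it can help to compare the configured limit with what the script is actually using. `ini_get('memory_limit')` returns the shorthand string (for example "30M"), so this sketch uses a hypothetical helper, `iniBytes()`, to convert it to bytes; on PHP 8.2 and later the built-in `ini_parse_quantity()` covers the same conversion.

```php
<?php
// Hypothetical helper: convert php.ini shorthand ("30M", "512K", "1G") to bytes.
// Returns -1 for "-1", which means "no limit".
function iniBytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $bytes = (int) $value;            // leading digits, e.g. 30 from "30M"
    switch (strtolower(substr($value, -1))) {
        case 'g':
            $bytes *= 1024;           // no break: G falls through to M and K
        case 'm':
            $bytes *= 1024;           // no break: M falls through to K
        case 'k':
            $bytes *= 1024;
    }
    return $bytes;
}

ini_set('memory_limit', '30M');
$limit = iniBytes((string) ini_get('memory_limit'));
$used  = memory_get_usage();
printf("limit: %d bytes, in use: %d bytes, headroom: %d bytes\n", $limit, $used, $limit - $used);
```

Printing the headroom alongside the per-iteration "Before/After" numbers shows how close the loop is getting to the limit.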
自演自醉 2024-09-22 20:05:31

This is in response to the OP's question, "could I just code it better?" You're asking the right question! Good on ya.

I had an issue similar to this: I had a loop, and something in the loop was chewing up memory. What I discovered (I think) was that saving the data I wanted to 'keep' onto an object (stdClass) was what was slowing things down. I changed it so that instead of something like:

$payloadObject->someNewAttribute = $data_i_want_to_keep;

I used an indexed array:

$payload_array[] = $data_i_want_to_keep;

That worked out for me.

I saw a test somewhere (and tried it out myself, with results that agreed) comparing getting/setting on a stdClass object, an associative array, and an indexed array. I forget whether it measured speed or memory, but setting a property on a stdClass used more memory / was slower than setting an associative array key/value, which in turn was slower / used more memory than just appending a new indexed array item. I.e.:

//  slowest / most memory intensive
$some_stdClass_object->some_new_attribute = $data_I_want_to_keep;

//  slower / still memory intensive
$some_array["some_new_key"] = $data_I_want_to_keep;

//  optimal / fastest
$some_array[] = $data_I_want_to_keep;
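Since that ranking comes from a half-remembered test, it is worth measuring on your own PHP version. This rough sketch stores the same strings three ways and reports the change in `memory_get_usage()` for each container. The sample size, the key names (`k0`, `k1`, ...) and the helper `measure()` are arbitrary assumptions, and results will vary between PHP versions.

```php
<?php
// Rough comparison of container overhead; numbers depend on the PHP version.
$n = 10000; // arbitrary sample size (assumption)

// Pre-build the values so each measurement counts only the container
// (and any key strings it creates), not the values themselves.
$values = [];
for ($i = 0; $i < $n; $i++) {
    $values[] = "user{$i}@example.com";
}

// Hypothetical helper: memory added while the filled container is alive.
function measure(callable $fill): int
{
    $start = memory_get_usage();
    $container = $fill();
    $used = memory_get_usage() - $start;
    unset($container);
    return $used;
}

$measurements = [
    'stdClass properties' => measure(function () use ($values) {
        $obj = new stdClass();
        foreach ($values as $i => $v) {
            $obj->{"k$i"} = $v;
        }
        return $obj;
    }),
    'associative array' => measure(function () use ($values) {
        $arr = [];
        foreach ($values as $i => $v) {
            $arr["k$i"] = $v;
        }
        return $arr;
    }),
    'indexed array' => measure(function () use ($values) {
        $arr = [];
        foreach ($values as $v) {
            $arr[] = $v;
        }
        return $arr;
    }),
];

foreach ($measurements as $label => $bytes) {
    printf("%-20s %d bytes\n", $label, $bytes);
}
```

Dynamic properties on a stdClass are stored in a hash table much like an associative array, so the first two results may be close; appending with `[]` lets PHP keep a packed (list-style) array, which is typically the most compact of the three.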

After looking over your code again, I noticed this line (comment moved for readability):

// This is basically a class for $this->xml = simplexml_load_string($xml);
$results        = new ParseXMLConfig((string)$resultXML);

It looks like you're storing the XML in a property of a class (whatever class $this is). That could very well be what's eating your memory. Maybe look at that class (ParseXMLConfig) and see whether you can save that data in an indexed array (or an associative array, if you need keys) instead of on the object.
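Following that advice, here is a minimal sketch of one loop iteration that keeps only plain strings. It parses with `simplexml_load_string()` directly (standing in for `ParseXMLConfig`, whose internals aren't shown), copies matching values into an indexed array as `(string)` casts, and drops the raw response before the next request. The XML shape (`<records><email>...</email></records>`) and the helper name `extractEmails()` are assumptions. Keeping string copies rather than SimpleXMLElement nodes matters because, as far as I know, each node object keeps a reference to its whole parsed document.

```php
<?php
/**
 * Hypothetical helper: pull e-mail-looking values out of one XML page.
 * Assumes a flat document like <records><email>a@b.com</email>...</records>.
 */
function extractEmails(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // unparseable page: nothing to keep
    }
    $found = [];
    foreach ($doc->children() as $node) {
        $value = trim((string) $node); // plain string copy, not a node object
        if (preg_match('|^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$|i', $value)) {
            $found[] = $value;
        }
    }
    return $found; // nothing references $doc after this, so it can be freed
}

// Sketch of the loop body with canned responses in place of getRequest(...).
$resultsArr = [];
$pages = [
    '<records><email>a@example.com</email><email>not-an-email</email></records>',
    '<records><email>b@example.org</email></records>',
];
foreach ($pages as $resultXML) {
    foreach (extractEmails($resultXML) as $email) {
        $resultsArr[] = $email;
    }
    unset($resultXML); // drop our reference to the raw response before the next request
}
print_r($resultsArr);
```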
