Regex in preg_match function returns browser error

Posted on 2024-12-07 20:09:38

The following function breaks with the regexp I've provided in the $pattern variable. If I change the regexp I'm fine, so I think that's the problem. I'm not seeing the problem, though, and I'm not receiving a standard PHP error even though they're turned on.

function parseAPIResults($results){
//Takes results from getAPIResults, returns array.

    $pattern = '/\[(.|\n)+\]/';
    $resultsArray = preg_match($pattern, $results, $matches);

}

Firefox 6: The connection was reset

Chrome 14: Error 101 (net::ERR_CONNECTION_RESET): The connection was reset.

IE 8: Internet Explorer cannot display the webpage

UPDATE:
Apache/PHP may be crashing. Here's the Apache error log from when I run the script:

[Sat Oct 01 11:41:40 2011] [notice] Parent: child process exited with status 255 -- Restarting.
[Sat Oct 01 11:41:40 2011] [notice] Apache/2.2.11 (Win32) PHP/5.3.0 configured -- resuming normal operations

Running WAMP 2.0 on Windows 7.

Comments (4)

卖梦商人 2024-12-14 20:09:38

Simple question. Complex answer!

Yes, this class of regex will repeatably (and silently) crash Apache/PHP with an unhandled segmentation fault due to a stack overflow!

Background:

The PHP preg_* family of regex functions uses the powerful PCRE library by Philip Hazel. With this library, there is a certain class of regex which requires lots of recursive calls to its internal match() function, and this uses up a lot of stack space (the stack space used is directly proportional to the size of the subject string being matched). Thus, if the subject string is too long, a stack overflow and corresponding segmentation fault will occur. This behavior is described in the PCRE documentation, in the section titled pcrestack.

PHP Bug 1: PHP sets pcre.recursion_limit too large.

The PCRE documentation describes how to avoid a stack overflow segmentation fault by limiting the recursion depth to a safe value roughly equal to the stack size of the linked application divided by 500. When the recursion depth is properly limited as recommended, the library does not generate a stack overflow and instead gracefully exits with an error code. Under PHP, this maximum recursion depth is specified with the pcre.recursion_limit configuration variable and (unfortunately) the default value is set to 100,000. This value is TOO BIG! Here is a table of safe values of pcre.recursion_limit for a variety of executable stack sizes:

Stacksize   pcre.recursion_limit
 64 MB      134217
 32 MB      67108
 16 MB      33554
  8 MB      16777
  4 MB      8388
  2 MB      4194
  1 MB      2097
512 KB      1048
256 KB      524
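The table values follow from the "stack size divided by 500" rule of thumb quoted above. As a minimal sketch of that arithmetic (safeRecursionLimit() is a hypothetical helper name, not part of PHP or PCRE, and the stack size must be supplied by you since PHP does not report it):

```php
<?php
// Hypothetical helper: applies the stack-size / 500 rule of thumb.
// $stackBytes is the stack size of the executable running PHP, in bytes.
function safeRecursionLimit($stackBytes) {
    return (int) floor($stackBytes / 500);
}

// Assumed example: a 256 KB stack, as stated for Win32 httpd.exe.
$limit = safeRecursionLimit(256 * 1024);
ini_set("pcre.recursion_limit", (string) $limit);
printf("pcre.recursion_limit = %s\n", ini_get("pcre.recursion_limit"));
```

Truncating rather than rounding keeps the result on the conservative side, which matches how the table rows are rounded.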

Thus, for the Win32 build of the Apache webserver (httpd.exe), which has a (relatively small) stack size of 256KB, the correct value of pcre.recursion_limit should be set to 524. This can be accomplished with the following line of PHP code:

ini_set("pcre.recursion_limit", "524"); // PHP default is 100,000.

When this code is added to the PHP script, the stack overflow does NOT occur, but instead generates a meaningful error code. That is, it SHOULD generate an error code! (But unfortunately, due to another PHP bug, preg_match() does not.)

PHP Bug 2: preg_match() does not return FALSE on error.

The PHP documentation for preg_match() says that it returns FALSE on error. Unfortunately, PHP versions 5.3.3 and below have a bug (#52732) where preg_match() does NOT return FALSE on error (it instead returns int(0), which is the same value returned in the case of a non-match). This bug was fixed in PHP version 5.3.4.

Solution:

Assuming you will continue using WAMP 2.0 (with PHP 5.3.0) the solution needs to take both of the above bugs into consideration. Here is what I would recommend:

  • Need to reduce pcre.recursion_limit to a safe value: 524.
  • Need to explicitly check for a PCRE error whenever preg_match() returns anything other than int(1).
  • If preg_match() returns int(1), then the match was successful.
  • If preg_match() returns int(0), then the match was either not successful, or there was an error.

Here is a modified version of your script (designed to be run from the command line) that determines the subject string length that results in the recursion limit error:

<?php
// This test script is designed to be run from the command line.
// It measures the subject string length that results in a
// PREG_RECURSION_LIMIT_ERROR error in the preg_match() function.

echo("Entering TEST.PHP...\n");

// Set and display pcre.recursion_limit. (set to stacksize / 500).
// Under Win32 httpd.exe has a stack = 256KB and 8MB for php.exe.
//ini_set("pcre.recursion_limit", "524");       // Stacksize = 256KB.
ini_set("pcre.recursion_limit", "16777");   // Stacksize = 8MB.
echo(sprintf("PCRE pcre.recursion_limit is set to %s\n",
    ini_get("pcre.recursion_limit")));

function parseAPIResults($results){
    $pattern = "/\[(.|\n)+\]/";
    $resultsArray = preg_match($pattern, $results, $matches);
    if ($resultsArray === 1) {
        $msg = 'Successful match.';
    } else {
        // Either an unsuccessful match, or a PCRE error occurred.
        $pcre_err = preg_last_error();  // PHP 5.2 and above.
        if ($pcre_err === PREG_NO_ERROR) {
            $msg = 'Successful non-match.';
        } else {
            // preg_match error!
            switch ($pcre_err) {
                case PREG_INTERNAL_ERROR:
                    $msg = 'PREG_INTERNAL_ERROR';
                    break;
                case PREG_BACKTRACK_LIMIT_ERROR:
                    $msg = 'PREG_BACKTRACK_LIMIT_ERROR';
                    break;
                case PREG_RECURSION_LIMIT_ERROR:
                    $msg = 'PREG_RECURSION_LIMIT_ERROR';
                    break;
                case PREG_BAD_UTF8_ERROR:
                    $msg = 'PREG_BAD_UTF8_ERROR';
                    break;
                case PREG_BAD_UTF8_OFFSET_ERROR:
                    $msg = 'PREG_BAD_UTF8_OFFSET_ERROR';
                    break;
                default:
                    $msg = 'Unrecognized PREG error';
                    break;
            }
        }
    }
    return($msg);
}

// Build a matching test string of increasing size.
function buildTestString() {
    static $content = "";
    $content .= "A";
    return '['. $content .']';
}

// Find subject string length that results in error.
for (;;) { // Infinite loop. Break out.
    $str = buildTestString();
    $msg = parseAPIResults($str);
    printf("Length =%10d\r", strlen($str));
    if ($msg !== 'Successful match.') break;
}

echo(sprintf("\nPCRE_ERROR = \"%s\" at subject string length = %d\n",
    $msg, strlen($str)));

echo("Exiting TEST.PHP...");

?>

When you run this script, it provides a continuous readout of the current length of the subject string. If pcre.recursion_limit is left at its too-high default value, this lets you measure the string length that causes the executable to crash.

Comments:

  • Before investigating the answer to this question, I didn't know about the PHP bug where preg_match() fails to return FALSE when an error occurs in the PCRE library. This bug certainly calls into question a LOT of code that uses preg_match! (I'm certainly going to do an inventory of my own PHP code.)
  • Under Windows, the Apache webserver executable (httpd.exe) is built with a stacksize of 256KB. The PHP command line executable (php.exe) is built with a stacksize of 8MB. The safe value for pcre.recursion_limit should be set in accordance with the executable that the script is being run under (524 and 16777 respectively).
  • Under *nix systems, the Apache webserver and command line executables are both typically built with a stacksize of 8MB, so this problem is not encountered as often.
  • The PHP developers should set the default value of pcre.recursion_limit to a safe value.
  • The PHP developers should apply the preg_match() bugfix to PHP version 5.2.
  • The stacksize of a Windows executable can be manually modified using the CFF Explorer freeware program. You can use this program to increase the stacksize of the Apache httpd.exe executable. (This works under XP but Vista and Win7 might complain.)

不如归去 2024-12-14 20:09:38

I ran into the same problem. Thanks a lot for the answer posted by ridgerunner.

Although it is helpful to know why PHP crashes, for me this does not really solve the problem. To solve the problem, I need to adjust my regex in order to save memory so PHP won't crash any longer.

So the question is how to change the regex. The link to the PCRE manual posted above already describes a solution for an example regex that is quite similar to yours.

So how to fix your regex?
First, you say you want to match "a . or a newline".
Note that "." is a special character in a regex that does not only match a dot but any character, so you need to escape that. (I hope I did not get you wrong here and this was intended.)

$pattern = '/\[(\.|\n)+\]/';

Next, we can copy the quantifier inside the brackets:

$pattern = '/\[(\.+|\n+)+\]/';

This does not change the meaning of the expression. Now we use possessive quantifiers instead of normal ones:

$pattern = '/\[(\.++|\n++)++\]/';

So this should have the same meaning as your original regex, but work in php without crashing it.
Why? Possessive quantifiers "eat up" the characters and do not allow backtracking. Therefore, PCRE does not have to use recursion and the stack will not overflow. Using them inside the brackets seems to be a good idea, as the outer group then needs to repeat far less often.

To sum up, best practice seems to be:

  • use possessive quantifiers where possible. This means: ++, *+, ?+, {}+ instead of +, *, ?, {}.
  • move quantifiers inside of alternative-brackets where possible

Following these rules I was able to fix my own problem, and I hope this will help somebody else.
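To make the effect of the rewritten pattern concrete, here is a small sketch that runs it against two short subjects. The subjects are made up for illustration. Note that escaping the dot, as this answer does, narrows the pattern to literal dots and newlines, so it no longer matches arbitrary text between the brackets the way the question's (.|\n)+ did.

```php
<?php
// The possessive pattern from this answer, unchanged.
$pattern = '/\[(\.++|\n++)++\]/';

// Made-up subjects: one with only dots and a newline, one with letters.
$dotsOnly = "[..\n..]";
$letters  = "[abc]";

$r1 = preg_match($pattern, $dotsOnly);   // brackets contain only "." and "\n"
$r2 = preg_match($pattern, $letters);    // brackets contain other characters

printf("dots/newlines: %d, letters: %d\n", $r1, $r2);
```

If the goal is to match any text between the brackets, the escaped-dot version is therefore not a drop-in replacement for the original.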

养猫人 2024-12-14 20:09:38

I had the same problem and you need to change the pattern to something like

$pattern = '|/your pattern/|s';

The 's' on the end basically means treat the string as a single line.
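Applied to the question's pattern, the 's' (DOTALL) modifier lets the dot match newlines, so the (.|\n) alternation can be dropped. The following is a sketch under that assumption, with a made-up subject string; whether it avoids the stack problem on a given PCRE build should be verified by testing with long inputs.

```php
<?php
// With the 's' modifier, "." also matches "\n", so no alternation is needed.
$pattern = '/\[.+\]/s';

// Made-up subject spanning two lines.
$subject = "x[line one\nline two]y";

$result = preg_match($pattern, $subject, $matches);
if ($result === 1) {
    echo $matches[0], "\n";
}
```

A single-character repeat such as .+ is handled differently from a repeated group, which is why this form is generally lighter on stack than (.|\n)+.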

野生奥特曼 2024-12-14 20:09:38

preg_match returns the number of matches found for the pattern. When you have a match, it is causing a fatal error in php (print_r(1), for instance, causes the error). print_r(0) (for when you change the pattern and have no matches) doesn't and just prints out 0.

You want print_r($matches)

As an aside, your pattern is not escaped properly. Using double quotes means you need to escape the backslashes in front of your brackets.
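A short sketch of the $matches suggestion: the question's function could return the captured array instead of preg_match()'s 0/1 result. The subject string is made up for illustration and kept short, since the pattern is the question's original one and is only safe on small inputs.

```php
<?php
// Sketch: return the matches array rather than the 0/1 return value.
function parseAPIResults($results) {
    $pattern = '/\[(.|\n)+\]/';   // the question's pattern, unchanged
    if (preg_match($pattern, $results, $matches) === 1) {
        return $matches;          // [0] = whole match, [1] = last captured character
    }
    return array();               // no match, or a PCRE error (check preg_last_error())
}

print_r(parseAPIResults("data: [1, 2,\n3] end"));
```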
