PHP's preg_match_all causes Apache segfault
I'm using two regular expressions to pull assignments out of MySQL queries and using them to create an audit trail. One of them is the 'picky' one that requires quoted column names/etc., the other one does not.
Both of them are tested and parse the values out correctly. The issue I'm having is that with certain queries the 'picky' regexp is actually just causing Apache to segfault.
I tried a variety of things to confirm this was the cause, up to leaving the regexp in the code and just modifying the conditional to ensure it wasn't run (to rule out some sort of compile-time issue or something). No issues. It's only when it runs the regexp against specific queries that it segfaults, and I can't find any obvious pattern to tell me why.
The code in question:
if ($picky)
    preg_match_all("/[`'\"]((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"] *= *'((?:[^'\\\\]|\\\\.)*)'/", $sql, $matches);
else
    preg_match_all("/[`'\"]?((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"]? *= *[`'\"]?([^`'\" ,]+)[`'\"]?/", $sql, $matches);
The only difference between the two is that the first one removes the question marks on the quotes to make them non-optional and removes the option of using different kinds of quotes on the value - only allows single quotes. Replacing the first regexp with the second (for testing purposes) and using the same data removes the issue - it is definitely something to do with the regexp.
The specific SQL that is causing me grief is available at:
http://stackoverflow.pastebin.com/m75c2a2a0
Interestingly enough, when I remove the highlighted section, it all works fine. Trying to submit the highlighted section by itself causes no error.
I'm pretty perplexed as to what's going on here. Can anyone offer any suggestions as to further debugging or a fix?
EDIT: Nothing terribly exciting, but for the sake of completeness here's the relevant log entry from Apache (/var/log/apache2/error.log - There's nothing in the site's error.log. Not even a mention of the request in the access log.)
[Thu Dec 10 10:08:03 2009] [notice] child pid 20835 exit signal Segmentation fault (11)
One of these for each request containing that query.
EDIT2: On the suggestion of Kuroki Kaze, I tried gibberish of the same length and got the same segfault. Sat and tried a bunch of different lengths and found the limit. 6035 characters works fine. 6036 segfaults.
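A length-dependent crash like this is consistent with the regex engine recursing once per repetition of a group such as (?:[^'\\]|\\.)*, which older PCRE builds are generally understood to do. The sketch below is an assumption rather than something from the original post: it rewrites the 'picky' pattern with plain character classes, an "unrolled" value group, and possessive quantifiers, which should match the same text without a group iteration per character. The sample $sql string is made up for illustration.

```php
<?php
// Hypothetical rewrite of the 'picky' pattern (not the author's code).
// A nowdoc is used so the backslashes reach PCRE exactly as written.
$picky = <<<'RE'
/[`'"]([A-Za-z0-9_]++)[`'"] *+= *+'([^'\\]*+(?:\\.[^'\\]*+)*+)'/
RE;

// Made-up sample query; the first value contains an escaped quote.
$sql = "UPDATE t SET `name`='O\\'Brien', `age` = '42' WHERE `id`='7'";

preg_match_all($picky, $sql, $matches);
// $matches[1] holds the column names,
// $matches[2] holds the raw (still escaped) values.
```

The value part reads as "a run of ordinary characters, then any number of (escaped character + run of ordinary characters)", so each long run is consumed by a single character-class repeat.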
EDIT3: Changing the values of pcre.backtrack_limit and pcre.recursion_limit in php.ini mitigated the problem somewhat. Apache no longer segfaults, but my regexp no longer matches all of the matches in the string. Apparently this is a long-known (from 2007) bug in PHP/PCRE:
http://bugs.php.net/bug.php?id=40909
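Both settings can also be changed at runtime with ini_set(), and preg_last_error() reports when PCRE gave up because a limit was hit, which is one way to notice the "not all matches found" case. A minimal sketch, assuming a hypothetical helper name (match_all_with_limits) and illustrative limit values that are not recommendations:

```php
<?php
// Hypothetical helper (not from the original post): runs preg_match_all()
// under temporary PCRE limits and reports the PCRE error status.
function match_all_with_limits($pattern, $subject, $backtrackLimit, $recursionLimit)
{
    // Apply the limits for this call only, remembering the old values.
    $oldBacktrack = ini_set('pcre.backtrack_limit', (string) $backtrackLimit);
    $oldRecursion = ini_set('pcre.recursion_limit', (string) $recursionLimit);

    $matches = array();
    $count = preg_match_all($pattern, $subject, $matches);
    $error = preg_last_error(); // PREG_NO_ERROR when PCRE finished normally

    // Restore the previous settings if they could be read.
    if ($oldBacktrack !== false) {
        ini_set('pcre.backtrack_limit', $oldBacktrack);
    }
    if ($oldRecursion !== false) {
        ini_set('pcre.recursion_limit', $oldRecursion);
    }

    return array('count' => $count, 'error' => $error, 'matches' => $matches);
}

// Illustrative call with a trivial pattern and made-up limits.
$result = match_all_with_limits("/'([^']*)'/", "a='1', b='2'", 100000, 10000);
if ($result['error'] !== PREG_NO_ERROR) {
    // e.g. PREG_BACKTRACK_LIMIT_ERROR or PREG_RECURSION_LIMIT_ERROR:
    // the match list may be incomplete and should not be trusted.
}
```

Checking the error code does not recover the missing matches; it only makes the partial result detectable.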
EDIT4: In the answers below I posted the code I used to replace this specific regular expression, as the workarounds weren't acceptable for my purpose (this is a product for sale, so I can't guarantee php.ini changes, and a regexp that only partially works removes functionality we require). The code I posted is released into the public domain with no warranty or support of any kind. I hope it can help someone else. :)
Thank you everyone for the help!
Adam
Comments (3)
I have been hit with a similar preg_match-related issue, same Apache segfault. The only difference is that the preg_match that causes it is built into the CMS I'm using (WordPress).
The "workaround" that was offered was to change these settings in php.ini:
[Pcre]
;PCRE library backtracking limit.
;pcre.backtrack_limit=100000
pcre.recursion_limit=200000000
pcre.backtrack_limit=100000000
The trade-off is that when rendering larger pages (in my case, > 200 rows, where one of the columns is a text description limited to 1500 characters), you'll get pretty high CPU utilization, and I'm still seeing the segfaults. Just not as frequently.
My site's close to end-of-life, so I don't really have much need (or budget) to look for a real solution. But maybe this can mitigate the issue you're seeing.
What about size of the submission? If you pass gibberish of equal length, what will happen?
EDIT: splitting and merging will look something like this:
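A minimal sketch of what splitting and merging could look like. This is not the answerer's original code. It assumes assignments are separated by a comma that directly follows a closing single quote and precedes a quoted column name, and the function name extract_assignments_chunked is hypothetical. It reuses the 'picky' pattern from the question on each chunk.

```php
<?php
// Hypothetical sketch: run the 'picky' pattern on smaller pieces of the
// query and merge the results, so no single match sees the whole string.
function extract_assignments_chunked($sql)
{
    $pattern = "/[`'\"]((?:[A-Z]|[a-z]|_|[0-9])+)[`'\"] *= *'((?:[^'\\\\]|\\\\.)*)'/";

    // Assumption: split at a comma that follows a closing single quote
    // and is followed by the opening quote of the next column name.
    $chunks = preg_split("/(?<=')\\s*,\\s*(?=[`'\"])/", $sql);

    $names = array();
    $values = array();
    foreach ($chunks as $chunk) {
        if (preg_match_all($pattern, $chunk, $m)) {
            // Merge this chunk's names and values into the overall result.
            $names = array_merge($names, $m[1]);
            $values = array_merge($values, $m[2]);
        }
    }
    return array($names, $values);
}

// Made-up sample query for illustration.
$sql = "UPDATE `t` SET `a`='1, 2', `b` = 'x\\'y', `c`='z' WHERE `id`='5'";
list($names, $values) = extract_assignments_chunked($sql);
```

Splitting only bounds the length of each chunk. A single very long value still ends up in one chunk, and a value containing an escaped quote directly followed by a comma and a quote character would be split in the wrong place, so this is a mitigation under the stated assumption rather than a full fix.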
Given that this only needs to match against the queries when saving pages or performing other infrequently executed operations, I felt the performance hit of the following code was acceptable. It parses the SQL query ($sql) and places name=>value pairs into $data. Seems to be working well and handles large queries fine. Thank you everyone for the help and direction!
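A minimal sketch of a parser in the same spirit, not the author's original code. It assumes the same rules as the 'picky' pattern (quoted column name, single-quoted value, backslash escapes), and the function name parse_assignments is hypothetical. A short regex finds each quoted name, and the value is scanned by hand, so the work per value is a loop instead of one large regex match.

```php
<?php
// Hypothetical sketch: extract name => value pairs from $sql without
// letting a single regex match span a long value.
function parse_assignments($sql)
{
    $data = array();
    $len = strlen($sql);
    $pos = 0;
    // Finds a quoted name followed by = and the opening quote of the value.
    $head = "/[`'\"]([A-Za-z0-9_]+)[`'\"] *= *'/";

    while ($pos < $len && preg_match($head, $sql, $m, PREG_OFFSET_CAPTURE, $pos)) {
        $name = $m[1][0];
        $start = $m[0][1] + strlen($m[0][0]); // first character of the value
        $i = $start;
        $closed = false;
        while ($i < $len) {
            $i += strcspn($sql, "'\\", $i); // jump to the next quote or backslash
            if ($i >= $len) {
                break;
            }
            if ($sql[$i] === '\\') {
                $i += 2; // skip the backslash and the character it escapes
                continue;
            }
            $closed = true; // $sql[$i] is the closing quote
            break;
        }
        if (!$closed) {
            break; // unterminated value: stop rather than guess
        }
        // Values are stored raw, with their escape sequences intact.
        $data[$name] = substr($sql, $start, $i - $start);
        $pos = $i + 1;
    }
    return $data;
}

// Made-up sample query for illustration.
$sql = "UPDATE `pages` SET `title`='It\\'s here', `body` = 'a=b, c' WHERE `id`='9'";
$data = parse_assignments($sql);
```

If the same column name appears more than once, the later value overwrites the earlier one, which may or may not be what an audit trail needs.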