PHP 5.3, Suhosin and UTF-8
I'm struggling to find a solution to keep using the Suhosin patch and make it work with UTF-8 form submissions. This is the very simple test I made:
<?php var_dump($_POST); ?>
<form method="post">
<input name="test" type="text"/>
<input type="submit" />
</form>
using the string iñtërnâtiônàlizætiøn.
I first enabled the UTF-8 headers on the server, set PHP's default_charset to utf-8, and enabled the mb* override.
As soon as I disable the Suhosin patch and re-submit the form, everything works as it should.
UPDATE
I did more tests just to be sure:
$test = $_POST['test'];
var_dump(mb_detect_encoding($test, "UTF-8", true));
// Returns true if $string is valid UTF-8 and false otherwise.
function is_utf8($string) {
    // From http://w3.org/International/questions/qa-forms-utf-8.html
    return preg_match('%^(?:
          [\x09\x0A\x0D\x20-\x7E]            # ASCII
        | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
        | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
        | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )*$%xs', $string);
} // function is_utf8
var_dump(is_utf8($test));
and both tests returned false with the Suhosin patch enabled and true otherwise. The question is: is this a bug or the expected behaviour? Is there a configuration parameter for the Suhosin patch that changes how it handles multibyte strings?
The only option I see at this point is to disable the patch, unless someone can offer better advice.
UPDATE 2
GET strings don't get corrupted and are displayed in the browser correctly. Only POST values are corrupted at the moment.
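To narrow down where the POST value changes, it can help to look at the exact bytes PHP hands to the script and compare them with the raw request body. The following is only a minimal sketch: describe_bytes() is a hypothetical helper name, not something from the tests above, and it assumes the form is submitted with the default application/x-www-form-urlencoded encoding so that php://input is readable.

```php
<?php
// Hypothetical helper (not part of the original tests): summarise the bytes of a string.
function describe_bytes($value)
{
    return array(
        'length'     => strlen($value),                    // number of bytes, not characters
        'hex'        => bin2hex($value),                   // the exact bytes the script received
        'valid_utf8' => (bool) preg_match('//u', $value),  // PCRE fails on malformed UTF-8
    );
}

if (isset($_POST['test'])) {
    header('Content-Type: text/plain; charset=UTF-8');
    // The value after PHP and any loaded extension have processed the request.
    var_dump(describe_bytes($_POST['test']));
    // The raw, still URL-encoded request body as sent by the browser.
    var_dump(file_get_contents('php://input'));
}
```

In UTF-8 the character ñ is the two bytes c3 b1, while in ISO-8859-1 it is the single byte f1, so the hex output should show whether the value was converted to another encoding or mangled in some other way.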
Comments (3)
From a Google search, I found http://algorytmy.pl/doc/php/ref.mbstring.php which mentions:
This doesn't really mean much to me, but it does mention POST variables, which seems to be the crux of the issue.
I found that if I set this in my Apache virtual host, I could reproduce your problem:
For reference, this was the PHP test page I used to reproduce the issue:
I tried commenting out the following mbstring setting (or turning it off):
This seems to fix the issue, even though it doesn't make much sense to me, because the internal character encoding is utf-8.
Another oddity I noticed was that if I set these mbstring values directly in php.ini (instead of the Apache virtual host), I was unable to reproduce the issue with encoding_translation, so it seems to be a problem only when php_admin_value is used?
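Since the behaviour seems to depend on where the mbstring values are set, one way to check what the running script actually sees is to print the effective settings from inside the affected virtual host. This is only a sketch: mbstring_request_settings() is a hypothetical helper name, and the list of directives is an assumption about which ones matter here (mbstring.func_overload is a guess at what "the mb* override" in the question refers to).

```php
<?php
// Hypothetical helper: collect the effective values of the settings discussed
// above, as seen by the script that handles the request.
function mbstring_request_settings()
{
    // Assumed list of relevant directives; adjust to match the real configuration.
    $names = array(
        'default_charset',
        'mbstring.encoding_translation',
        'mbstring.http_input',
        'mbstring.internal_encoding',
        'mbstring.func_overload',
    );
    $report = array();
    foreach ($names as $name) {
        // ini_get() returns false when a directive is not registered,
        // for example when the mbstring extension is not loaded.
        $report[$name] = ini_get($name);
    }
    return $report;
}

var_dump(mbstring_request_settings());
```

Running the same page once with the values set through php_admin_value and once with them set in php.ini would show whether the two setups really end up with the same effective configuration.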
Have you tried this?
-> http://www.razorvine.net/test/utf8form/utf8pageform.html
Did you try adding the following to the meta tags on your HTML page: