PHP 5.3, Suhosin and UTF-8

Posted 2024-12-08 18:40:26

I'm struggling to find a way to keep using the Suhosin patch while still accepting UTF-8 form submissions. Here is the very simple test I made:

<?php var_dump($_POST); ?>
<form method="post">
    <input name="test" type="text"/>
    <input type="submit" />
</form>

I submit the form with the string iñtërnâtiônàlizætiøn.
Of course, I first enabled the UTF-8 headers on the server, set PHP's default_charset to UTF-8, and enabled the mb* override (mbstring function overloading).
As soon as I disable the Suhosin patch and resubmit the form, everything works as it should.
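Before blaming Suhosin, it may help to confirm which charset settings the request actually runs with, since they can come from php.ini, the virtual host or .htaccess. A minimal sketch; charset_settings() is a hypothetical helper that only reads values with ini_get():

```php
<?php
// Hypothetical helper: collect the charset-related settings in effect for this request.
// ini_get() returns false for directives that are not registered (for example when
// mbstring is not loaded, or on newer PHP versions that dropped some of these).
function charset_settings() {
    $names = array(
        'default_charset',
        'mbstring.internal_encoding',
        'mbstring.http_input',
        'mbstring.func_overload',
        'mbstring.encoding_translation',
    );
    $settings = array();
    foreach ($names as $name) {
        $settings[$name] = ini_get($name);
    }
    return $settings;
}

var_dump(charset_settings());
```

Running this once with the patch enabled and once without shows whether the two setups really differ only in Suhosin.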

UPDATE

I did more tests just to be sure:

<?php
$test = isset($_POST['test']) ? $_POST['test'] : '';

// In strict mode, mb_detect_encoding() returns "UTF-8" for valid UTF-8 and false otherwise.
var_dump(mb_detect_encoding($test, "UTF-8", true));

// Returns true if $string is valid UTF-8 and false otherwise.
function is_utf8($string) {

    // From http://w3.org/International/questions/qa-forms-utf-8.html
    // preg_match() returns 1 or 0 (false on error), so cast the result to bool.
    return (bool) preg_match('%^(?:
      [\x09\x0A\x0D\x20-\x7E]            # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
    )*$%xs', $string);

} // function is_utf8
var_dump(is_utf8($test));

Both tests fail with the Suhosin patch enabled and pass with it disabled. So the question is: is this a bug or the expected behaviour? Is there a Suhosin patch configuration parameter that does something magic to multibyte strings?
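To see how the value is altered, rather than only that it is invalid, it can help to dump the raw bytes and compare them with the expected UTF-8. A minimal sketch; hex_bytes() and is_valid_utf8() are hypothetical helpers. The second one uses a common shortcut instead of the long regex above: with the u modifier, preg_match() returns false when the subject is not well-formed UTF-8.

```php
<?php
// Hypothetical helper: render a string as space-separated hex bytes, e.g. "c3 b1".
function hex_bytes($s) {
    return trim(chunk_split(bin2hex($s), 2, ' '));
}

// Hypothetical helper: with the /u modifier, preg_match() fails (returns false)
// when the subject is not well-formed UTF-8, so only a result of 1 means "valid".
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

$test = isset($_POST['test']) ? $_POST['test'] : '';
echo hex_bytes($test), "\n";
var_dump(is_valid_utf8($test));
```

For the test string, ñ should appear as c3 b1. Seeing a single byte f1 instead (ñ in ISO-8859-1) would suggest the value was converted to a single-byte encoding before the script saw it.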

At this point, the only option I see is to disable the patch, unless some brilliant mind can give the right advice.

UPDATE 2

GET strings are not corrupted and display correctly in the browser. At the moment, only POST data is affected.
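One way to confirm the GET/POST difference in a single request is to send the same value through both channels and check each one. A minimal sketch, assuming the form field is still named test and ?test=... is added to the page URL, so that the form (which has no action attribute) posts back to that same URL; describe_param() is a hypothetical helper:

```php
<?php
// Hypothetical helper: report whether a request variable is present and well-formed UTF-8.
// preg_match() with /u returns false on malformed UTF-8, so only 1 means "valid".
function describe_param(array $source, $name) {
    if (!isset($source[$name]) || !is_string($source[$name])) {
        return 'missing';
    }
    return preg_match('//u', $source[$name]) === 1 ? 'valid UTF-8' : 'INVALID UTF-8';
}

echo 'GET:  ', describe_param($_GET, 'test'), "\n";
echo 'POST: ', describe_param($_POST, 'test'), "\n";
```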

Comments (3)

待"谢繁草 2024-12-15 18:40:26

From a Google search, I found http://algorytmy.pl/doc/php/ref.mbstring.php, which mentions:

Beginning with PHP 4.3.3, if enctype for HTML form is set to multipart/form-data and mbstring.encoding_translation is set to On in php.ini the POST'ed variables and the names of uploaded files will be converted to the internal character encoding as well. However, the conversion isn't applied to the query keys.

This doesn't mean much to me, but it does mention POST variables, which seem to be the crux of the issue.
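The quoted passage treats multipart/form-data submissions separately, so one hedged way to narrow the problem down is to post the same value with both enctypes and compare the results. A minimal sketch; form_for() and the hidden which field are hypothetical additions to the test page:

```php
<?php
// Hypothetical helper: render a test form that posts the field "test" with a given enctype.
// The hidden "which" field records which form was used.
function form_for($enctype) {
    $e = htmlspecialchars($enctype, ENT_QUOTES, 'UTF-8');
    return '<form method="post" enctype="' . $e . '">'
        . '<input type="hidden" name="which" value="' . $e . '"/>'
        . '<input name="test" type="text"/>'
        . '<input type="submit" value="' . $e . '"/>'
        . "</form>\n";
}

if (isset($_POST['which'], $_POST['test'])) {
    // Show which encoding was used and whether the value arrived as well-formed UTF-8.
    $ok = preg_match('//u', $_POST['test']) === 1;
    echo htmlspecialchars($_POST['which'], ENT_QUOTES, 'UTF-8'), ': ', $ok ? 'valid' : 'INVALID', " UTF-8\n";
}

echo form_for('application/x-www-form-urlencoded');
echo form_for('multipart/form-data');
```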

I found that if I set the following in my Apache virtual host, I could reproduce your problem:

php_admin_value mbstring.language       "Neutral"
php_admin_value mbstring.encoding_translation   "On"
php_admin_value mbstring.http_input     "UTF-8"
php_admin_value mbstring.http_output    "UTF-8"
php_admin_value mbstring.detect_order   "auto"
php_admin_value mbstring.substitute_character   "none"
php_admin_value mbstring.internal_encoding "UTF-8"
php_admin_value mbstring.func_overload "7"
php_admin_value default_charset "UTF-8"

For reference, this is the PHP test page I used to reproduce the issue:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<pre><?php echo $_POST['test'];?></pre>
<form method="post">
    <input name="test" type="text"/>
    <input type="submit" />
</form>
Test string to use: iñtërnâtiônàlizætiøn
</body>
</html>

I tried commenting out the following mbstring setting (or turning it off):

; Disable HTTP Input conversion (PHP 4.3.0 or higher)
mbstring.encoding_translation = Off

This seems to fix the issue, although it doesn't make much sense to me, since the internal character encoding is already UTF-8.

Another oddity: if I set these mbstring values directly in php.ini (instead of in the Apache virtual host), I could not reproduce the issue with encoding_translation enabled. So it seems to be a problem only when php_admin_value is used?
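To check which mbstring values come from the virtual host rather than php.ini, ini_get_all() reports both the php.ini value (global_value) and the per-vhost or per-directory value (local_value) for each directive. A minimal sketch; mbstring_overrides() is a hypothetical helper:

```php
<?php
// Hypothetical helper: list mbstring directives whose local (vhost/per-directory) value
// differs from the php.ini value.
function mbstring_overrides() {
    if (!extension_loaded('mbstring')) {
        return array();
    }
    $overrides = array();
    foreach (ini_get_all('mbstring') as $name => $info) {
        if ($info['local_value'] !== $info['global_value']) {
            $overrides[$name] = array(
                'php.ini' => $info['global_value'],
                'local'   => $info['local_value'],
            );
        }
    }
    return $overrides;
}

var_dump(mbstring_overrides());
```

Under mod_php with the virtual host settings above, one would expect mbstring.encoding_translation and the other php_admin_value directives to show up here, which may help compare the two setups.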

无声无音无过去 2024-12-15 18:40:26

Have you tried this?

<form accept-charset="UTF-8" method="post">

-> http://www.razorvine.net/test/utf8form/utf8pageform.html
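A minimal sketch of the question's test page with this suggestion applied; utf8_test_form() is a hypothetical helper that returns the original form with accept-charset added, which asks the browser to encode submitted values as UTF-8:

```php
<?php
// Hypothetical helper: the question's test form with accept-charset="UTF-8" added.
function utf8_test_form() {
    return '<form accept-charset="UTF-8" method="post">' . "\n"
        . '    <input name="test" type="text"/>' . "\n"
        . '    <input type="submit" />' . "\n"
        . "</form>\n";
}

var_dump($_POST);
echo utf8_test_form();
```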

何时共饮酒 2024-12-15 18:40:26

Did you try adding the following meta tag to your HTML page?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
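Browsers give the charset in the HTTP Content-Type header precedence over a meta declaration, so the header and the meta tag should agree. A minimal sketch; send_utf8_header() is a hypothetical helper, and <meta charset="utf-8"> is the shorter HTML5 equivalent of the tag above:

```php
<?php
// Hypothetical helper: send the charset in the HTTP Content-Type header.
// header() only works before any output, so check headers_sent() first.
function send_utf8_header() {
    if (headers_sent()) {
        return false; // too late: output has already started
    }
    header('Content-Type: text/html; charset=UTF-8');
    return true;
}

send_utf8_header();
echo '<meta charset="utf-8">', "\n";
```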