
Best methods for pre- and post-processing multilingual user input in a PHP/MySQL CMS

Posted 2024-07-12 11:23:52

Okay, there is a ton of stuff out there on sanitizing strings, but very little that I can find on the best methods to prepare user input (like what I'm typing now) for insertion into a content management system, and then to filter it on the way out.

I'm building two multilingual (Japanese, English + other Romance languages) CMSs and having a heck of a time getting special characters like ® and ™ to display alongside Japanese characters.

I continue to get very inconsistent results.

I have everything set to UTF-8:

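web page: the XHTML 1.0 Transitional doctype and <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />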

.htaccess file: AddDefaultCharset UTF-8 AND (to force the issue)

after each db connection: mysql_query("SET NAMES 'UTF8'");

each database, table, and field is also set to utf8_general_ci

Magic quotes are off. I preprocess user input first with the default settings of htmlpurifier, then I run this function on it:

function html_encode($var) {

    // Encodes HTML safely for UTF-8. Use instead of htmlentities.
    $var = htmlentities($var, ENT_QUOTES, 'UTF-8');

    // convert pesky special characters to unicode
    $look = array('™', '&trade;', '®', '&reg;');
    $safe = array('&#8482;', '&#8482;', '&#174;', '&#174;');

    $var = str_replace($look, $safe, $var);

    $var = mysql_real_escape_string($var);

    return $var;
}

That gets it into the database.

I return it from the database by filtering everything with this function:

function decodeit($var) {

    return html_entity_decode(stripcslashes($var), ENT_QUOTES, 'UTF-8');
}

Unfortunately, after all this I STILL get inconsistent results. Most often the ® symbols become little diamonds.

I've searched all over for a good tutorial on this but can't seem to find what the best methods are...

把时间冻结 2024-07-19 11:23:52

Sorry, the web page headers got scrubbed by the WYSIWYG editor. For clarity's sake:

Web page headers are:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

And

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />


笔落惊风雨 2024-07-19 11:23:52

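Don't put HTML entities into the database! Don't ever call htmlentities(); it ought to be deprecated. Use htmlspecialchars(), but when displaying the text, not before you put it in the database. The point is to prevent your data from being treated as HTML. There is no point in translating trademark or copyright symbols, since they present no risk. The only HTML characters you need to worry about are: > < & ' "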
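
To illustrate the advice above, here is a minimal sketch of escaping at display time only. The helper name escape_for_html and the sample string are made up for this example and are not from the thread; it assumes the text is stored in the database as raw UTF-8 with no entities, and that SQL escaping (mysql_real_escape_string or a prepared statement) is handled as a separate step at insert time.

```php
<?php
// Hypothetical helper (not from the thread): escape text only when it is output as HTML.
function escape_for_html(string $text): string
{
    // ENT_QUOTES escapes both single and double quotes;
    // the third argument tells PHP the input is UTF-8.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Raw UTF-8 as it would come back from the database (no entities stored).
$stored = 'Acme® <b>製品</b> & "more"';

// Only > < & and quotes are rewritten; ® and the Japanese text pass through untouched.
echo escape_for_html($stored), "\n";
// prints: Acme® &lt;b&gt;製品&lt;/b&gt; &amp; &quot;more&quot;
```

With this approach there is no decode step on the way out, so a function like decodeit() is no longer needed.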
