Best methods for pre- and post-processing multilingual user input for a PHP/MySQL CMS

Posted 2024-07-12 11:23:52 · 1261 characters · 9 views · 0 comments

Okay, there is a ton of material out there on sanitizing strings, but very little that I can find on the best methods to prepare user input (like what I'm typing now) for insertion into a content management system, and then how to filter it on the way out.

I'm building two multilingual (Japanese, English + other Romance languages) CMSs, and I'm having a heck of a time getting special characters like ® and ™ to display alongside Japanese characters.

I continue to get very inconsistent results.

I have everything set to UTF-8:

web page: and

.htaccess file: AddDefaultCharset UTF-8 AND (to force the issue)

after each db connection: mysql_query("SET NAMES 'UTF8'");

each database, table, and field is also set to utf8_general_ci

Magic quotes are off. I preprocess user input first with the default settings of htmlpurifier, then I run this function on it:

function html_encode($var) {

        // Encodes HTML safely for UTF-8. Use instead of htmlentities.
        $var = htmlentities($var, ENT_QUOTES, 'UTF-8');

        // convert pesky special characters to unicode
        $look = array('™', '™','®','®');
        $safe = array('™', '™', '®', '®');

        $var = str_replace($look, $safe, $var);

        $var = mysql_real_escape_string($var);

        return $var;
}

That gets it into the database.

I return it from the database by filtering everything with this function:

function decodeit($var) {

        return html_entity_decode(stripcslashes($var), ENT_QUOTES, 'UTF-8');
}

Unfortunately, after all this I STILL get inconsistent results. Most often the ® symbols become little diamonds.

I've searched all over for a good tutorial on this but can't seem to find out what the best methods are...

Comments (5)

把时间冻结 2024-07-19 11:23:52

Sorry, the web page headers got scrubbed by the WYSIWYG editor. For clarity's sake:

Web page headers are:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

And

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
笔落惊风雨 2024-07-19 11:23:52

Don't put HTML entities in your database! Never call htmlentities(); it should be deprecated from PHP. Use htmlspecialchars, but when you display the text, not before you put it in the database. The point is to prevent your data from being treated as HTML. There is no point in translating trademark symbols or copyright symbols, because they don't cause a risk. The only HTML characters you need to worry about are: > < & ' "
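
As a rough illustration of "escape on display, not on storage", here is a minimal sketch. The helper name escape_for_html is made up for this example, and $stored simply stands in for raw UTF-8 text read back from the database; SQL escaping (or a prepared statement) would still be needed on the way in, but that is a separate concern from HTML escaping.

```php
<?php
// Hypothetical helper (not from the original post): escape only at output time.
function escape_for_html(string $text): string
{
    // ENT_QUOTES escapes both quote styles; 'UTF-8' tells PHP how to read the bytes.
    // Only > < & ' " are rewritten; ®, ™ and Japanese text pass through untouched.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Stand-in for raw text exactly as it was stored in the database.
$stored = "Acme® <b>ツール</b> & 'co'";

// Escape at the moment the value is written into the HTML page.
echo escape_for_html($stored), PHP_EOL;
```

With this arrangement there is nothing to decode when reading from the database, so a function like decodeit() would not be needed.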

肩上的翅膀 2024-07-19 11:23:52

Everything is already encoded as UTF-8. Decoding it to ISO-8859-1 would merely wreck any Japanese text.
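
A small sketch of why that conversion is lossy, assuming the mbstring extension is loaded and this file itself is saved as UTF-8. It also shows that ® in ISO-8859-1 is the single byte 0xAE, which is not valid UTF-8 on its own; a stray byte like that in a page declared as UTF-8 may be one source of the "little diamonds" described in the question.

```php
<?php
// Assumption: mbstring is available and this source file is saved as UTF-8.
$utf8 = "日本語®";

// Convert the UTF-8 string to ISO-8859-1 (Latin-1).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Converting back does not restore the original: the kanji had no Latin-1 form.
$roundTrip = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($roundTrip === $utf8);                // bool(false)

// The ® survives, but as the lone byte 0xAE, which is invalid as UTF-8.
var_dump(mb_check_encoding("\xAE", 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```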

左耳近心 2024-07-19 11:23:52

I once had an encoding issue that came down to the encoding of the PHP files themselves. So basically, make sure the files themselves are encoded as UTF-8. In vim you can do
:e ++enc=
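
If an editor is not handy, a rough way to check a file from PHP is sketched below. The helper name is_utf8_file is invented for this example and it assumes the mbstring extension is loaded. It only checks that the bytes are well-formed UTF-8, so a pure ASCII file also passes.

```php
<?php
// Hypothetical helper (assumes mbstring): true if the file's bytes are valid UTF-8.
function is_utf8_file(string $path): bool
{
    $bytes = file_get_contents($path);
    // file_get_contents() returns false on failure; treat that as "not valid".
    return $bytes !== false && mb_check_encoding($bytes, 'UTF-8');
}

// Demo with two temporary files: one UTF-8, one containing a Latin-1 encoded ®.
$good = tempnam(sys_get_temp_dir(), 'enc');
$bad  = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($good, "echo '\xC2\xAE';"); // 0xC2 0xAE is ® in UTF-8
file_put_contents($bad, "echo '\xAE';");      // 0xAE alone is ® in ISO-8859-1

var_dump(is_utf8_file($good)); // bool(true)
var_dump(is_utf8_file($bad));  // bool(false)

unlink($good);
unlink($bad);
```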
