Best methods for pre- and post-processing multilingual user input in a PHP/MySQL CMS
Okay, there is a ton of stuff out there on sanitizing strings but very little, that I can find, on the best methods to prepare user input (like what I'm typing now) for inserting into a content management system then how to filter it coming out.
I'm building two multilingual (Japanese, English + other Romance languages) CMSs and having a heck of a time getting special characters like ® and ™ to display alongside Japanese characters.
I continue to get very inconsistent results.
I have everything set to UTF-8:
web page: and
.htaccess file: AddDefaultCharset UTF-8 AND (to force the issue)
after each db connection: mysql_query("SET NAMES 'UTF8'");
each database, table, and field is also set to utf8_general_ci
Magic quotes are off. I preprocess user input first with the default settings of htmlpurifier, then I run this function on it:
function html_encode($var) {
// Encodes HTML safely for UTF-8. Use instead of htmlentities.
$var = htmlentities($var, ENT_QUOTES, 'UTF-8');
// convert pesky special characters to unicode
$look = array('&trade;', '™', '&reg;', '®');
$safe = array('&#8482;', '&#8482;', '&#174;', '&#174;');
$var = str_replace($look, $safe, $var);
$var = mysql_real_escape_string($var);
return $var;
}
That gets it into the database.
I return it from the database by filtering everything with this function:
function decodeit($var) {
return html_entity_decode(stripcslashes($var), ENT_QUOTES, 'UTF-8');
}
Unfortunately, after all this I STILL get inconsistent results. Most often the ® symbols become little diamonds.
I've searched all over for a good tut on this but can't seem to find what are the best methods...
Comments (5)
Sorry the web page headers got scrubbed by the wysiwyg editor. For clarity's sake:
Web page headers are:
And
Don't put HTML entities in your database! Never call htmlentities(); it should be deprecated from PHP. Use htmlspecialchars(), but call it when you display the text, not before you put it in the database. The point is to prevent your data from being treated as HTML. There is no point in translating trademark or copyright symbols, because they pose no risk. The only HTML characters you need to worry about are: > < & ' "
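As a rough sketch of the "escape on display, not on storage" idea above: the helper name escape_for_html() and the sample string are made up for illustration and are not from the original post. It assumes the text was stored as raw UTF-8 and that SQL escaping is handled separately (for example by prepared statements).

```php
<?php
// Hypothetical helper: escape text only at the moment it is printed into HTML.
// Only < > & ' " are converted; ®, ™ and Japanese text pass through unchanged.
function escape_for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Pretend this raw UTF-8 string came straight out of the database.
$stored = '<b>Acme® Widget™</b> 日本語 & "quotes"';

echo escape_for_html($stored), PHP_EOL;
```

With this approach there is nothing to decode when reading from the database, so a decodeit()-style step would no longer be needed.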
https://www.php.net/utf8_encode
https://www.php.net/utf8-decode
That should help.
Everything is already encoded as UTF-8. Decoding it to ISO-8859-1 would merely wreck any Japanese.
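A small illustration of that point, assuming the mbstring extension is enabled. It uses mb_convert_encoding() rather than utf8_decode() (which is deprecated in recent PHP versions), and the sample string is invented. It also hints at one possible source of the "little diamonds": a lone Latin-1 ® byte is not valid UTF-8.

```php
<?php
// Sample UTF-8 text (made up for illustration).
$utf8 = '® 日本語';

// Convert to ISO-8859-1, roughly what utf8_decode() would do.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// ® fits in Latin-1 as the single byte 0xAE; the Japanese characters
// have no Latin-1 equivalent and are replaced by a substitute character.
echo bin2hex($latin1), PHP_EOL;

// The resulting bytes are no longer valid UTF-8, so a page served as
// UTF-8 cannot render the 0xAE byte and shows a replacement glyph.
var_dump(mb_check_encoding($latin1, 'UTF-8'));
```

If any layer (file, connection, or a conversion call) hands Latin-1 bytes to a UTF-8 page, this is the kind of mismatch to look for.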
I once had an issue with encoding that came down to the encoding of the php files themselves. So basically make sure the files themselves are encoded to utf-8. In vim you can do
:e ++enc=
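One way to check this from PHP itself, as a sketch only: the function name file_is_valid_utf8() and the temporary files are hypothetical, and the mbstring extension is assumed. It only reports whether a file's bytes form valid UTF-8; it does not convert anything.

```php
<?php
// Hypothetical helper: true if the file's bytes are valid UTF-8.
function file_is_valid_utf8(string $path): bool
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return mb_check_encoding($bytes, 'UTF-8');
}

// Demo with two throwaway files: one saved as UTF-8, one as Latin-1.
$utf8File   = tempnam(sys_get_temp_dir(), 'u8_');
$latin1File = tempnam(sys_get_temp_dir(), 'l1_');
file_put_contents($utf8File, "<?php echo '\xC2\xAE';\n");   // ® as UTF-8 (2 bytes)
file_put_contents($latin1File, "<?php echo '\xAE';\n");     // ® as Latin-1 (1 byte)

$utf8Result   = file_is_valid_utf8($utf8File);
$latin1Result = file_is_valid_utf8($latin1File);
var_dump($utf8Result, $latin1Result);

// Clean up the temporary files.
unlink($utf8File);
unlink($latin1File);
```

A file that contains only ASCII passes this check too, so a true result means "not broken" rather than "definitely saved as UTF-8".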