How to handle character encoding in PHP - CodeIgniter?
What is the best way to convert user input to UTF-8?
I have a simple form where a user will pass in HTML. The HTML can be in any language and in any character encoding.
My question is:
Is it possible to represent everything as UTF-8?
What can I use to effectively convert any character encoding to UTF-8, so that I can parse it with PHP string functions, save it to my database and subsequently echo it out using htmlentities?
I am trying to work out how to best implement this - advice and links appreciated.
I am making use of CodeIgniter and its Input class to retrieve the POST data.
A few points I should make:
- I need to convert HTML special characters to their respective entities
- It might be a good idea to accept an encoding and return the output in that same encoding. However, my web app is making use of:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
This might have an adverse effect on things.
Specify accept-charset in your <form> tag to tell the browser to submit user-entered data encoded in UTF-8.
See here for a complete guide on HOW TO Use UTF-8 Throughout Your Web Stack.
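As a rough illustration of the accept-charset attribute mentioned above, here is a minimal sketch. The helper name render_utf8_form, the field name "html" and the /submit URL are made up for the example and are not from the original answer.

```php
<?php
// Hypothetical helper: builds a form that asks the browser to submit UTF-8.
function render_utf8_form(string $action): string
{
    // Escape the action URL so it is safe inside an attribute value.
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    return '<form method="post" action="' . $safeAction . '" accept-charset="UTF-8">'
        . '<textarea name="html"></textarea>'
        . '<button type="submit">Send</button>'
        . '</form>';
}

// Tell the browser the page itself is UTF-8, then print the form.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo render_utf8_form('/submit'), "\n";
```

Browsers normally submit a form in the encoding of the page it is on, so on a UTF-8 page the attribute mostly makes that intent explicit.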
Yes, UTF-8 is a Unicode encoding, so you can use any character defined in Unicode. That's the best you can do with a computer to date.
iconv lets you convert virtually any encoding to any other encoding. But for that you have to know what encoding you're dealing with. You can't say "iconv, whatever this is, make it UTF-8!". That's unfortunately not how it works. You can only say "iconv, I have this string here in BIG5, please convert that to UTF-8."
If you're only dealing with form data in UTF-8 though, you'll probably never need to convert anything.
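To make the "you have to know the source encoding" point concrete, here is a small sketch. The helper to_utf8 is hypothetical; it assumes the iconv and mbstring extensions are loaded and that the caller already knows the source encoding by some other means (for example, because the form was submitted from a page with a known charset).

```php
<?php
// Hypothetical helper: convert $text to UTF-8 when the source encoding is known.
function to_utf8(string $text, string $fromEncoding): string
{
    // Input declared as UTF-8: do not convert, only verify it really is valid.
    if (strcasecmp($fromEncoding, 'UTF-8') === 0) {
        if (!mb_check_encoding($text, 'UTF-8')) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        return $text;
    }

    // iconv() returns false (and raises a warning/notice) on failure.
    $converted = @iconv($fromEncoding, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fromEncoding to UTF-8");
    }
    return $converted;
}

$latin1 = "caf\xE9";                        // "café" encoded as ISO-8859-1
$utf8   = to_utf8($latin1, 'ISO-8859-1');
echo bin2hex($utf8), "\n";                  // prints 636166c3a9
```

The single byte E9 becomes the two bytes C3 A9. Had the wrong source encoding been named, iconv would still have produced output, just the wrong characters, which is why guessing does not work.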
"PHP string functions" work on bytes. They don't care about characters or encodings. Depending on what you want to do, working with naive PHP string functions on UTF-8 text will give you bad results. Use encoding-aware string functions in the MB extension for any multi-byte encoding string manipulation.
Just make sure your database stores text in UTF-8 and you have set your database connection to UTF-8 (i.e. the database knows you're sending it UTF-8 data). You should be able to specify that in the CodeIgniter database connection settings.
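Going back to the point that PHP string functions work on bytes, the following sketch shows the difference between the byte-oriented functions and their MB counterparts. It assumes the mbstring extension is available; the sample word is written with escape sequences so the result does not depend on how this file itself is saved.

```php
<?php
$word = "na\xC3\xAFve";   // "naïve" in UTF-8; the ï takes two bytes

echo strlen($word), "\n";               // counts bytes: 6
echo mb_strlen($word, 'UTF-8'), "\n";   // counts characters: 5

// substr() cuts at a byte offset and can split a character in half;
// mb_substr() cuts at a character offset.
$broken = substr($word, 0, 3);              // "na" plus half of "ï"
$intact = mb_substr($word, 0, 3, 'UTF-8');  // "naï"

var_dump(mb_check_encoding($broken, 'UTF-8'));  // bool(false)
var_dump(mb_check_encoding($intact, 'UTF-8'));  // bool(true)
```

A string like $broken is exactly the kind of value that later shows up as a replacement character in the browser or is rejected by the database.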
Just echo htmlentities($text), there is nothing more you need to do.
As for the meta tag having an adverse effect: not at all. It just signals to the browser that your page is encoded in UTF-8. Now you just need to make sure that's actually the case (as you're trying to do anyway). It also implies to the browser that it should send UTF-8 to the server. You can make that explicit with the accept-charset attribute on forms.
May I recommend What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text, which might help you understand more.
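For the htmlentities step, a minimal sketch follows. The wrapper escape_for_html is a made-up name. Passing 'UTF-8' explicitly is a precaution rather than something the answer requires: before PHP 5.4 the function assumed ISO-8859-1 by default, which would mangle multi-byte characters.

```php
<?php
// Hypothetical wrapper: escape UTF-8 text for output inside an HTML page.
function escape_for_html(string $text): string
{
    // ENT_QUOTES also converts single and double quotes.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

$input = "<b>caf\xC3\xA9</b>";      // "<b>café</b>" in UTF-8
echo escape_for_html($input), "\n"; // prints &lt;b&gt;caf&eacute;&lt;/b&gt;
```

If only the HTML special characters need escaping and the page is served as UTF-8 anyway, htmlspecialchars with the same arguments leaves the é untouched and converts just the markup characters.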
Yes, everything defined in Unicode. That's the most you can get nowadays, and Unicode still has room to grow in the future.
The only thing you need to know is the actual encoding of your data. If you want your web application to support UTF-8 for input and output, the frontend needs to signal that it supports UTF-8. See Character Encodings for a guide regarding your application's user interface.
Within PHP you need to feed each function data in an encoding it supports. Some functions need the encoding specified as a parameter; for others you need to convert the data first. Always check the function's documentation to see whether it supports what you are asking for. Additionally, check your PHP configuration.
If you want to change the encoding of a string you can try
I found out that the only thing that works for UTF-8 encoding is setting it inside my config.php.
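The setting itself did not survive in this answer. As an assumption, CodeIgniter's application/config/config.php has a charset item that fits the description, and it would look like this:

```php
<?php
// application/config/config.php (excerpt; assumed to be the setting meant above)
// Default character set used by CodeIgniter methods that need one.
$config['charset'] = 'UTF-8';
```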
EDIT:
Is it possible to represent everything as UTF-8?
Yes, this is what you need to ensure:
What can I use to effectively convert any character encoding to UTF-8
You can use utf8_encode before saving the input into your database. Note that utf8_encode only converts from ISO-8859-1, which fits here because a system set up mainly for Western European languages will generally use ISO-8859-1 or a close relation (ref). As I mentioned before, you also need to make sure the database collation, the tables and the data encoding are all set to UTF-8. In CI, this is done in your database connection config.
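The config snippet is missing from this answer. A sketch of what the relevant entries usually look like in CodeIgniter's application/config/database.php is below; the 'default' connection group name and the exact values are assumptions, so adjust them to your setup.

```php
<?php
// application/config/database.php (excerpt; other keys omitted)
// Character set used for the connection to the database.
$db['default']['char_set'] = 'utf8';
// Collation used for the connection.
$db['default']['dbcollat'] = 'utf8_general_ci';
```

On MySQL, the utf8 character set stores at most three bytes per character, so if you need characters outside the Basic Multilingual Plane (emoji, for instance), utf8mb4 with a matching collation such as utf8mb4_unicode_ci is the safer choice.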