PHP urldecode UTF-8 encoding problem
When I request a URL with a urlencoded value (a Cyrillic word) in the query string:
http://example.com/?action=search&q=%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC
and then decode it:
echo urldecode($_GET['q']); // it prints: ���������
So I need to convert it to UTF-8 (because my whole application works with UTF-8) via:
mb_convert_encoding($_GET['q'], "UTF-8", "windows-1251");
That helps, but my questions are:
Who or what says it should be EXACTLY "windows-1251"? Where does it come from?
If I use some other languages, how can I determine the appropriate encoding?
Where is the magic?
(update): the page encoding is UTF-8
(update): actually, urldecode($_GET['q']) is not even needed; it looks like the Apache + PHP module already does the decoding, but I still can't understand where the configuration is
Answers (6)
The answer is that you can't know for sure, as it might change from request to request, especially if the value is not always submitted from a form but is sometimes sent with AJAX or typed directly into the address bar by the user.
I work with an application that is in Polish. The application works with the ISO-8859-2 code page, and all the HTML output is served in this encoding.
The application receives requests in two different encodings, depending on the context of the request:
So there is really no way to know for sure. If you can, always use UTF-8. Otherwise use charset detection (check whether the value is valid UTF-8 and, if not, fall back to the most probable encoding for the language your application uses).
I use the following code:
Best regards,
SWilk
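The detection approach described in this answer (check for UTF-8, otherwise fall back) can be sketched roughly as below. This is not the answer's own code; the helper name to_utf8 and the Windows-1251 default are assumptions chosen to match the question, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper (not from the answer): normalise a request value to UTF-8.
// $fallback is an assumption; pick the legacy encoding most likely for your users.
function to_utf8(string $value, string $fallback = 'Windows-1251'): string
{
    // Bytes that are already valid UTF-8 are passed through unchanged.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise treat the bytes as the fallback single-byte encoding.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// The query value from the question, decoded to raw bytes.
$raw = rawurldecode('%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC');
echo to_utf8($raw), "\n"; // prints: автомобиль
```

A check like this is only a heuristic: some short single-byte strings also happen to be valid UTF-8, so the fallback will not catch every case.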
This is neither an Apache nor a mod_php issue. PHP decodes urlencoding automatically but it doesn't encode anything, so there is nothing to worry about on that side.
As it seems from this, it's more like a browser or operating system issue.
It seems that your OS encoding is single-byte and the browser urlencodes your single-byte string.
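To illustrate the point, here is a small sketch showing that the percent-encoded form depends entirely on which bytes the client encodes. It assumes the file is saved as UTF-8 and that the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// The same word, once as UTF-8 bytes and once converted to windows-1251 bytes.
$utf8   = 'автомобиль';
$cp1251 = mb_convert_encoding($utf8, 'Windows-1251', 'UTF-8');

// One byte per letter: this is the query string from the question.
echo rawurlencode($cp1251), "\n"; // %E0%E2%F2%EE%EC%EE%E1%E8%EB%FC

// Two bytes per letter: what a client sending UTF-8 would produce.
echo rawurlencode($utf8), "\n";   // %D0%B0%D0%B2%D1%82%D0%BE%D0%BC%D0%BE%D0%B1%D0%B8%D0%BB%D1%8C
```

PHP only reverses the percent-encoding, so $_GET['q'] contains whichever byte sequence the client chose to send.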
You should keep UTF-8 and set your page's charset to UTF-8 using the appropriate Content-Type header:
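A minimal sketch of such a header call, assuming an HTML response; the headers_sent() guard is an addition for illustration, not part of the answer.

```php
<?php
// Send the header before any output; once output has started PHP cannot set it.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
```

With the page declared as UTF-8, browsers generally submit form values from that page in UTF-8 as well.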
When you type non-ASCII characters directly into the browser's URL bar, the browser seems to automatically convert the characters to UTF-8 and URL-encode them. I have no hard data on this, but the behaviour makes sense. Related question here: Unicode characters in URLs
Your page is using windows-1252 or some other single-byte character set as its output encoding, which is why you need to convert the character data first.
You could change your page's output encoding to UTF-8 to save yourself that step, but that may have other consequences (like the need to use multi-byte string functions and/or a different encoding for database output, etc.)
windows-1251 is an 8-bit character encoding designed to cover languages that use Cyrillic alphabets. (Wiki)
You might have set the charset to windows-1251 in your web page.
I also ran into this problem. I use Adobe Dreamweaver CS4 (non-English version) and solved it as follows:
Add
header('Content-type: text/html; charset=utf-8');
at the top of the PHP page file.
IMPORTANT: In Adobe Dreamweaver, you should open Page Properties from the top menu, Modify (M) -> Page Properties (P), choose Title/Encoding, and manually change the encoding from Unicode to Unicode (UTF-8). (Sorry, these menu names are translated into English and may not be the exact labels.)