PHP urldecode UTF-8 encoding problem

Posted on 2024-10-21 18:04:47

When I request a URL with a urlencoded query value (a Cyrillic word):

http://example.com/?action=search&q=%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC

and decode it:

echo urldecode($_GET['q']); // it prints: ���������

So I need to convert it to UTF-8 (because my whole application works with UTF-8) via:

mb_convert_encoding($_GET['q'], "UTF-8", "windows-1251");

That helps, but my question is:

Who or what says it should be exactly "windows-1251"? Where does that come from?
If I use other languages, how can I determine the appropriate encoding?
Where is the magic?

(update): the page encoding is UTF-8
(update): actually, urldecode($_GET['q']) is not even needed; it looks like Apache plus the PHP module already decode the value, but I still can't work out where this is configured

Comments (6)

听风吹 2024-10-28 18:04:47

The answer is that you can't know for sure, because the encoding may change from request to request, especially if the value is not always submitted from a form but is sometimes sent with Ajax or typed directly into the address bar by the user.

I work with an application in Polish. The application uses the ISO-8859-2 code page, and all of its HTML output is served in this encoding.

The application receives requests in different encodings, depending on the context of the request:

  1. If the request is the result of a form submission, the encoding is the same as that of the HTML page containing the form. I think this can be changed with the accept-charset attribute of the form element, but I have not tried it.
  2. If the request is made with Ajax, it is always UTF-8 (at least in Chrome and Firefox, which are the only browsers our client uses).
  3. If the URL is entered manually, it is usually UTF-8, but if it comes from a bookmark or something similar, it may be in another encoding (depending on how the bookmark was created).

So there is really no way to know for sure. If you can, always use UTF-8. Otherwise use charset detection: check whether the value is valid UTF-8 and, if it is not, fall back to the most probable encoding for the language your application uses.

I use the following code:

<?php
$t = 'zażółć gęślą jaźń';
echo mb_detect_encoding($t, 'UTF-8,ISO-8859-2');
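
The "check for UTF-8, otherwise fall back" idea described above could be sketched roughly as follows. This is only an illustration, not the code the application actually uses: the helper name to_utf8() and its $fallback parameter are hypothetical, the fallback encoding is an assumption you have to choose for your own audience, and the mbstring extension is assumed to be enabled.

```php
<?php
// Hypothetical helper (not part of the original answer): return $value as UTF-8.
// $fallback is an assumption about the most probable legacy encoding.
function to_utf8(string $value, string $fallback = 'ISO-8859-2'): string
{
    // A byte sequence that is already valid UTF-8 is passed through unchanged.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Otherwise treat the bytes as the fallback single-byte encoding and convert.
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "żółć" as ISO-8859-2 bytes: ż = 0xBF, ó = 0xF3, ł = 0xB3, ć = 0xE6
$legacy = "\xBF\xF3\xB3\xE6";
echo to_utf8($legacy), "\n";

// The same word written with Unicode escapes is already UTF-8 and stays as it is.
echo to_utf8("\u{017C}\u{00F3}\u{0142}\u{0107}"), "\n";
```

Note that this can only guess: a short string in one single-byte encoding cannot be told apart from another single-byte encoding, so the fallback stays an educated assumption.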

Best regards,
SWilk

美人骨 2024-10-28 18:04:47

This is neither an Apache nor a mod_php issue. PHP decodes the URL encoding automatically, but it does not change the character encoding of the data, so there is nothing to worry about on the server side.

Judging from this:

When I type example.com/?action=search&q=автомобиль into Firefox 3, it is automatically converted to: example.com/?action=search&q=%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC

it looks more like a browser or operating system issue.

It seems that your OS encoding is a single-byte one, and the browser urlencodes that single-byte string.
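
One way to see where "windows-1251" comes from is to percent-encode the same word in both encodings and compare the results with the URL from the question. The snippet below is a small sketch under the assumptions that the mbstring extension is available and that the source file itself is saved as UTF-8:

```php
<?php
$word = 'автомобиль'; // assumes this file is saved as UTF-8

// Percent-encoding of the UTF-8 bytes (two bytes per Cyrillic letter).
echo rawurlencode($word), "\n";
// %D0%B0%D0%B2%D1%82%D0%BE%D0%BC%D0%BE%D0%B1%D0%B8%D0%BB%D1%8C

// Percent-encoding after converting the word to windows-1251 (one byte per letter).
$cp1251 = mb_convert_encoding($word, 'Windows-1251', 'UTF-8');
echo rawurlencode($cp1251), "\n";
// %E0%E2%F2%EE%EC%EE%E1%E8%EB%FC

// Decoding the query value from the question and converting it back gives the word again.
$raw = rawurldecode('%E0%E2%F2%EE%EC%EE%E1%E8%EB%FC');
echo mb_convert_encoding($raw, 'UTF-8', 'Windows-1251'), "\n";
```

The second line of output matches the query string in the question byte for byte, which suggests the client encoded the word as windows-1251 before percent-encoding it. PHP only reverses the percent-encoding and hands over those bytes unchanged.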

不一样的天空 2024-10-28 18:04:47

You should stay with UTF-8 and declare UTF-8 as your page's charset using the appropriate Content-Type header:

header('Content-type: text/html; charset=utf-8');

白云悠悠 2024-10-28 18:04:47

When you type non-ASCII characters directly into the browser's address bar, the browser seems to convert them to UTF-8 and percent-encode them automatically. I have no hard data on this, but the behaviour makes sense. Related question here: Unicode characters in URLs

Your page is using windows-1252 or some other single-byte character set as its output encoding, which is why you need to convert the character data first.

You could change your page's output encoding to UTF-8 to save yourself that step, but that may have other consequences (such as needing to use multi-byte string functions and/or a different encoding for database output).
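
As a rough illustration of why multi-byte string functions matter once data is UTF-8, the byte-oriented functions count bytes while the mb_* functions count characters. The snippet assumes the mbstring extension is enabled and that the source file is saved as UTF-8:

```php
<?php
$word = 'автомобиль'; // assumes this file is saved as UTF-8

// strlen() counts bytes: each Cyrillic letter takes two bytes in UTF-8.
echo strlen($word), "\n";             // 20

// mb_strlen() counts characters when told the encoding.
echo mb_strlen($word, 'UTF-8'), "\n"; // 10

// mb_substr() cuts on character boundaries; substr() with the same
// arguments would return only the first two letters (four bytes).
echo mb_substr($word, 0, 4, 'UTF-8'), "\n"; // авто
```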

你的背包 2024-10-28 18:04:47

windows-1251 is an 8-bit character encoding designed to cover languages that use Cyrillic alphabets (source: Wikipedia).

You might have set the charset of your web page to windows-1251.

梦魇绽荼蘼 2024-10-28 18:04:47

I ran into this problem too. I use Adobe Dreamweaver CS4 (a non-English version).

I solved it as follows:

  1. Add header('Content-type: text/html; charset=utf-8'); at the top of the PHP page file.

  2. IMPORTANT: in Adobe Dreamweaver, open Page Properties from the top menu, Modify (M) -> Page Properties (P), choose Title/Encoding, and manually change the encoding to Unicode (UTF-8).

(Sorry, these menu names are my own translation into English and may not match the real labels.)
