(PHP) rawurlencode/decode seems to encode '£' signs as 'Â£' (%C2%A3 rather than %A3)
So, I've run into a problem with PHP's rawurlencode function. All text fields in our web app are of course converted before being processed by the web server, and we use rawurlencode for this. This works fine with almost every character I've tried, except for the "£" sign. Now, there is no reason for our users to ever enter a pound sign, but they might, so I want to take care of this.
The problem is that rawurlencode doesn't encode a pound sign entered on the webpage as %A3, but as %C2%A3. Even worse, if the user fails to enter some other piece of critical information (which causes the webpage to refresh - the checks are done on the backend - and try to refill the form boxes with what the user had entered), then when the %C2 is run through rawurldecode/encode again, it becomes Ã? - aka, %C3?. And of course the "£" is also turned into another Â£!
So, what is causing this? I assume it's a character encoding issue, but I'm not that knowledgeable about these things. I heard somewhere that I can encode £s as &pound; manually, but why should I need to do that when the database can handle "£"s and there is a percent-encoding for a pound sign? Is this a bug in rawurlencode, or a bug caused by differing character sets?
Thanks for any help.
Comments (2)
The standard requires forms to be submitted in the character encoding you specify in
<form accept-charset="...">
or, if that attribute is absent, in the encoding of the page itself (very often UTF-8).
Clearly, you're receiving the pound sign encoded in UTF-8. If you want ISO-8859-15 instead, convert the decoded string from UTF-8 to ISO-8859-15 yourself.
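A minimal sketch of such a conversion, assuming the iconv extension is available. The variable names and the hard-coded input are illustrative only and not taken from the question's code.

```php
<?php
// Hypothetical input: the percent-encoded value as sent from a UTF-8 form.
$encoded = '%C2%A3';

// rawurldecode() only turns the escapes back into bytes: here C2 A3.
$utf8 = rawurldecode($encoded);

// Convert those bytes from UTF-8 to ISO-8859-15, where the pound sign
// is the single byte A3.
$latin9 = iconv('UTF-8', 'ISO-8859-15', $utf8);

echo bin2hex($utf8), "\n";        // c2a3
echo bin2hex($latin9), "\n";      // a3
echo rawurlencode($latin9), "\n"; // %A3
```

Whether the conversion is wanted at all depends on the encoding the rest of the application uses; if the pages and the database are UTF-8, the decoded string can be used as it is.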
This is probably the A3 byte of your native character set being encoded as C2 A3 in UTF-8, which is the valid UTF-8 encoding of the character at A3 in the "ANSI" (Latin-1) code page. Either consume your encoded URL as UTF-8, or convert the string to a single-byte encoding such as Latin-1 before passing it to urlencode.
Artefacto's answer covers the case where you need to convert character encodings, for example when you are displaying a page whose encoding is set to Latin-1. (Raw)urlencode works on bytes, so a UTF-8 string produces escaped strings with multibyte character representations. (Raw)urldecode gives those same bytes back, so here it produces a UTF-8 encoded string that represents £ as two bytes. If you display this string while claiming it is an ISO-8859 encoded string, it will appear as two characters.
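To illustrate the point about bytes, here is a small sketch. The pound sign is written as byte escapes so the result does not depend on how the source file is saved; the variable names are made up for the example.

```php
<?php
// The same character in two encodings.
$poundUtf8   = "\xC2\xA3"; // "£" in UTF-8 (two bytes)
$poundLatin1 = "\xA3";     // "£" in ISO-8859-1 (one byte)

// rawurlencode() escapes bytes, not characters.
$escapedUtf8   = rawurlencode($poundUtf8);   // "%C2%A3"
$escapedLatin1 = rawurlencode($poundLatin1); // "%A3"

// rawurldecode() returns exactly the bytes that were escaped.
$roundTrip = rawurldecode($escapedUtf8);

echo $escapedUtf8, ' ', $escapedLatin1, ' ', strlen($roundTrip), "\n";
// %C2%A3 %A3 2
```

So %C2%A3 is what one would expect when the form data arrives as UTF-8; %A3 only appears when the input is already in a single-byte encoding.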
A primer on PHP and UTF-8: http://www.phpwact.org/php/i18n/utf-8
Some "hot tips": http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/
Likely, between getting the string from rawurldecode, and using the string, the locale is assumed to be ISO8859, so two bytes get interpreted as two characters when they represent one.
Use mb_convert_encoding, passing UTF-8 as the source encoding, so that the two bytes are treated as one UTF-8 character and converted to the encoding your page actually uses.
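A rough sketch of both the failure and the fix, assuming the mbstring extension is loaded and that the page is served as ISO-8859-1. The variable names are hypothetical.

```php
<?php
$utf8 = rawurldecode('%C2%A3'); // bytes C2 A3, i.e. "£" in UTF-8

// What goes wrong: treating those bytes as ISO-8859-1 and re-encoding them
// to UTF-8 turns each byte into its own character ("Â" and "£").
$mangled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo rawurlencode($mangled), "\n"; // %C3%82%C2%A3

// The fix for a Latin-1 page: name UTF-8 as the source encoding.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo rawurlencode($latin1), "\n"; // %A3
```

The first result shows how a stray %C3 can appear after a second decode/encode round trip. If the page is served as UTF-8 instead, no conversion should be needed.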