Decoding Unicode in Python and JavaScript (Django)
On a website I have the word pluș, which is sent via POST to a Django view.
It arrives as plu%25C8%2599, so I took that string and tried to find a way to turn %25C8%2599 back into ș.
I tried decoding the string like this:
from urllib import unquote_plus
s = "plu%25C8%2599"
print unquote_plus(unquote_plus(s).decode('utf-8'))
The result I get is pluÈ, which actually has a length of 5, not 4.
How can I get the original string pluș back after it has been encoded?
Edit:
I managed to do it like this:
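def js_unquote(quoted):
    # Python 2: work on a byte string so unquoting yields raw UTF-8 bytes
    quoted = quoted.encode('utf-8')
    # Undo both rounds of percent-encoding, then decode the bytes as UTF-8
    quoted = unquote_plus(unquote_plus(quoted)).decode('utf-8')
    return quoted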
It looks weird, but it works the way I need it to.
Comments (3)
URL-decode twice, then decode as UTF-8.
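A minimal sketch of that advice, assuming Python 3 (the question uses Python 2, where unquote_plus lives in urllib rather than urllib.parse). The function name js_unquote is borrowed from the question; the variable names are only illustrative. The last lines decode the bytes as Latin-1 instead of UTF-8, which appears to be what produced the 5-character result in the question.

```python
from urllib.parse import unquote_plus


def js_unquote(quoted: str) -> str:
    """Undo two rounds of percent-encoding, reading the bytes as UTF-8."""
    # First pass: "%25" becomes "%", so "plu%25C8%2599" turns into "plu%C8%99".
    once = unquote_plus(quoted)
    # Second pass: the bytes C8 99 are decoded together as one UTF-8 character.
    return unquote_plus(once, encoding="utf-8", errors="strict")


s = "plu%25C8%2599"
decoded = js_unquote(s)
print(decoded, len(decoded))

# Decoding the same bytes one at a time (as Latin-1) gives two characters
# for the two bytes, so the string ends up one character too long.
wrong = unquote_plus(unquote_plus(s), encoding="latin-1")
print(repr(wrong), len(wrong))
```

The key point is the order: both layers of percent-encoding are removed first, and the resulting byte sequence is interpreted as UTF-8 only once, as a whole.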
You can't unless you know what the encoding is. Unicode itself is not an encoding. You might try BeautifulSoup or UnicodeDammit, which might help you get the result you were hoping for.
http://www.crummy.com/software/BeautifulSoup/
I hope this helps!
Also take a look at:
http://www.joelonsoftware.com/articles/Unicode.html
This is how I tried it. I sent a JSON POST request from an HTML form directly to a Django URI, with Unicode characters in it such as
"şğüöçı+"
and it works. I used the iso_8859-9 encoder in the encode() function.
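A small sketch of what that encoder can and cannot do, assuming Python 3 and its built-in "iso-8859-9" codec; the helper fits_latin5 is a hypothetical name used only here. ISO-8859-9 covers the Turkish letters in the example, but the ș from the question (U+0219, s with comma below) is a different character from the Turkish ş (U+015F) and has no slot in that code page, so this approach would probably not help with pluș.

```python
turkish = "şğüöçı+"

# Each of these characters maps to a single byte in ISO-8859-9.
encoded = turkish.encode("iso-8859-9")
print(len(encoded))
print(encoded.decode("iso-8859-9") == turkish)


def fits_latin5(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-9."""
    try:
        text.encode("iso-8859-9")
        return True
    except UnicodeEncodeError:
        return False


print(fits_latin5(turkish))
print(fits_latin5("plu\u0219"))  # the Romanian ș from the question
```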