How do I decode a string for use with the Google Language Detection API?

Posted on 2024-09-27 02:34:32

I want to use the Google Language Detection API in my app to detect the language of a URL parameter. For example, a user requests the URL

http://myapp.com/q?Это тест

and gets the message "Russian". I do it this way:

def get(self):
    url = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=" + self.request.query
    try:
        data = json.loads(urllib2.urlopen(url).read())
        self.response.out.write('<html><body>' + data["responseData"]["language"] + '</body></html>')
    except urllib2.HTTPError, e:
        self.response.out.write("HTTP error: %d" % e.code)
    except urllib2.URLError, e:
        self.response.out.write("Network error: %s" % e.reason.args[1])

but I always get "English" as the result, because the URL arrives percent-encoded as

http://myapp.com/q?%DD%F2%EE%20%F2%E5%F1%F2

I've tried urllib.quote and urllib.urlencode with no luck.

How do I have to decode this URL for the Google API?

Comments (1)

不喜欢何必死缠烂打 2024-10-04 02:34:32

Maybe urllib.unquote is what you are looking for:

>>> from urllib import unquote
>>> unquote("%DD%F2%EE%20%F2%E5%F1%F2")
'\xdd\xf2\xee \xf2\xe5\xf1\xf2'

This gives you a byte string in whatever encoding was used in the URL. To recode it to a different encoding (say, UTF-8), you first have to create a unicode object by decoding it from its original encoding (windows-1251 in this example), and then use the encode method of that unicode object:

>>> from urllib import unquote, quote
>>> import json, urllib2, pprint
>>> decoded = unicode(unquote("%DD%F2%EE%20%F2%E5%F1%F2"), "windows-1251")
>>> print decoded
Это тест
>>> recoded = decoded.encode("utf-8")

At this point, we have a UTF-8 encoded string, but it is still not suitable to be passed on to the Google Language Detection API, because it contains raw non-ASCII bytes:

>>> recoded
'\xd0\xad\xd1\x82\xd0\xbe \xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
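
The session above is Python 2. For readers on Python 3, here is a rough sketch of the same decode and recode steps. It assumes, as the answer does, that the URL was percent-encoded from windows-1251 text; in Python 3 the functions live in urllib.parse, and unquote_to_bytes returns the raw bytes that Python 2's unquote returned as a str.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = "%DD%F2%EE%20%F2%E5%F1%F2"

# Step 1: undo the percent-encoding; the result is raw bytes, not text.
raw_bytes = unquote_to_bytes(raw)

# Step 2: interpret the bytes as windows-1251 (an assumption, as in the answer).
decoded = raw_bytes.decode("windows-1251")

# Step 3: re-encode the text as UTF-8.
recoded = decoded.encode("utf-8")

# Shortcut for steps 1 and 2: unquote() accepts the source encoding directly.
assert unquote(raw, encoding="windows-1251") == decoded
```

The byte-level route makes the two encodings explicit; the unquote(raw, encoding=...) shortcut is handy when the source encoding is known up front.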

Since you want to include this string in a URL as a query argument, you have to encode it using urllib.quote:

>>> url = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=%s" % quote(recoded)
>>> data = json.loads(urllib2.urlopen(url).read())
>>> pprint.pprint(data)
{u'responseData': {u'confidence': 0.094033934,
                   u'isReliable': False,
                   u'language': u'ru'},
 u'responseDetails': None,
 u'responseStatus': 200}
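
Putting the steps together, a request handler could build the detection URL with a small helper. The following is only a sketch in Python 3: build_detect_url and DETECT_URL are hypothetical names, and trying UTF-8 first before falling back to windows-1251 is an assumed heuristic that the answer does not mention (the answer hard-codes windows-1251). The sketch only constructs the URL and makes no request.

```python
from urllib.parse import quote, unquote_to_bytes

# Endpoint taken from the question; the constant name is hypothetical.
DETECT_URL = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q="


def build_detect_url(raw_query, fallback_encoding="windows-1251"):
    """Turn a percent-encoded query string into a UTF-8 detection URL.

    Hypothetical helper: the fallback encoding is a guess about what the
    client sent, not something the server can know for certain.
    """
    raw_bytes = unquote_to_bytes(raw_query)
    try:
        # Many clients send UTF-8; strict decoding rejects most other input.
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback, matching the encoding used in the answer.
        text = raw_bytes.decode(fallback_encoding)
    # quote() encodes str as UTF-8 by default and percent-escapes the bytes.
    return DETECT_URL + quote(text, safe="")


url = build_detect_url("%DD%F2%EE%20%F2%E5%F1%F2")
```

With this helper, both the windows-1251 form from the question and an already UTF-8 encoded query produce the same outgoing URL.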