Decoding Google results with Python
I tried to write a program that fetches URLs from Google, but the URLs I get back are percent-encoded, like this:

    [u'http://www.motorrad-live.de/test.php%3Fid%3D11',
     u'http://www.autogaleria.pl/auto_test/test.php%3Fid%3D37',
     u'http://oculus.ru/test.php%3Fid%3D2',
     u'http://oculus.ru/test.php%3Fid%3D1',
     u'http://www.kerrytaylorauctions.com/detail-test.php%3Fid%3D3432',
     u'http://radio.ghanaweb.com/live-radio.test.php?id=3D4',
     u'http://www.studygerman.ru/test/test.php%3Fid%3D261',
     u'http://www.mhealth.ru/tests/test.php%3Fid%3D300']

As you can see, the part after .php is encoded.
Here is my code. The URLs come back encoded even though the code contains a step that is meant to decode them:

    import json
    import urllib

    def print_results(results):
        mylist = []
        n = []
        for r in results:
            mylist.append(r['url'])
        for each in mylist:
            n.append(each.replace(u"%3FID%", "?id="))
        print n

    def query(qs):
        f = urllib.urlopen('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=%s&rsz=8&start=7' % qs)
        s = f.read()
        j = json.loads(s)
        return j['responseData']['results']

    a = query('inurl:"test.php?id"')
    print_results(a)
Comments (2)
You're looking for the function unquote (urllib.unquote in Python 2), which turns percent-encoded sequences such as %3F and %3D back into ? and =.
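To illustrate what unquote does, here is a minimal sketch using one of the URLs from the question. The question's code is Python 2, where the function is `urllib.unquote`; in Python 3 it lives in `urllib.parse`, so the import below tries both. The variable names are only for illustration.

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2, as used in the question

# One of the encoded URLs from the question's output.
encoded = u'http://www.motorrad-live.de/test.php%3Fid%3D11'

# %3F decodes to '?' and %3D decodes to '='.
decoded = unquote(encoded)
print(decoded)  # http://www.motorrad-live.de/test.php?id=11
```

This also shows why the `replace(u"%3FID%", "?id=")` call in the question has no effect: the search string is upper-case and has a trailing `%`, so it never matches `%3Fid%3D`.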
First, you need to quote the query string before interpolating it into the request URL.
Second, I looked at the returned JSON and saw that the unencoded URL is stored under the key unescapedUrl, so print_results(results) can read that key instead of url.
If you really need to read it from the url key, pass each value through unquote.
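The answer's original snippets did not survive on this page, so the following is only a sketch of the two suggestions, not the answerer's exact code. `build_url` and `extract_urls` are hypothetical helper names, and `sample` is made-up data shaped like the `results` list from the question (with the `unescapedUrl` key the answer mentions). No request is sent, so the example runs offline.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2, as used in the question

# Request URL template taken from the question.
BASE = ('http://ajax.googleapis.com/ajax/services/search/web'
        '?v=1.0&gl=de&q=%s&rsz=8&start=7')


def build_url(qs):
    """Percent-encode the search terms before interpolating them."""
    return BASE % quote(qs)


def extract_urls(results):
    """Prefer 'unescapedUrl'; otherwise decode the 'url' value."""
    urls = []
    for r in results:
        if 'unescapedUrl' in r:
            urls.append(r['unescapedUrl'])
        else:
            urls.append(unquote(r['url']))
    return urls


# Hypothetical sample data standing in for j['responseData']['results'].
sample = [
    {'url': 'http://oculus.ru/test.php%3Fid%3D2',
     'unescapedUrl': 'http://oculus.ru/test.php?id=2'},
    {'url': 'http://oculus.ru/test.php%3Fid%3D1'},
]

print(build_url('inurl:"test.php?id"'))
print(extract_urls(sample))
```

Quoting the query matters because characters such as `"`, `:` and `?` in `inurl:"test.php?id"` would otherwise be placed into the request URL unescaped.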