Decoding Google results with Python

Posted on 2024-12-22 01:42:56

I tried to write a program to get URLs from Google, but the problem is that the URLs I get back are encoded, like this:

[u'http://www.motorrad-live.de/test.php%3Fid%3D11',
 u'http://www.autogaleria.pl/auto_test/test.php%3Fid%3D37',
 u'http://oculus.ru/test.php%3Fid%3D2',
 u'http://oculus.ru/test.php%3Fid%3D1',
 u'http://www.kerrytaylorauctions.com/detail-test.php%3Fid%3D3432',
 u'http://radio.ghanaweb.com/live-radio.test.php?id=3D4',
 u'http://www.studygerman.ru/test/test.php%3Fid%3D261',
 u'http://www.mhealth.ru/tests/test.php%3Fid%3D300']

As you can see, the part after .php is encoded.

Here is my code. It even contains a part that is supposed to decode the URLs:

import json
import urllib


def print_results(results):
    mylist=[]
    n=[]
    for r in results:
        mylist.append(r['url'])
    for each in mylist:
         n.append(each.replace(u"%3FID%","?id="))
    print n


def query(qs):
    f = urllib.urlopen('http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=%s&rsz=8&start=7'%qs)
    s = f.read()
    j = json.loads(s)

    return j['responseData']['results']
a=query('inurl:"test.php?id"')
print_results(a)


Comments (2)

哽咽笑 2024-12-29 01:42:56

You're looking for the unquote function:

urllib.unquote(url)
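
As a rough sketch of what unquote does to one of the URLs from the question: `urllib.unquote` is the Python 2 spelling used in this thread, while in Python 3 the same function lives in `urllib.parse`, which is what the example below assumes. It also illustrates why the `replace` call in the question's code changes nothing: the pattern `%3FID%` is uppercase and ends with a stray `%`, so it never matches `%3Fid%3D`.

```python
from urllib.parse import unquote  # Python 3 location; Python 2 used urllib.unquote

encoded = "http://www.motorrad-live.de/test.php%3Fid%3D11"

# %3F decodes to "?" and %3D decodes to "="
decoded = unquote(encoded)
print(decoded)

# The question's replace() pattern is case-sensitive and never matches,
# so the string comes back unchanged
unchanged = encoded.replace("%3FID%", "?id=")
print(unchanged == encoded)
```
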
浮萍、无处依 2024-12-29 01:42:56

First, you need to quote the query string before interpolating it into the request URL:

>>> urllib.quote("inurl:\"test.php?id\"")
'inurl%3A%22test.php%3Fid%22'

>>> "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=%(q)s&rsz=8&start=0" % dict(q=urllib.quote("inurl:\"test.php?id\""))
'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&gl=de&q=inurl%3A%22test.php%3Fid%22&rsz=8&start=0'
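
The same quoting step could look roughly like this in Python 3, where `quote` is imported from `urllib.parse`. The helper name `build_search_url` is hypothetical and only wraps the string formatting shown above; the endpoint is the one from the question, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import quote  # Python 3 location; Python 2 used urllib.quote

BASE = "http://ajax.googleapis.com/ajax/services/search/web"


def build_search_url(query, start=0):
    """Hypothetical helper: percent-encode the query before interpolating it."""
    return "%s?v=1.0&gl=de&q=%s&rsz=8&start=%d" % (BASE, quote(query), start)


url = build_search_url('inurl:"test.php?id"')
print(url)
```

Quoting only the query value, rather than the whole URL, keeps the `?`, `&` and `=` that separate the request parameters intact.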

Second, I looked at the returned JSON and saw that the unencoded URL is stored under the key unescapedUrl, so you can replace print_results(results) with:

def print_results(results):
    L=list(r['unescapedUrl'] for r in results)
    print L

If you really need to read it from the url key, use:

def print_results(results):
    L=list(urllib.unquote(r['url']) for r in results)
    print L
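
The two variants above can be combined into one function that prefers `unescapedUrl` and falls back to decoding `url`. This is only a sketch in Python 3: the helper name `extract_urls` is hypothetical, and the sample list is made up to mimic the result shape described in this answer, not real API output.

```python
from urllib.parse import unquote  # Python 3 location; Python 2 used urllib.unquote


def extract_urls(results):
    """Hypothetical helper: prefer 'unescapedUrl', otherwise decode 'url'."""
    urls = []
    for r in results:
        if "unescapedUrl" in r:
            urls.append(r["unescapedUrl"])
        else:
            urls.append(unquote(r["url"]))
    return urls


# Made-up sample shaped like the results described above (an assumption)
sample = [
    {"url": "http://oculus.ru/test.php%3Fid%3D2",
     "unescapedUrl": "http://oculus.ru/test.php?id=2"},
    {"url": "http://oculus.ru/test.php%3Fid%3D1"},
]
print(extract_urls(sample))
```
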