Python + mechanize: asynchronous page fetching
So I have this bit of Python code that runs through a Delicious page and scrapes some links off of it. The extract method contains some magic that pulls out the required content. However, fetching the pages one after another is pretty slow. Is there a way to do this asynchronously in Python, so I can launch several GET requests and process the pages in parallel?
url = "http://www.delicious.com/search?p=varun"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
extract(soup)
count = 1
# Follows the "Next" link onto consecutive pages
while soup.find('a', attrs={'class': 'pn next'}):
    print "yay"
    print count
    try:
        page = br.follow_link(text_regex="Next")
        html = page.read()
        soup = BeautifulSoup(html)  # update soup so the loop condition sees the new page
        extract(soup)
    except Exception:
        print "End of Pages"
        break
    count = count + 1
Comments (1)
Beautiful Soup is pretty slow. If you want better performance, use lxml instead, or, if you have many CPUs, perhaps try multiprocessing with queues.
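A minimal sketch of the multiprocessing-with-queues idea, assuming worker processes parse pages pulled from a shared queue. The `PAGES` list and the regex-based `extract` are placeholders standing in for real fetched HTML and the question's `extract` method; in real code a fetcher would `br.open()` each URL and put the HTML on the input queue.

```python
# Sketch: parallel page processing with multiprocessing and queues.
# PAGES stands in for fetched HTML; extract() is a stand-in for the
# question's extract method (here it just pulls absolute hrefs).
import multiprocessing
import re

PAGES = [
    '<a class="pn next" href="/page2">next</a><a href="http://example.com/a">a</a>',
    '<a href="http://example.com/b">b</a>',
]

LINK_RE = re.compile(r'href="(http://[^"]+)"')

def extract(html):
    # placeholder for the real extract(): collect absolute links
    return LINK_RE.findall(html)

def worker(in_q, out_q):
    # each worker pulls HTML off the queue until it sees the sentinel
    while True:
        html = in_q.get()
        if html is None:
            break
        out_q.put(extract(html))

if __name__ == "__main__":
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(in_q, out_q))
             for _ in range(2)]
    for p in procs:
        p.start()
    for html in PAGES:
        in_q.put(html)
    for _ in procs:
        in_q.put(None)  # one sentinel per worker so every process exits
    results = [out_q.get() for _ in PAGES]
    for p in procs:
        p.join()
    print(sorted(link for links in results for link in links))
```

Note this parallelizes the parsing/extraction step; the fetches themselves could also be moved into the workers, at the cost of each process needing its own browser/connection object.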