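11.3 Project in Practice: Scraping Famous Quotes from toscrape
11.3.1 Project Requirements
Scrape the famous quotes published on the website http://quotes.toscrape.com/js.
11.3.2 Page Analysis
The pages of this site were already analyzed at the beginning of this chapter; refer back to that discussion as needed.
11.3.3 Implementation
First, in the splash_examples project directory, create the Spider with the scrapy genspider command: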
scrapy genspider quotes quotes.toscrape.com
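The spider in this section only works if the project has scrapy-splash enabled. As a reminder, the fragment below sketches the settings.py entries that the scrapy-splash documentation recommends. The Splash address http://localhost:8050 is an assumption and is not stated in this section; adjust it to wherever your Splash instance is running.

```python
# settings.py (sketch). Assumption: Splash runs locally on port 8050.
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that route SplashRequest objects through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that avoids storing duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Duplicate filter that takes Splash arguments into account.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
```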
In this example, we only need Splash's render.html endpoint to render each page before scraping it. The resulting QuotesSpider is as follows:
# -*- coding: utf-8 -*-
import scrapy
from scrapy_splash import SplashRequest


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["quotes.toscrape.com"]
    start_urls = ['http://quotes.toscrape.com/js/']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, args={'images': 0, 'timeout': 3})

    def parse(self, response):
        for sel in response.css('div.quote'):
            quote = sel.css('span.text::text').extract_first()
            author = sel.css('small.author::text').extract_first()
            yield {'quote': quote, 'author': author}

        href = response.css('li.next > a::attr(href)').extract_first()
        if href:
            url = response.urljoin(href)
            yield SplashRequest(url, args={'images': 0, 'timeout': 3})
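In the code above, requests are submitted with SplashRequest. The endpoint argument does not need to be passed to the SplashRequest constructor, because its default value is already 'render.html'. The args parameter is used to stop Splash from loading images ('images': 0) and to set the rendering timeout ('timeout': 3, in seconds).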
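To make the role of args more concrete, the sketch below builds the kind of URL you could open by hand to ask Splash's render.html endpoint for the same page with the same arguments. The helper render_html_url and the address http://localhost:8050 are hypothetical and used only for illustration; scrapy_splash may transmit the arguments differently (for example in a request body), so treat this as a way to inspect the endpoint, not as a description of the library's internals.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance listening on localhost:8050.
SPLASH_BASE = "http://localhost:8050"


def render_html_url(page_url, **args):
    """Build a GET URL for Splash's render.html endpoint (illustrative helper)."""
    params = {"url": page_url}
    params.update(args)  # e.g. images=0, timeout=3
    return SPLASH_BASE + "/render.html?" + urlencode(params)


url = render_html_url("http://quotes.toscrape.com/js/", images=0, timeout=3)
print(url)
```

Opening the printed URL in a browser or with curl should return the rendered HTML, which is a convenient way to check the arguments before running the spider.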
Run the spider and inspect the results:
$ scrapy crawl quotes -o quotes.csv
...
$ cat -n quotes.csv
  1 quote,author
  2 “The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”,Albert Einstein
  3 "“It is our choices, Harry, that show what we truly are, far more than our abilities.”",J.K. Rowling
  4 “There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”,Albert Einstein
  5 "“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”",Jane Austen
  6 "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”",Marilyn Monroe
  7 “Try not to become a man of success. Rather become a man of value.”,Albert Einstein
  8 “It is better to be hated for what you are than to be loved for what you are not.”,André Gide
  9 "“I have not failed. I've just found 10,000 ways that won't work.”",Thomas A. Edison
 10 “A woman is like a tea bag; you never know how strong it is until it's in hot water.”,Eleanor Roosevelt
...
 91 "“I believe in Christianity as I believe that the sun has risen: not only because I see it, but because by it I see everything else.”",C.S. Lewis
 92 "“The truth."" Dumbledore sighed. ""It is a beautiful and terrible thing, and should therefore be treated with great caution.”",J.K. Rowling
 93 "“I'm the one that's got to die when it's time for me to die, so let me live my life the way I want to.”",Jimi Hendrix
 94 “To die will be an awfully big adventure.”,J.M. Barrie
 95 “It takes courage to grow up and become who you really are.”,E.E. Cummings
 96 “But better to get hurt by the truth than comforted with a lie.”,Khaled Hosseini
 97 “You never really understand a person until you consider things from his point of view... Until you climb inside of his skin and walk around in it.”,Harper Lee
 98 "“You have to write the book that wants to be written.
And if the book will be too difficult for grown-ups, then you write it for children.”",Madeleine L'Engle
 99 “Never tell the truth to people who are not worthy of it.”,Mark Twain
100 "“A person's a person, no matter how small.”",Dr. Seuss
101 "“... a mind needs books as a sword needs a whetstone, if it is to keep its edge.”",George R.R. Martin
The output shows that we successfully scraped 100 quotes from 10 pages.
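Counting lines in the file is only a rough check, because a quoted CSV field may itself contain a line break (the Madeleine L'Engle entry in the listing appears to). A more reliable way to confirm the record count is to parse the file with Python's csv module. The sketch below does this on a small in-memory sample; the function name count_records and the sample data are made up for illustration.

```python
import csv
import io


def count_records(csv_text):
    """Count data rows (excluding the header) in CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for _ in reader)


# Illustrative sample: the first record spans two physical lines.
sample = (
    'quote,author\n'
    '"“line one\nline two”",Madeleine L\'Engle\n'
    '“Short quote.”,Mark Twain\n'
)

print(len(sample.splitlines()))  # physical lines, header included
print(count_records(sample))     # logical records
```

To apply the same check to the real output, read quotes.csv with open('quotes.csv', newline='', encoding='utf-8') and pass its contents to count_records.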