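Scrapy parse() function is not being executed
I have started using Scrapy on Ubuntu 11 and have run into a problem. Specifically, the parse function in the following code is never executed, even though the terminal shows that the spider ran and closed successfully: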
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector

class myTestSpider(CrawlSpider):
    name = "go4mumbai.com"
    domain_name = "go4mumbai.com"
    start_urls = ["http://www.go4mumbai.com/Mumbai_Bus_Route.php?busno=1"]

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    stopNames = hxs.select('//table[@cellspacing="2"]/tr/td[2]/a/text()').extract()
    print len(stopNames)

SPIDER = myTestSpider()
This is the terminal output:
rupin@rupin-laptop:~/Desktop/ScrappyTest/basetest$ sudo scrapy crawl go4mumbai.com
2011-09-21 15:33:56+0530 [scrapy] INFO: Scrapy 0.12.0.2528 started (bot: basetest)
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Enabled item pipelines:
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-09-21 15:33:56+0530 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-09-21 15:33:56+0530 [go4mumbai.com] INFO: Spider opened
2011-09-21 15:33:58+0530 [go4mumbai.com] DEBUG: Crawled (200) <GET http://www.go4mumbai.com/Mumbai_Bus_Route.php?busno=1> (referer: None)
2011-09-21 15:33:58+0530 [go4mumbai.com] INFO: Closing spider (finished)
2011-09-21 15:33:58+0530 [go4mumbai.com] INFO: Spider closed (finished)
Is there some part of the code I am missing? Please advise.
Comments (1)
Your parse() function does not seem to belong to your spider class. Indent the whole function by one level so that it belongs to the class and gets called.
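To illustrate why indentation decides whether parse() is ever called, here is a minimal sketch in plain Python. It does not use Scrapy: BaseSpider, crawl(), BrokenSpider and FixedSpider are hypothetical stand-ins invented for this example, and the "response" is just a string. The assumption is that the framework's base class already has a do-nothing parse() of its own, which (as far as I know) is also the case for CrawlSpider, and would explain why the crawl in the question finishes quietly instead of raising an error.

```python
class BaseSpider(object):
    """Hypothetical stand-in for a framework spider base class (not Scrapy's API)."""
    start_urls = []

    def parse(self, response):
        # Default callback: no user logic, so nothing is extracted.
        return None

    def crawl(self):
        # Call whatever parse() the instance resolves to, once per start URL.
        return [self.parse("<html for %s>" % url) for url in self.start_urls]


class BrokenSpider(BaseSpider):
    start_urls = ["http://example.com/page"]

# Dedented by mistake: this is a module-level function, not a method,
# so BrokenSpider still inherits BaseSpider.parse.
def parse(self, response):
    return "parsed " + response


class FixedSpider(BaseSpider):
    start_urls = ["http://example.com/page"]

    # Indented under the class, so it overrides BaseSpider.parse.
    def parse(self, response):
        return "parsed " + response


print("parse" in BrokenSpider.__dict__)  # False: the class has no parse of its own
print("parse" in FixedSpider.__dict__)   # True
print(BrokenSpider().crawl())            # [None]
print(FixedSpider().crawl())             # ['parsed <html for http://example.com/page>']
```

Applied to the spider in the question, the fix is the same: move def parse(...) and its body one indentation level to the right so it sits inside myTestSpider. Note also that the Scrapy documentation advises against overriding parse() on a CrawlSpider, since that class uses parse() for its own rule handling; for a single-page scraper like this one, a plain spider base class or a differently named callback may be the safer choice.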