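Problem running pyspider with Python 2.7.13 on 64-bit Windows 7
As the title says: the project fails when it is run in pyspider, but it works perfectly when debugged.
As the first screenshot shows, debugging works perfectly.
Figure 2: when the project is running, the time shown does change, but nothing at all is written to the database. During debugging, the database reads and writes work normally.
Figure 3: the cmd output.
Figure 4: the task info.
Figures 5-6: the on_start info.
Figure 7: the on_finished info.
The full code is: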
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Created on 2017-04-02 23:03:02
# Project: demo

from pyspider.libs.base_handler import *
import pymongo


class Handler(BaseHandler):
    crawl_config = {
    }

    client = pymongo.MongoClient('localhost')
    db = client['trip']

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://www.tripadvisor.cn/Attractions-g186338-Activities-c47-London_England.html#ATTRACTION_LIST', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('.property_title > a').items():
            self.crawl(each.attr.href, callback=self.detail_page)
        next = response.doc('div.deckTools div > a').attr.href  # .pagination a
        self.crawl(next, callback=self.index_page)

    @config(priority=2)
    def detail_page(self, response):
        url = response.url
        name = response.doc('h1').text()
        rating = response.doc('.heading_ratings .more').text()
        address = response.doc('.addressReset > span > span').text()
        phone = response.doc('.phoneNumber').text()
        duration = response.doc('div.above_fold_listing_details > div > div:nth-child(5) > div > div:nth-child(1)').text()
        introduction = response.doc('div.above_fold_listing_details > div > div:nth-child(6) > div > p').text()
        print(url, name, rating, address, phone, duration, introduction)
        return {
            "url": url,
            "name": name,
            "rating": rating,
            "address": address,
            "phone": phone,
            "duration": duration,
            "introduction": introduction
        }

    def on_result(self, result):
        if result:
            self.save_to_mongo(result)

    def save_to_mongo(self, result):
        if self.db['london'].insert(result):
            print("save to mongo", result)
Could someone please take a look? Thanks a lot.
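Comments (1)
URLs that have already been crawled are not crawled again. If you changed the code after a crawl and want to crawl everything again, use http://docs.pyspider.org/en/l...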
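To illustrate the point in the comment above, here is a minimal sketch of how a scheduler might decide whether to fetch a URL it has already seen. This is a simplified model and not pyspider's actual code: the function name `should_recrawl` and the task dictionaries are hypothetical, and only the `itag` and `age` parameter names are borrowed from pyspider's `self.crawl` options. The assumption is that a task is fetched again when its `itag` has changed or when its `age` (in seconds) has expired since the last crawl.

```python
import time


def should_recrawl(old_task, new_task, now=None):
    """Simplified model: decide whether an already-seen URL is fetched again."""
    now = time.time() if now is None else now
    # A changed itag marks the task as modified, so it is restarted.
    if new_task.get("itag") and new_task["itag"] != old_task.get("itag"):
        return True
    # Otherwise the task restarts only once its age (in seconds) has expired.
    age = new_task.get("age", -1)  # -1 means "never expires" in this sketch
    if age >= 0 and old_task.get("lastcrawltime", 0) + age < now:
        return True
    return False


# A URL that was crawled at t=1000 with itag "v1" (made-up values).
crawled = {"itag": "v1", "lastcrawltime": 1000.0}
ten_days = 10 * 24 * 60 * 60

# Same itag and the 10-day age has not expired: the URL is skipped.
print(should_recrawl(crawled, {"itag": "v1", "age": ten_days}, now=2000.0))
# The itag changed after a code edit: the URL is fetched again.
print(should_recrawl(crawled, {"itag": "v2", "age": ten_days}, now=2000.0))
# An age of 0 expires immediately: the URL is fetched again.
print(should_recrawl(crawled, {"itag": "v1", "age": 0}, now=2000.0))
```

In the script from the question, `index_page` is decorated with `@config(age=10 * 24 * 60 * 60)`, so under this model its pages would be skipped for ten days after a successful debug run. In pyspider itself, `itag` can be set in `crawl_config` (for example `crawl_config = {'itag': 'v2'}`) or passed to `self.crawl`. Whether that is what the truncated link above points to is not certain.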