Running a pyspider script fails with gaierror: [Errno -2] Name or service not known
pyspider is deployed with Docker across two machines.
Machine A: docker-compose scale mysql=1 redis=1 scheduler=1; all three containers run normally.
Machine A's docker-compose.yml:
mysql:
  image: mysql:latest
  container_name: cash-mysql
  environment:
    - LANG=C.UTF-8
    - MYSQL_ROOT_PASSWORD=423401
  volumes:
    - /docker/conf/mysql/my.cnf:/etc/mysql/conf.d/my.cnf
    - /docker/data/mysql:/var/lib/mysql
  ports:
    - "3306:3306"
redis:
  image: redis:latest
  container_name: cash-redis
  restart: always
  volumes:
    - /docker/data/redis:/data
    - /docker/conf/redis/redis.conf:/usr/local/etc/redis/redis.conf
  ports:
    - "6379:6379"
scheduler:
  image: binux/pyspider
  container_name: cash-scheduler
  external_links:
    - 'cash-mysql:mysql'
    - 'cash-redis:redis'
  command: '--taskdb "mysql+taskdb://root:1@192.168.159.166:3306/taskdb" --projectdb "mysql+projectdb://root:1@192.168.159.166:3306/projectdb" --resultdb "mysql+resultdb://root:1@192.168.159.166:3306/resultdb" --message-queue "redis://192.168.159.166:6379/1" scheduler --inqueue-limit 5000 --delete-time 43200'
  ports:
    - "23333:23333"
  restart: always
192.168.159.166 is machine A's internal IP.
Machine B: docker-compose scale phantomjs=2 processor=2 webui=1 phantomjs-lb=1 fetcher=1 fetcher-lb=1 result-worker=1 webui-lb=1 nginx=1; the containers run normally.
Machine B's docker-compose.yml:
phantomjs:
  image: 'binux/pyspider:latest'
  command: phantomjs
  cpu_shares: 512
  environment:
    - 'EXCLUDE_PORTS=5000,23333,24444'
  expose:
    - '25555'
  mem_limit: 512m
  restart: always
phantomjs-lb:
  image: 'dockercloud/haproxy:latest'
  links:
    - phantomjs
  restart: always
fetcher:
  image: 'binux/pyspider:latest'
  command: '--message-queue "redis://192.168.159.166:6379/1" --phantomjs-proxy "phantomjs:80" fetcher --xmlrpc'
  cpu_shares: 512
  environment:
    - 'EXCLUDE_PORTS=5000,25555,23333'
  links:
    - 'phantomjs-lb:phantomjs'
  mem_limit: 128m
  restart: always
fetcher-lb:
  image: 'dockercloud/haproxy:latest'
  links:
    - fetcher
  restart: always
processor:
  image: 'binux/pyspider:latest'
  command: '--projectdb "mysql+projectdb://root:1@192.168.159.166/projectdb" --message-queue "redis://192.168.159.166:6379/1" processor'
  cpu_shares: 512
  mem_limit: 256m
  restart: always
result-worker:
  image: 'binux/pyspider:latest'
  command: '--taskdb "mysql+taskdb://root:1@192.168.159.166/taskdb" --projectdb "mysql+projectdb://root:1@192.168.159.166/projectdb" --resultdb "mysql+resultdb://root:1@192.168.159.166/resultdb" --message-queue "redis://192.168.159.166:6379/1" result_worker'
  cpu_shares: 512
  mem_limit: 256m
  restart: always
webui:
  image: 'binux/pyspider:latest'
  container_name: cash-webui
  command: '--taskdb "mysql+taskdb://root:1@192.168.159.166/taskdb" --projectdb "mysql+projectdb://root:1@192.168.159.166/projectdb" --resultdb "mysql+resultdb://root:1@192.168.159.166/resultdb" --message-queue "redis://192.168.159.166:6379/1" webui --max-rate 0.5 --max-burst 3 --scheduler-rpc "http://192.168.159.166:23333/" --fetcher-rpc "http://fetcher/"'
  cpu_shares: 512
  environment:
    - 'EXCLUDE_PORTS=24444,25555,23333'
  links:
    - 'fetcher-lb:fetcher'
  mem_limit: 256m
  restart: always
webui-lb:
  image: 'dockercloud/haproxy:latest'
  links:
    - webui
  restart: always
nginx:
  image: 'nginx:latest'
  container_name: cash-nginx
  links:
    - 'webui-lb:HAPROXY'
  ports:
    - '0.0.0.0:80:80'
  volumes:
    - /docker/conf/nginx/nginx.conf:/etc/nginx/nginx.conf
    - /docker/conf/nginx/conf.d/:/etc/nginx/conf.d/
  restart: always
Machine B's internal IP is 192.168.159.174, and it can ping 192.168.159.166 successfully.
The script run on machine B:
from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://blog.binux.me/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
The error:
Traceback (most recent call last):
  File "/opt/pyspider/pyspider/run.py", line 345, in <lambda>
    app.config['fetch'] = lambda x: umsgpack.unpackb(fetcher_rpc.fetch(x).data)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 962, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 822, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 784, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 765, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known
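For reference, the failing getaddrinfo appears to be resolving the hostname from the webui's --fetcher-rpc URL ("http://fetcher/"): httplib looks up that name before opening the XML-RPC connection. A minimal sketch of just that resolution step, assuming Python 2 as in the traceback (this is not pyspider's actual code):

import socket
from urlparse import urlparse  # Python 2 module, matching the traceback

# Hostname taken from the --fetcher-rpc value in the webui command above.
host = urlparse('http://fetcher/').hostname  # -> 'fetcher'
try:
    # The same lookup httplib performs before connecting.
    socket.getaddrinfo(host, 80, 0, socket.SOCK_STREAM)
except socket.gaierror as e:
    print e  # [Errno -2] Name or service not known

If this lookup fails inside the container that raised the traceback, the same gaierror is produced.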
Both machine A and machine B run Ubuntu 15.10, and the internal IPs have been added to the interfaces file on both. I googled "gaierror: [Errno -2] Name or service not known" but found no answer.
Comments (1)
docker exec -it fetcher /bin/bash
Go into the container and check whether DNS resolution is working.
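A minimal sketch of such a check, runnable with the container's Python 2 interpreter after the docker exec above (the hostnames are taken from the links in the compose files and are assumptions; adjust them to your setup):

import socket

# Show which resolver the container is using.
print open('/etc/resolv.conf').read()

# Try to resolve the linked service names and the remote host.
for host in ('fetcher', 'phantomjs', '192.168.159.166'):
    try:
        addr = socket.getaddrinfo(host, 80, 0, socket.SOCK_STREAM)[0][4][0]
        print host, '->', addr
    except socket.gaierror as e:
        print host, '->', e

Any name that prints a gaierror here is one the process cannot connect to by that hostname.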