pyspider script raises gaierror: [Errno -2] Name or service not known

Posted on 2022-09-03 09:12:15

pyspider deployed with Docker across two machines.
Machine A runs docker-compose scale mysql=1 redis=1 scheduler=1, and all three containers are running normally.
Its docker-compose.yml is as follows:

mysql:
  image: mysql:latest
  container_name: cash-mysql
  environment:
    - LANG=C.UTF-8
    - MYSQL_ROOT_PASSWORD=423401
  volumes:
    - /docker/conf/mysql/my.cnf:/etc/mysql/conf.d/my.cnf
    - /docker/data/mysql:/var/lib/mysql
  ports:
    - "3306:3306"
redis:
  image: redis:latest
  container_name: cash-redis
  restart: always
  volumes:
    - /docker/data/redis:/data
    - /docker/conf/redis/redis.conf:/usr/local/etc/redis/redis.conf
  ports:
    - "6379:6379"
scheduler:
  image: binux/pyspider
  container_name: cash-scheduler
  external_links:
    - 'cash-mysql:mysql'
    - 'cash-redis:redis'
  command: '--taskdb "mysql+taskdb://root:1@192.168.159.166:3306/taskdb"  --projectdb "mysql+projectdb://root:1@192.168.159.166:3306/projectdb" --resultdb "mysql+resultdb://root:1@192.168.159.166:3306/resultdb" --message-queue "redis://192.168.159.166:6379/1" scheduler --inqueue-limit 5000 --delete-time 43200'
  ports:
    - "23333:23333"  
  restart: always
  

192.168.159.166 is machine A's internal IP.

Machine B runs docker-compose scale phantomjs=2 processor=2 webui=1 phantomjs-lb=1 fetcher=1 fetcher-lb=1 result-worker=1 webui-lb=1 nginx=1, and the containers are running normally.
Its docker-compose.yml is as follows:

phantomjs:
  image: 'binux/pyspider:latest'
  command: phantomjs
  cpu_shares: 512
  environment:
    - 'EXCLUDE_PORTS=5000,23333,24444'
  expose:
    - '25555'
  mem_limit: 512m
  restart: always
phantomjs-lb:
  image: 'dockercloud/haproxy:latest'
  links:
    - phantomjs
  restart: always
  
fetcher:
  image: 'binux/pyspider:latest'
  command: '--message-queue "redis://192.168.159.166:6379/1" --phantomjs-proxy "phantomjs:80" fetcher --xmlrpc'
  cpu_shares: 512
  environment:
    - 'EXCLUDE_PORTS=5000,25555,23333'
  links:
    - 'phantomjs-lb:phantomjs'
  mem_limit: 128m
  restart: always
fetcher-lb:
  image: 'dockercloud/haproxy:latest'
  links:
    - fetcher
  restart: always
  
processor:
  image: 'binux/pyspider:latest'
  command: '--projectdb "mysql+projectdb://root:1@192.168.159.166/projectdb" --message-queue "redis://192.168.159.166:6379/1" processor'
  cpu_shares: 512
  mem_limit: 256m
  restart: always
  
result-worker:
  image: 'binux/pyspider:latest'
  command: '--taskdb "mysql+taskdb://root:1@192.168.159.166/taskdb"  --projectdb "mysql+projectdb://root:1@192.168.159.166/projectdb" --resultdb "mysql+resultdb://root:1@192.168.159.166/resultdb" --message-queue "redis://192.168.159.166:6379/1" result_worker'
  cpu_shares: 512
  mem_limit: 256m
  restart: always
  
webui:
  image: 'binux/pyspider:latest'
  container_name: cash-webui
  command: '--taskdb "mysql+taskdb://root:1@192.168.159.166/taskdb"  --projectdb "mysql+projectdb://root:1@192.168.159.166/projectdb" --resultdb "mysql+resultdb://root:1@192.168.159.166/resultdb" --message-queue "redis://192.168.159.166:6379/1" webui --max-rate 0.5 --max-burst 3 --scheduler-rpc "http://192.168.159.166:23333/" --fetcher-rpc "http://fetcher/"'
  cpu_shares: 512
  environment:
    - 'EXCLUDE_PORTS=24444,25555,23333'
  links:
    - 'fetcher-lb:fetcher'
  mem_limit: 256m
  restart: always

webui-lb:
  image: 'dockercloud/haproxy:latest'
  links:
    - webui
  restart: always
  
nginx:
  image: 'nginx:latest'
  container_name: cash-nginx
  links:
    - 'webui-lb:HAPROXY'
  ports:
    - '0.0.0.0:80:80'
  volumes:
    - /docker/conf/nginx/nginx.conf:/etc/nginx/nginx.conf
    - /docker/conf/nginx/conf.d/:/etc/nginx/conf.d/
  restart: always
  
Machine B's internal IP is 192.168.159.174, and it can ping 192.168.159.166 successfully.
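Note that every component on machine B reaches machine A by raw IP, so the only hostnames that have to resolve are the link aliases: phantomjs inside the fetcher containers and fetcher inside the webui container. As a first sanity check, the cross-machine endpoints can be probed from machine B; a minimal sketch (IP and ports copied from the two compose files above):

import socket

# Scheduler XML-RPC, MySQL and Redis are all published on machine A.
for port in (23333, 3306, 6379):
    try:
        socket.create_connection(("192.168.159.166", port), timeout=3).close()
        print("ok   port %d" % port)
    except socket.error as exc:  # socket.gaierror is a subclass on Python 2
        print("FAIL port %d -> %s" % (port, exc))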

Running the following script (pyspider's default sample project) on machine B:

from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    crawl_config = {
    }

    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://blog.binux.me/', callback=self.index_page)

    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    @config(priority=2)
    def detail_page(self, response):
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }

The error message:

Traceback (most recent call last):
  File "/opt/pyspider/pyspider/run.py", line 345, in <lambda>
    app.config['fetch'] = lambda x: umsgpack.unpackb(fetcher_rpc.fetch(x).data)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 962, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 822, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 784, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 765, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known 

Both machines run Ubuntu 15.10, and the internal IP has been added to the interfaces file on each.
I googled "gaierror: [Errno -2] Name or service not known" but could not find an answer.
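A note on the traceback: gaierror comes straight out of getaddrinfo, the C-level name lookup, and errno -2 simply means the hostname does not resolve. The frames show the failure inside webui's fetcher RPC wrapper (run.py line 345), i.e. while webui dials the --fetcher-rpc URL "http://fetcher/", so the name that fails to resolve is almost certainly the fetcher link alias. A minimal sketch reproducing the bottom frame, run from a Python shell inside the webui container:

import socket

# The same lookup as the last traceback frame; raises
# gaierror: [Errno -2] Name or service not known when "fetcher"
# is not resolvable from inside this container.
socket.getaddrinfo("fetcher", 80, 0, socket.SOCK_STREAM)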


Comments (1)

梦回旧景 2022-09-10 09:12:15

Run docker exec -it fetcher /bin/bash to get inside the container and check whether DNS is working.
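Expanding on that suggestion: with classic compose files like these, links works by injecting the alias into the linked container's /etc/hosts, so "checking DNS" here mostly means checking that hosts entry. A sketch to paste into a Python shell inside the webui container (e.g. docker exec -it cash-webui python, using the container_name from the compose file; containers started via docker-compose scale may instead be named like <project>_webui_1):

import socket

# The alias from 'fetcher-lb:fetcher' should appear in /etc/hosts
# if the link took effect.
print(open("/etc/hosts").read())

try:
    print("fetcher -> %s" % socket.gethostbyname("fetcher"))
except socket.gaierror as exc:
    print("fetcher does not resolve: %s" % exc)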
