Unable to run script/package locally on Docker

Posted on 2025-01-21 23:52:38

I'm creating a web scraper package and am in the process of uploading it to Docker. While I can build to the local Docker repository, I cannot run the script without the following error appearing:

selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)

Here is what I have in the main script so far to try and get it running on Docker:

def __init__(self, url: str = " url goes here ", 
                options: Optional[ChromeOptions] = None): #default url    
        
        options = ChromeOptions()
        self.driver = Chrome(ChromeDriverManager().install(), options=options) 
        options.add_argument("--no-sandbox") 
        options.binary_location = '/usr/bin/google-chrome'
        options.add_argument("--headless")
        
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-setuid-sandbox") 
        options.add_argument("--remote-debugging-port=9222") 
         
        options.add_argument("start-maximized")
        options.add_argument('--disable-gpu')
        
        options.add_argument("window-size=1920,1080") 

From other posts I gather that for some people the fix was as simple as changing the order of the options.add_argument calls. I have tried that, but it doesn't work for me.

I also have the following modules within the same script:

import os
import selenium
from selenium.webdriver import Chrome
from webdriver_manager.chrome import ChromeDriverManager #installs Chrome webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from typing import Optional
import time
import boto3
from sqlalchemy import create_engine
import urllib.request
import tempfile #temporary directory - to be removed after all operations have finished

In my Dockerfile:

FROM python:3.8
    
    #Set Chrome Repo
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -\
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'\
    && apt-get -y update\
    #Install Chrome
    && apt-get install -y google-chrome-stable\
    && wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip\
    && apt-get install -yqq unzip\
    && unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
    

COPY . .

RUN pip install -r requirements.txt

#When we run the container, this will be the command run
CMD ["python", "scraper/webscraper.py"]

In case it matters, I am using Docker and VS Code on Windows.

Comments (1)

你曾走过我的故事 2025-01-28 23:52:38

I've realised that swapping the order of the options.add_argument calls and the self.driver assignment makes the script run just fine. The driver was being created first, before any of the options had been set, so Chrome was launched without them. It should be the other way around, as follows:

def __init__(self, url: str = " url goes here ", 
                options: Optional[ChromeOptions] = None): #default url    
        
        options = ChromeOptions()
        
        options.add_argument("--no-sandbox") 
        options.binary_location = '/usr/bin/google-chrome'
        options.add_argument("--headless")
        
        options.add_argument("--disable-dev-shm-usage")
        options.add_argument("--disable-setuid-sandbox") 
        options.add_argument("--remote-debugging-port=9222") 
         
        options.add_argument("start-maximized")
        options.add_argument('--disable-gpu')
        
        options.add_argument("window-size=1920,1080") 

        self.driver = Chrome(ChromeDriverManager().install(), options=options)
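
For anyone on a newer Selenium release, here is a minimal sketch of the same "options first, driver last" idea. It assumes Selenium 4, where the driver path is passed through a Service object rather than positionally. The names docker_chrome_arguments and make_driver are my own, hypothetical helpers and not part of Selenium or webdriver_manager, and the flag list is simply a subset of the ones used above.

```python
from typing import List


def docker_chrome_arguments(window_size: str = "1920,1080") -> List[str]:
    """Return the Chrome flags to apply before the driver is created.

    Hypothetical helper: the flags are taken from the post above.
    """
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
        f"--window-size={window_size}",
    ]


def make_driver(binary_location: str = "/usr/bin/google-chrome"):
    """Build the options completely, then create the driver.

    Assumes Selenium 4 and webdriver_manager are installed. The imports
    are local so docker_chrome_arguments() works without them.
    """
    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = ChromeOptions()
    options.binary_location = binary_location
    for argument in docker_chrome_arguments():
        options.add_argument(argument)

    # Chrome is launched here, so every option must already be set.
    service = Service(ChromeDriverManager().install())
    return Chrome(service=service, options=options)
```

Inside the class, self.driver = make_driver() would then replace the last line of __init__. Keeping the flag list in its own function also makes it easy to check which arguments are applied without launching a browser.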