Running headless Google Chrome in Docker to convert HTML to PDF

Posted on 2025-01-18 18:41:59

I'm trying to convert HTML files to PDFs in a Docker container.
Dockerfile:

FROM python:3.8

# Adding trusting keys to apt for repositories
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -

# Adding Google Chrome to the repositories
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'

# Updating apt to see and install Google Chrome
RUN apt-get -y update

# Magic happens
RUN apt-get install -y google-chrome-stable

COPY /requirements.txt /app/requirements.txt

WORKDIR /app

RUN pip3 install -r requirements.txt

COPY . /app

CMD [ "python3", "app.py" ]

app.py

import subprocess

html_file_url = "file:///html_file_name.html"
pdf_file_path = "pdf_file_name.pdf"
commands = [
    "google-chrome",
    "--headless",
    "--disable-gpu",
    "--no-sandbox",
    "--print-to-pdf={}".format(pdf_file_path),
    html_file_url,
]
subprocess.run(commands)

When I run the container, I get:

[0404/112836.835631:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /host/run/dbus/system_bus_socket: No such file or directory
[0404/112836.835729:WARNING:bluez_dbus_manager.cc(248)] Floss manager not present, cannot set Floss enable/disable.
[0404/112836.835786:ERROR:bus.cc(397)] Failed to connect to the bus: Failed to connect to socket /host/run/dbus/system_bus_socket: No such file or directory
[0404/112836.866694:ERROR:sandbox_linux.cc(377)] InitializeSandbox() called with multiple threads in process gpu-process.

The generated PDF is empty. The HTML uses recent CSS features such as flexbox, which Python packages like xhtml2pdf and pdfkit fail to convert, so I'm trying to use headless Google Chrome.

Comments (2)

转瞬即逝 2025-01-25 18:41:59

Forgive me if I am wrong, but so that they are not simply lost across several comments, here are collected snippets of related answers to the issues raised by the OP's question.

    1. Debian does not have a native Chrome package, so a specific download is needed to install it. Currently, for a recent Debian 10/11 system, that is
      wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
      but you should check the current status for your own Google/Debian versions; the latest .rpm or .deb is available from https://www.google.com/chrome/?platform=linux
    2. The very old "--disable-gpu" flag was last needed on Windows systems about 5 years ago and should not be needed with a modern "--headless" Brave/Chrome/Chromium/Edge.
    3. Chrome has rewritten some headless commands since the OP's question, so current combinations may behave differently with --headless=new
    4. --headless can be exceptionally pedantic about input and output file locations, access rights, and writes. It will simply draw a blank without much warning if either of the two is wrong, or if a default write to the profile folder was attempted.
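
To make the last two points concrete, here is a minimal sketch of how the call from app.py could be hardened. It is not a confirmed fix for the OP's setup. The function names build_chrome_command and html_to_pdf are made up for illustration, and the sketch assumes a Chrome build recent enough to accept --headless=new. It passes absolute paths for both input and output, points Chrome at a writable temporary profile directory, and raises an error instead of silently leaving an empty PDF.

```python
import subprocess
import tempfile
from pathlib import Path


def build_chrome_command(html_path, pdf_path, profile_dir, chrome="google-chrome"):
    """Build the argument list for a headless print-to-PDF run (hypothetical helper)."""
    html = Path(html_path).resolve()  # absolute input path
    pdf = Path(pdf_path).resolve()    # absolute output path
    return [
        chrome,
        "--headless=new",                  # assumption: older builds may only accept --headless
        "--no-sandbox",                    # kept from the question's original command
        f"--user-data-dir={profile_dir}",  # writable profile directory
        f"--print-to-pdf={pdf}",
        html.as_uri(),                     # file:// URL built from the absolute path
    ]


def html_to_pdf(html_path, pdf_path, chrome="google-chrome", timeout=60):
    """Convert one HTML file to PDF and fail loudly if nothing usable was written."""
    html = Path(html_path).resolve()
    pdf = Path(pdf_path).resolve()
    if not html.is_file():
        raise FileNotFoundError(f"input HTML not found: {html}")
    pdf.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as profile_dir:
        cmd = build_chrome_command(html, pdf, profile_dir, chrome)
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0 or not pdf.is_file() or pdf.stat().st_size == 0:
        raise RuntimeError(f"Chrome did not produce a PDF:\n{result.stderr}")
    return pdf
```

With the question's Dockerfile (WORKDIR /app), a call such as html_to_pdf("html_file_name.html", "pdf_file_name.pdf") would resolve both names under /app, whereas the original file:///html_file_name.html URL points at the filesystem root, which may be one reason for the blank output.
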
一个人练习一个人 2025-01-25 18:41:59

Building a Docker image with Chrome can be tricky, because some additional binaries may be required (and they differ across Linux flavors and Chrome versions).

Consider looking at, and using as a base, some popular images that include Chrome, e.g.
https://github.com/puppeteer/puppeteer/blob/main/docker/Dockerfile

FROM node:18

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
    && apt-get install -y wget gnupg \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg \
    && sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros fonts-kacst fonts-freefont-ttf libxss1 \
      --no-install-recommends \
    && rm -rf /var/lib/apt/lists/* \
    && groupadd -r pptruser && useradd -rm -g pptruser -G audio,video pptruser

USER pptruser

WORKDIR /home/pptruser

This way you can learn from the results of other open source projects,
including the idea that it is better to have fewer RUN commands, and therefore fewer layers in the resulting image.
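
Since the available binaries differ between base images, a small start-up check on the Python side can tell you early whether the image actually contains a usable browser. This is only a sketch: the helper names and the list of candidate executable names are assumptions, so adjust them to whatever your base image installs.

```python
import shutil
import subprocess

# Assumed executable names; the set that exists depends on the base image.
CANDIDATE_NAMES = ("google-chrome-stable", "google-chrome", "chromium", "chromium-browser")


def find_chrome(candidates=CANDIDATE_NAMES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def chrome_version(path, timeout=30):
    """Ask the browser binary for its version string."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip()
```

Calling find_chrome() at the top of app.py and exiting with a clear message when it returns None is usually easier to debug than a missing or empty PDF later on.
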
