Merge multiple Puppeteer PDF buffers generated from URLs into a single PDF file, then return it to the user

Posted 2025-02-08 07:09:35

I am trying to merge an arbitrary number of PDF buffers from Puppeteer into a single file. I suspect the problem has something to do with the buffers, but I have yet to find a solution that works. This question got me the closest: "How to output a PDF buffer to browser using NodeJS?", but the output still fails to load. Adobe, Chrome, and Foxit all report that the file is corrupt or was decoded incorrectly.

Puppeteer code:

const puppeteer = require('puppeteer');

async function generateBulkPDFFromUrl(urlString) {
  // launch a new Chrome instance
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--font-render-hinting=none']
  });

  try {
    // create a new page
    const page = await browser.newPage();
    // the user-agent string must be a single literal (or concatenated pieces)
    await page.setUserAgent(
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) ' +
      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
    );

    // navigate to the URL (new URL() throws early on malformed input)
    const url = new URL(urlString);
    await page.goto(url.href, { waitUntil: 'domcontentloaded' });

    // render the page to a PDF buffer
    const pdfBuffer = await page.pdf({ format: 'A4' });
    return pdfBuffer;
  } finally {
    // close the browser even if navigation or rendering fails
    await browser.close();
  }
}
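Launching a new Chrome instance per URL works, but for an arbitrary number of URLs it may be cheaper to share one browser and open one page per URL. A minimal sketch, assuming browser is a Puppeteer Browser obtained from puppeteer.launch(); renderUrlsToPdfs is a hypothetical helper name, and waitUntil: 'networkidle0' is a swapped-in choice (it waits until network activity settles, which may help pages that load content after DOMContentLoaded). URLs are processed one at a time so the returned PDFs stay in input order.

```javascript
// Hypothetical helper: render each URL to PDF bytes using one shared browser.
// `browser` is assumed to be a Puppeteer Browser from puppeteer.launch().
// Results are returned in the same order as `urls`.
async function renderUrlsToPdfs(browser, urls) {
  const results = [];
  for (const urlString of urls) {
    const page = await browser.newPage();
    try {
      // new URL() rejects malformed input before any navigation happens.
      const href = new URL(urlString).href;
      await page.goto(href, { waitUntil: 'networkidle0' });
      results.push(await page.pdf({ format: 'A4' }));
    } finally {
      // Close the tab even if navigation or rendering throws.
      await page.close();
    }
  }
  return results;
}
```

Usage would be to call puppeteer.launch() once, await renderUrlsToPdfs(browser, urls) inside a try block, call browser.close() in the matching finally, and pass the resulting array to mergePdfs.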

Buffer from Puppeteer, in case you want to see it:

<Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 52960 more bytes>
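The first bytes of that dump, 25 50 44 46 2d 31 2e 34, are ASCII for %PDF-1.4, so each Puppeteer buffer at least starts like a valid PDF. A quick way to compare the inputs with the merged output is to check both the header and the %%EOF trailer. isLikelyPdf below is a hypothetical helper and only inspects the first and last bytes, not the full file structure.

```javascript
// Hypothetical sanity check: does `bytes` look like a complete PDF file?
// Accepts a Node Buffer or any Uint8Array (e.g. pdf-lib's save() output).
function isLikelyPdf(bytes) {
  if (!(bytes instanceof Uint8Array) || bytes.length < 8) return false;
  // View the same memory as a Buffer so we can decode slices as text.
  const buf = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // Header: "%PDF-" followed by a version such as "1.4".
  const hasHeader = buf.subarray(0, 5).toString('latin1') === '%PDF-';
  // Trailer: "%%EOF" should appear near the end; readers commonly scan the
  // last 1 KB because trailing newlines or whitespace are allowed.
  const tail = buf.subarray(Math.max(0, buf.length - 1024)).toString('latin1');
  return hasHeader && tail.includes('%%EOF');
}
```

If the merged bytes pass this check on the server but the downloaded file does not, the corruption is more likely happening while the response is being sent than during the merge itself.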

Array of buffers sent to the merge code:

[
  <Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 52960 more bytes>,
  <Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 93378 more bytes>
]

Merge code (pdf-lib):

const { PDFDocument } = require('pdf-lib');

async function mergePdfs(pdfsToMerges) {
  const mergedPdf = await PDFDocument.create();
  const actions = pdfsToMerges.map(async (pdfBuffer) => {
    const pdf = await PDFDocument.load(pdfBuffer);
    const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());

    copiedPages.forEach((page) => {
      mergedPdf.addPage(page);
    });
  });

  await Promise.all(actions);

  return await mergedPdf.save();
}
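Separate from the corruption, this merge function may scramble page order: map with an async callback starts every PDFDocument.load call at once, and addPage then runs in whatever order those promises settle, not in array order. pdf-lib parses documents asynchronously, so a smaller PDF may finish first. The stdlib-only sketch below reproduces the effect with timers standing in for the loader; fakeLoad and the delays are made up for illustration.

```javascript
// Hypothetical stand-in for an async loader such as PDFDocument.load:
// resolves to `name` after `ms` milliseconds.
const fakeLoad = (name, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(name), ms));

const inputs = [['first', 30], ['second', 10], ['third', 20]];

// Pattern from the question: the push (like addPage) runs in completion order.
async function concurrentAppend() {
  const pages = [];
  await Promise.all(
    inputs.map(async ([name, ms]) => {
      pages.push(await fakeLoad(name, ms));
    })
  );
  return pages;
}

// Sequential: each input is fully handled before the next one starts.
async function sequentialAppend() {
  const pages = [];
  for (const [name, ms] of inputs) {
    pages.push(await fakeLoad(name, ms));
  }
  return pages;
}

// Load concurrently, but append afterwards in array order.
async function parallelLoadOrderedAppend() {
  const loaded = await Promise.all(inputs.map(([name, ms]) => fakeLoad(name, ms)));
  const pages = [];
  for (const name of loaded) pages.push(name);
  return pages;
}
```

concurrentAppend() resolves to ['second', 'third', 'first'], while the other two resolve to ['first', 'second', 'third']. Applied to mergePdfs, either loop over the buffers with for...of, or await Promise.all over the PDFDocument.load calls and then copy pages from the loaded documents in array order.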

I can get a single PDF to download fine; it's the merge that seems to be the issue. Any insight would be helpful. Thank you.
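One difference between the single-PDF and merged paths is the type of value being sent: in this question page.pdf() returns a Buffer, whereas pdf-lib's save() resolves to a Uint8Array. A minimal sketch of sending PDF bytes with Node's built-in http response API, converting to a Buffer explicitly and setting the headers by hand. sendPdf and the default filename are hypothetical; the sketch assumes res is a Node http.ServerResponse (Express's res also provides writeHead and end).

```javascript
// Hypothetical helper: write PDF bytes to a Node http.ServerResponse.
// Accepts a Buffer or a Uint8Array (e.g. the result of pdf-lib's save()).
function sendPdf(res, bytes, filename = 'merged.pdf') {
  // Zero-copy Buffer view over the same memory, so Buffer.isBuffer() is true
  // for whatever consumes it.
  const body = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  res.writeHead(200, {
    'Content-Type': 'application/pdf',
    'Content-Length': body.length,
    'Content-Disposition': `attachment; filename="${filename}"`,
  });
  res.end(body);
}
```

For example, a handler passed to http.createServer could call sendPdf(res, await mergePdfs(buffers)), where buffers is the array produced by the Puppeteer code.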

Comments (2)

内心激荡 2025-02-15 07:09:35

So I was correct in saying it was a buffer issue. I'm not sure pdf-lib can handle the buffers from Puppeteer correctly... If anyone has another solution, I'm all ears (er, fingers; well, eyes).

Here is what I got to work, found here https://npm.io/package/merge-pdf-buffers

It seems a bit simplistic, and I'm not sure the author is still maintaining it, so I may pull it down and see what it's doing. Maybe I'll support it. lol

However, at this time this solution is working.

喜爱皱眉﹌ 2025-02-15 07:09:35

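The difference is between a Node.js Buffer and a plain JavaScript typed array: pdf-lib's save() resolves to a Uint8Array, not a Buffer, so the result needs converting before it is sent. You can check the solution in this thread: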

Convert a binary NodeJS Buffer to JavaScript ArrayBuffer

Example:

const { PDFDocument } = require('pdf-lib');

// pdf-lib's save() resolves to a Uint8Array; copy it into a Node Buffer so
// code that checks Buffer.isBuffer() (e.g. Express's res.send) sends raw bytes.
function toBuffer(bytes) {
    return Buffer.from(bytes);
}

async function mergePdfs(pdfsToMerges) {
    const mergedPdf = await PDFDocument.create();

    // Handle the inputs one at a time so pages keep the input order
    // (async callbacks inside map() would add pages in completion order).
    for (const pdfBuffer of pdfsToMerges) {
        const pdf = await PDFDocument.load(pdfBuffer);
        const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
        copiedPages.forEach((page) => mergedPdf.addPage(page));
    }

    const buf = await mergedPdf.save();
    return toBuffer(buf);
}
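To see what the conversion actually changes, the sketch below compares a plain Uint8Array with Buffers built from it, using only Node built-ins. The byte values are the first five bytes of the Puppeteer dump in the question (%PDF-). It assumes, as this answer implies, that the merged bytes are handed to something that treats Buffers specially; Express 4's res.send, for example, sends a Buffer as raw bytes but passes any other object to res.json.

```javascript
// First five bytes of the Puppeteer dump in the question: "%PDF-".
const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d]);

// A plain Uint8Array (what pdf-lib's save() resolves to) is not a Buffer...
console.log(Buffer.isBuffer(bytes)); // false
// ...and as a generic object it serializes to JSON like this, not raw bytes:
console.log(JSON.stringify(bytes)); // {"0":37,"1":80,"2":68,"3":70,"4":45}

// Option 1: copy into a new Buffer (what toBuffer in this answer does).
const copied = Buffer.from(bytes);
// Option 2: zero-copy Buffer view over the same memory. Passing byteOffset and
// byteLength matters when `bytes` is itself a view into a larger ArrayBuffer.
const shared = Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength);

console.log(Buffer.isBuffer(copied), copied.toString('latin1')); // true %PDF-
bytes[0] = 0x21; // mutate the original array
console.log(copied[0], shared[0]); // 37 33 (only the view sees the change)
```

Either conversion should be fine for sending; the copy is simpler, while the view avoids holding a second copy of a large merged PDF in memory.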