Merging multiple Puppeteer PDF buffers from URLs into a single PDF file to return to the user
I am trying to merge an arbitrary number of PDF buffers from Puppeteer into a single file. I suspect it has something to do with the buffer, but I have yet to find a solution that seems to work. This got me the closest, How to output a PDF buffer to browser using NodeJS?, but the output still says it's unable to load. Adobe, Chrome, and Foxit all say it's corrupt or decoded incorrectly.
Puppeteer code:
const puppeteer = require('puppeteer');

async function generateBulkPDFFromUrl(urlString) {
// launch a new chrome instance
const browser = await puppeteer.launch({
headless: true,
args: ['--font-render-hinting=none']
});
// create a new page
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36');
// navigate to the target URL and wait for the DOM to load
const url = new URL(`${urlString}`);
await page.goto(url, { waitUntil: 'domcontentloaded' });
// create a pdf buffer
const pdfBuffer = await page.pdf({
format: 'A4'
});
// close the browser
await browser.close();
return pdfBuffer;
}
Buffer from Puppeteer, in case you want to see it:
<Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 52960 more bytes>
Array of Buffers sent to Merge Code:
[
<Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 52960 more bytes>,
<Buffer 25 50 44 46 2d 31 2e 34 0a 25 d3 eb e9 e1 0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 43 72 65 61 74 6f 72 20 28 43 68 72 6f 6d 69 75 6d 29 0a 2f 50 72 6f 64 ... 93378 more bytes>
]
Merge Code (PDF-LIB):
const { PDFDocument } = require('pdf-lib');

async function mergePdfs(pdfsToMerges) {
const mergedPdf = await PDFDocument.create();
const actions = pdfsToMerges.map(async pdfBuffer => {
const pdf = await PDFDocument.load(pdfBuffer);
const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
copiedPages.forEach((page) => {
mergedPdf.addPage(page);
});
});
await Promise.all(actions);
return await mergedPdf.save();
}
I can get a single PDF to download fine; it's the merge that seems to be the issue. Any insight would be helpful. Thank you.
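(For reference, a hypothetical driver along these lines is how the two functions would be wired together — the URL list, output file name, and fs usage below are illustrative, not part of the original post.)
const fs = require('fs/promises');

// Illustrative driver (not from the original post): generate one buffer per URL,
// pass the array to mergePdfs, and write the merged bytes to disk.
async function buildMergedPdf(urls) {
  const buffers = [];
  for (const url of urls) {
    buffers.push(await generateBulkPDFFromUrl(url));
  }
  const mergedBytes = await mergePdfs(buffers); // Uint8Array from pdf-lib's save()
  await fs.writeFile('merged.pdf', mergedBytes);
}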
2 Answers
So I was correct in saying it was a buffer issue. I'm not sure PDF-LIB can handle the buffers from Puppeteer correctly... if anyone has any other solution, I'm all ears... er, fingers... well, eyes.
Here is what I got to work, found here: https://npm.io/package/merge-pdf-buffers
It seems a bit simplistic, and I'm not sure the author is still maintaining it, so I may pull it down and see what it's doing. Maybe I'll support it. lol
However, at this time this solution is working.
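(A rough usage sketch follows; the export name and call signature are assumptions that have not been verified against the package's README, so double-check them before relying on this.)
// Assumed usage of merge-pdf-buffers: the import shape and signature below are
// guesses, not verified against the package documentation.
const mergePdfBuffers = require('merge-pdf-buffers');

async function mergeWithPackage(pdfBuffers) {
  // Assumption: accepts an array of Node Buffers and resolves to one merged PDF buffer.
  return await mergePdfBuffers(pdfBuffers);
}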
The difference is between the Node Buffer and the JavaScript ArrayBuffer. You can check the solution in this thread:
Convert a binary NodeJS Buffer to JavaScript ArrayBuffer
Example:
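The answer's original example isn't preserved here, so below is a minimal sketch of that conversion applied to the question's merge function (illustrative only; a sequential loop is used so pages are added in input order):
const { PDFDocument } = require('pdf-lib');

// Convert a Node Buffer to a plain ArrayBuffer, copying only the bytes the
// Buffer actually occupies in its backing store.
function toArrayBuffer(nodeBuffer) {
  return nodeBuffer.buffer.slice(
    nodeBuffer.byteOffset,
    nodeBuffer.byteOffset + nodeBuffer.byteLength
  );
}

// Sketch of the merge with the conversion applied (not the answer's original code).
async function mergePdfs(pdfBuffers) {
  const mergedPdf = await PDFDocument.create();
  for (const nodeBuffer of pdfBuffers) {
    const pdf = await PDFDocument.load(toArrayBuffer(nodeBuffer));
    const copiedPages = await mergedPdf.copyPages(pdf, pdf.getPageIndices());
    copiedPages.forEach((page) => mergedPdf.addPage(page));
  }
  return await mergedPdf.save(); // resolves to a Uint8Array
}
Node's Buffer is a Uint8Array view over a shared allocation pool, so slicing out just the byteOffset/byteLength region avoids handing pdf-lib unrelated bytes from that pool.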