Download a PDF file from an embedded tag with Puppeteer

Posted on 2025-01-25 23:21:27


I am trying to download a PDF from a website.

The website is built with the ZK framework. When an ID number is typed into an input field, the site reveals a dynamic URL to the PDF that stays valid for a limited window of time. This step is easy enough: I am able to get the PDF URL, which opens in the browser inside an embedded tag.
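The lookup step described above can be sketched with standard Puppeteer page calls. This is a minimal sketch, not the site's confirmed behavior: it assumes that typing the ID and pressing Enter is enough to make ZK render the iframe, and both the helper name `getPdfUrl` and the default `inputSelector` are hypothetical placeholders for the real ZK-generated field.

```javascript
// Hypothetical helper: type an ID number into the ZK input field and return
// the dynamic PDF URL from the <iframe> that is rendered afterwards.
// `inputSelector` is a placeholder; replace it with the real field's selector.
async function getPdfUrl(page, idNumber, inputSelector = 'input[type="text"]') {
  // Type the ID and submit it (assumes Enter triggers the lookup).
  await page.type(inputSelector, idNumber);
  await page.keyboard.press('Enter');

  // Wait until an <iframe> is attached to the DOM.
  await page.waitForSelector('iframe');

  // The iframe's src attribute holds the time-limited PDF URL.
  return page.$eval('iframe', (el) => el.src);
}
```

For example, `await getPdfUrl(page, '1705120630')` would look up the sample ID given in the question. If the page already contains another iframe before the lookup, `waitForSelector('iframe')` returns immediately, so a more specific selector may be needed.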

However, it has been impossible for me to find a way to download the file to my computer. For days, I have tried and read everything from this, to this, to this.

The closest I have been able to get is with this code:

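```javascript
const fs = require('fs');

// First <iframe> on the page (page.$x('//iframe') returned it inside an array).
const iframe = await page.$('iframe');
// Its src attribute holds the dynamic PDF URL.
const pdf_url = await page.evaluate(iframe => iframe.src, iframe);

const res = await page.evaluate(async url =>
        await fetch(url, {
                method: 'GET',
                credentials: 'same-origin', // useful when we are logged into a website and want to send cookies
                responseType: 'arraybuffer', // meant to get an ArrayBuffer, but this is an XMLHttpRequest option that fetch() ignores
        }).then(response => response.text()), // reads the body as UTF-8 text
        pdf_url
)
console.log('res:', res);
//const response = await page.goto(pdf_url);
fs.writeFileSync('somepdf.pdf', res);
```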

This results in a blank PDF file that is 92K in size, while the file I am trying to get is 52K. I suspect the back end might be sending me a "dummy" PDF file because the headers on my fetch request might not be correct.
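The size difference itself may be a clue, and is one plausible explanation rather than a confirmed diagnosis of this site. A minimal Node.js sketch (no Puppeteer or network needed) of what happens to binary data that goes through `response.text()` and is then written as a string: byte sequences that are not valid UTF-8 are replaced with U+FFFD, which takes three bytes when written back out. The helper name `roundTripThroughText` is hypothetical.

```javascript
// Mimic what the original snippet does with a binary body:
// response.text() decodes the bytes as UTF-8, and fs.writeFileSync()
// re-encodes the resulting string as UTF-8 when writing it.
function roundTripThroughText(bytes) {
  const text = new TextDecoder('utf-8').decode(bytes); // invalid sequences -> U+FFFD
  return Buffer.from(text, 'utf8'); // each U+FFFD becomes 3 bytes (EF BF BD)
}

// "%PDF" followed by four high-bit bytes, like the binary comment line
// near the start of many PDF files.
const original = Uint8Array.from([0x25, 0x50, 0x44, 0x46, 0xe2, 0xe3, 0xcf, 0xd3]);
const written = roundTripThroughText(original);

console.log(original.length, written.length); // prints: 8 16
console.log(Buffer.from(original).equals(written)); // prints: false
```

Here four binary bytes become twelve. Across a whole PDF, whose compressed streams are full of such bytes, the same effect would both enlarge the file and corrupt its content, which fits a bigger file that opens blank.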

What else can I try?
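One thing worth trying is keeping the response as raw bytes end to end. In the snippet above, `response.text()` turns the PDF into a string, and `responseType: 'arraybuffer'` has no effect because `fetch()` has no such option. The sketch below is an assumption-laden alternative, not a confirmed fix for this site: the hypothetical helper `downloadPdf` reads the body with `response.arrayBuffer()` inside the page (so the page's cookies are still sent), base64-encodes it because `page.evaluate()` hands its result back to Node by serialization, and decodes it into a `Buffer` before writing. It assumes the PDF URL is same-origin with the page (otherwise CORS may block the in-page fetch) and that it runs while the dynamic URL is still valid.

```javascript
const fs = require('fs');

// Hypothetical helper: fetch `pdfUrl` inside the page's session and write
// the exact bytes to `outPath`. Returns the number of bytes written.
async function downloadPdf(page, pdfUrl, outPath) {
  const base64 = await page.evaluate(async (url) => {
    const response = await fetch(url, { credentials: 'same-origin' });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    // Keep the body as bytes instead of decoding it as text.
    const bytes = new Uint8Array(await response.arrayBuffer());

    // Build a binary string in chunks so String.fromCharCode never receives
    // too many arguments, then base64-encode it for the trip back to Node.
    let binary = '';
    for (let i = 0; i < bytes.length; i += 0x8000) {
      binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
    }
    return btoa(binary);
  }, pdfUrl);

  // Decode in Node and write the original bytes unchanged.
  const buffer = Buffer.from(base64, 'base64');
  fs.writeFileSync(outPath, buffer);
  return buffer.length;
}
```

Usage would look like `await downloadPdf(page, pdf_url, 'somepdf.pdf')`, using the URL read from the iframe. If the written file still does not match the expected size, that would point back to the server's response (the "dummy" PDF theory) rather than to the decoding step.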

Here is the link to the PDF page.

You can use the random ID number I found: '1705120630'
