Download a PDF file from an embedded tag using Puppeteer

Posted on 2025-01-25 23:21:27

I am trying to download a PDF from a website.

The website is built with the ZK framework, and it reveals a dynamic URL to the PDF for a limited window of time once an ID number is typed into an input field. This step is easy enough, and I am able to get the PDF URL, which opens in the browser in an embedded tag.

However, I have not been able to find a way to download the file to my computer. For days, I have tried and read everything from this, to this, to this.

The closest I have been able to get is with this code:

let [iframe] = await page.$x('//iframe');
let pdf_url = await page.evaluate(iframe => iframe.src, iframe);

let res = await page.evaluate(
    async url =>
        await fetch(url, {
            method: 'GET',
            credentials: 'same-origin', // useful when we are logged into a website and want to send cookies
            responseType: 'arraybuffer', // get response as an ArrayBuffer
        }).then(response => response.text()),
    pdf_url
);
console.log('res:', res);
//const response = await page.goto(pdf);
fs.writeFileSync('somepdf.pdf', res);

This results in a blank PDF file that is 92K in size, whereas the file I am trying to get is 52K. I suspect the back end might be sending me a 'dummy' PDF file because the headers on my fetch request might not be correct.
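
One possible explanation for the size mismatch, separate from the request headers, is the handling of binary data. The fetch API has no responseType option (it is silently ignored), and response.text() decodes the body as UTF-8; writing that string back with fs.writeFileSync re-encodes it as UTF-8. Bytes that are not valid UTF-8 get replaced along the way, which can both corrupt and enlarge a binary file. The sketch below only simulates this in plain Node with made-up bytes standing in for a PDF; it does not contact the site, so whether this is what actually happened here is an assumption.

```javascript
// Simulated binary payload: all 256 byte values repeated (a stand-in for PDF bytes).
const original = Buffer.from(Array.from({ length: 1024 }, (_, i) => i % 256));

// Roughly what response.text() does: decode the raw bytes as UTF-8 text.
// Invalid sequences are replaced with U+FFFD.
const asText = new TextDecoder('utf-8').decode(original);

// Roughly what fs.writeFileSync(path, string) does: encode the string as UTF-8.
const roundTripped = Buffer.from(asText, 'utf8');

console.log('original bytes:', original.length);
console.log('round-tripped bytes:', roundTripped.length);
console.log('identical:', original.equals(roundTripped));
```

If the round-tripped buffer is larger than and different from the original, the same effect would be consistent with a 52K PDF turning into a blank 92K file.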

What else can I try?

Here is the link to the PDF page.

You can use the random ID number I found: '1705120630'
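
One thing that could be tried is to keep the response binary all the way to disk. Since page.evaluate can only return serializable values, a common workaround is to read the body with response.arrayBuffer() in the page, encode it as base64, and decode it in Node. The following is a minimal sketch under those assumptions, not a tested fix for this particular site: fetchAsBase64InPage, base64ToBuffer, and downloadPdf are hypothetical helper names, and `page` is assumed to be an already-open Puppeteer Page with the iframe URL in hand.

```javascript
const fs = require('fs');

// Runs inside the browser page: fetch the URL and return the body as base64.
// The raw bytes are encoded as text so they survive the trip from browser to Node.
async function fetchAsBase64InPage(url) {
  const response = await fetch(url, { method: 'GET', credentials: 'same-origin' });
  const bytes = new Uint8Array(await response.arrayBuffer());
  let binary = '';
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Runs in Node: decode the base64 string back into the original bytes.
function base64ToBuffer(base64) {
  return Buffer.from(base64, 'base64');
}

// Hypothetical helper: `page` is an open Puppeteer Page, `pdfUrl` is the iframe src.
async function downloadPdf(page, pdfUrl, outputPath) {
  const base64 = await page.evaluate(fetchAsBase64InPage, pdfUrl);
  fs.writeFileSync(outputPath, base64ToBuffer(base64));
}
```

With the variables from the question, usage would look like `await downloadPdf(page, pdf_url, 'somepdf.pdf');`. If the saved file is still not the expected PDF, the server-side "dummy file" theory would become more plausible.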
