Playwright: download to PDF via the print dialog?
I'm trying to scrape a web page using Playwright.
I load the page and successfully click the download button with Playwright. This brings up a print dialog box with a printer already selected.
I would like to select "Save as PDF" and then click the "Save" button.
Here's my current code:
    from bs4 import BeautifulSoup
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        playwright_page = browser.new_page()
        got_error = False
        try:
            playwright_page.goto(url_to_start_from)
            print(playwright_page.title())
            html = playwright_page.content()
        except Exception as e:
            print(f"Playwright exception: {e}")
            got_error = True
        if not got_error:
            soup = BeautifulSoup(html, 'html.parser')
            # download pdf
            with playwright_page.expect_download() as download_info:
                playwright_page.locator("text=download").click()
            download = download_info.value
            path = download.path()
            download.save_as(DOWNLOADED_PDF_FOLDER)
        browser.close()
Is there a way to do this using Playwright?
Comments (2)
You don't actually need the print dialog; you can generate the PDF directly from Playwright by emulating the media type.
This is how I generate my CV.
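The answerer's own code is not shown here, so the following is only a minimal sketch of the approach, not their script. It assumes the sync Playwright API with Chromium installed; the helper names `pdf_path_for` and `save_page_as_pdf`, and the `<host>.pdf` naming scheme, are made up for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def pdf_path_for(url: str, folder: str) -> Path:
    """Build an output path such as <folder>/<host>.pdf (hypothetical naming scheme)."""
    host = urlparse(url).netloc or "page"
    return Path(folder) / f"{host}.pdf"


def save_page_as_pdf(url: str, folder: str) -> Path:
    """Render `url` to a PDF with headless Chromium and return the file path."""
    # Imported here so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    out_path = pdf_path_for(url, folder)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    with sync_playwright() as p:
        # page.pdf() is only supported in headless Chromium.
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        # page.pdf() renders with print CSS media by default; this call makes
        # that explicit. Use media="screen" to keep the on-screen styling.
        page.emulate_media(media="print")
        # Write the PDF straight to disk, with no print dialog involved.
        page.pdf(path=str(out_path))
        browser.close()

    return out_path
```

Calling `save_page_as_pdf("https://example.com", "downloaded_pdfs")` would be expected to write `downloaded_pdfs/example.com.pdf`. Note that `page.pdf()` takes a file path, not a folder, which is why the sketch builds the full file name first.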
See also:
Playwright - Media Emulation
Playwright - PDF
Thanks very much to @KJ in the comments, who pointed out that with headless=True, Chromium won't even put up a print dialog box in the first place.