How can I configure Playwright not to follow redirects automatically?

Posted 2025-01-12 17:38:05


I want to open a website using Playwright,
but I don't want to be automatically redirected.

Some other web clients have a parameter like follow=False to disable automatically following redirects, but I can't find an equivalent in Playwright.

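import asyncio
from playwright.async_api import async_playwright

async def run(playwright):
    chromium = playwright.chromium
    browser = await chromium.launch()
    context = await browser.new_context()
    page = await context.new_page()

    # Log the status and URL of every response the page receives.
    def handle_response(response):
        print(f'status: {response.status} {response.url}')
    page.on('response', handle_response)

    await page.goto("https://google.com")
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await run(playwright)

asyncio.run(main())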

That's the sample code. As we know, google.com responds with a 301 and redirects to www.google.com.
Is it possible to stop the process after I get the 301, so I don't need to continue processing www.google.com and all the responses after that?

From the Request documentation, I understand that Page.on('response') is emitted when/if the response status and headers are received for the request.

But how do I stop the request after the page's on('response') callback has fired?
I saw some similar questions that use Route.abort() or Route.fulfill(), but I still haven't found an answer for my case.

Thank you for your help.


孤千羽 2025-01-19 17:38:05


While you might use the solution outlined by Vishal for generic API requests, it's not that easy when working in a browser context and using page.goto().
I found a hint in a closed issue from 2019 and put together a working solution:

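// Intercept every request made by the page.
await page.route('**', async route => {
    // Fetch the response without following any redirect.
    const response = await route.fetch({maxRedirects: 0});
    // Drop the Location header so the redirect is not followed after fulfill().
    let headers = response.headers();
    delete headers['location'];
    delete headers['Location'];
    return route.fulfill({
        response: response,
        headers: headers
    });
});
// req.query.url and timeout come from the surrounding application code.
const response = await page.goto(req.query.url, {
    timeout: timeout,
    waitUntil: 'domcontentloaded'
});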

In this solution, page.route() is used to configure how requests made by page.goto() should be handled. First, we tell route.fetch() not to follow the initial redirect: route.fetch({maxRedirects: 0}). Additionally, as route.fulfill() seems to follow redirects indicated by a Location header, we remove any Location header from the response and then pass everything to fulfill().
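Since the question uses the Python API, here is a rough Python sketch of the same idea. It assumes a Playwright for Python version whose route.fetch() accepts max_redirects; the helper names strip_location and no_redirect and the example URL are made up for illustration, and I have not verified it against every browser.

```python
import asyncio


def strip_location(headers):
    """Return a copy of the headers without any Location entry."""
    return {k: v for k, v in headers.items() if k.lower() != "location"}


async def no_redirect(route):
    # Fetch the response without following redirects.
    response = await route.fetch(max_redirects=0)
    # Hand the response to the page, minus the Location header.
    await route.fulfill(response=response,
                        headers=strip_location(response.headers))


async def main(url):
    # Imported here so the helpers above can be used without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.route("**/*", no_redirect)
        response = await page.goto(url, wait_until="domcontentloaded")
        print(f"status: {response.status} {response.url}")
        await browser.close()

# To try it (assumed example URL): asyncio.run(main("https://google.com"))
```

The Location header is matched case-insensitively here, so the two separate delete statements from the JavaScript version collapse into one filter.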

Caution: this may break rendering of websites that reference assets such as images or CSS files which are themselves redirected. You may tweak the page.route() part to fit your use case.
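One possible tweak, sketched in Python under the same assumptions as the solution above: only block redirects for navigation requests and let everything else (images, CSS, scripts) go through untouched. The handler name no_redirect_for_navigation is hypothetical. Note that is_navigation_request() is also true for iframe navigations, so this may need further narrowing for some pages.

```python
async def no_redirect_for_navigation(route):
    """Route handler: block redirects only for navigation requests."""
    if not route.request.is_navigation_request():
        # Assets such as images or CSS keep their normal behavior,
        # including any redirects.
        await route.continue_()
        return
    # Navigation request: fetch without following redirects.
    response = await route.fetch(max_redirects=0)
    # Remove the Location header before handing the response to the page.
    headers = {k: v for k, v in response.headers.items()
               if k.lower() != "location"}
    await route.fulfill(response=response, headers=headers)
```

It would be registered the same way as before, for example with await page.route("**/*", no_redirect_for_navigation) ahead of the page.goto() call.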
