Is it possible to download a file with Playwright using Chromium or WebKit?

Posted on 2025-02-11 02:23:35

I want to download a file, for example sitemap.xml.gz.
I want to do it using only Playwright 1.22.
I tried with the Chromium browser, but it fails.
It doesn't work with WebKit either: WebKit renders the whole file content in the page, and then I get a timeout.
It only works with Firefox.
What is wrong with the other browsers? Could this be a bug in Playwright?
Has anyone been able to download a file directly with Playwright?

public class PwDownload {
    public static void main(String[] args) {
        try (Playwright playwright = Playwright.create()) {
            final BrowserType chromium = playwright.chromium();
            final Browser browser = chromium.launch(new BrowserType.LaunchOptions().setHeadless(false));
            Page page = browser.newPage();
            Download download = page.waitForDownload(() -> {
                page.navigate("https://www.fnac.es/sitemap-top-post.xml.gz");
            });
            System.out.println(download.path());
            browser.close();
        }
    }
}

Error trace with chromium:

navigating to "https://www.fnac.es/sitemap-top-post.xml.gz", waiting until "load"
============================================================
    at FrameSession._navigate (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/chromium/crPage.js:636:35)
    at runNextTicks (node:internal/process/task_queues:61:5)
    at processImmediate (node:internal/timers:437:9)
    at async /private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/frames.js:648:30
    at async ProgressController.run (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/progress.js:101:22)
    at async FrameDispatcher.goto (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/dispatchers/frameDispatcher.js:86:59)
    at async DispatcherConnection.dispatch (/private/var/folders/1n/tnrrsqrs1_7f_k2x84xql3pr0000gn/T/playwright-java-12123589937209992525/package/lib/server/dispatchers/dispatcher.js:352:22)
}
    at com.microsoft.playwright.impl.Connection.dispatch(Connection.java:183)
    at com.microsoft.playwright.impl.Connection.processOneMessage(Connection.java:163)
    at com.microsoft.playwright.impl.ChannelOwner.runUntil(ChannelOwner.java:101)
    ... 19 more

Comments (2)

This works for Chromium and Firefox. Change the outputDirectory variable before running.

import com.microsoft.playwright.*;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.FilenameUtils;

import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Main {
    public static void main(String[] args) throws Exception {

        try (Playwright playwright = Playwright.create()) {
            String outputDirectory = "d:\\";

            String url = "https://www.johnlewis.com/sitemap/products/products-00.xml.gz";
            String filename = FilenameUtils.getName(url);

            BrowserType browserType = playwright.firefox();
            Browser browser = browserType.launch(new BrowserType.LaunchOptions().setHeadless(false));
            BrowserContext newContext = browser.newContext(new Browser.NewContextOptions().setAcceptDownloads(true));
            Page page = newContext.newPage();

            Download download = page.waitForDownload(() -> {
                page.evaluate("(y) => {location.href = y;}", url);
            });

            Path downloadedFilePath = download.path();
            System.out.println("Downloaded to " + downloadedFilePath);

            Path destinationFilePath = Paths.get(outputDirectory, filename);
            FileUtils.copyFile(new File(downloadedFilePath.toString()), new File(destinationFilePath.toString()));
            System.out.println("Saved to " + destinationFilePath);
        }
    }
}

As for WebKit, I guess there is some built-in browser behavior that you cannot override. You can even open WebKit from Playwright's Java code, paste the link in manually, and try to download: it will not let you (not in a separate window, and not even with JavaScript plus the HTML download attribute).

青萝楚歌 2025-02-18 02:23:35

If you already have a link to the file, why not download it over a normal connection? See "Download file from HTTPS server using Java".

Playwright downloads behave like downloads triggered from a page or by a user. Just visiting a link isn't going to trigger a download event.
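The "normal connection" suggestion could look roughly like the sketch below, which uses the JDK's built-in java.net.http.HttpClient (Java 11 or later) instead of a browser. The class name DirectDownload and the helpers fileNameFromUrl and download are made up for illustration and are not part of Playwright or of the answers above. It also assumes the server hands the file to a plain HTTP client; some sites require extra headers or cookies, and the sketch does not handle that.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of Playwright.
public class DirectDownload {

    // Uses the last path segment of the URL as the local file name.
    public static String fileNameFromUrl(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Streams the response body to a file in targetDirectory and returns its path.
    public static Path download(String url, Path targetDirectory)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        Path target = targetDirectory.resolve(fileNameFromUrl(url));

        // TRUNCATE_EXISTING ensures an older, larger file is fully overwritten.
        HttpResponse<Path> response = client.send(
                request,
                HttpResponse.BodyHandlers.ofFile(
                        target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));

        if (response.statusCode() != 200) {
            // The body handler has already written the error body; remove it.
            Files.deleteIfExists(target);
            throw new IOException(
                    "Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("Usage: java DirectDownload.java <url>");
            return;
        }
        Path saved = download(args[0], Paths.get("."));
        System.out.println("Saved to " + saved.toAbsolutePath());
    }
}
```

It can be run as a single source file, for example: java DirectDownload.java https://www.fnac.es/sitemap-top-post.xml.gz. The file is saved compressed, exactly as the server sends it.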
