How to extract a single file from a remote archive file?

Posted on 2024-09-07 08:14:06

Given

  1. URL of an archive (e.g. a zip file)
  2. Full name (including path) of a file inside that archive

I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.

From my (limited) understanding it should be possible, though I have no idea how to do that. I've been using TrueZip, since it seems to support a large variety of archive types, but I have doubts about its ability to work in such a way. Does anyone have any experience with that sort of thing?

EDIT: being able to also do that with tarballs and zipped tarballs is also important for me.

Comments (4)

So要识趣 2024-09-14 08:14:06

Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() and closeEntry() to iterate through all the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...).

The Java code would look something like this:

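import java.io.FileNotFoundException;
import java.net.URL;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

URL url = new URL("http://example.com/path/to/archive");
try (ZipInputStream zin = new ZipInputStream(url.openStream())) {
    ZipEntry ze = zin.getNextEntry();
    while (ze != null && !ze.getName().equals(pathToFile)) {
        ze = zin.getNextEntry(); // getNextEntry() closes the current entry itself
    }
    if (ze == null) {
        throw new FileNotFoundException(pathToFile);
    }
    // getSize() can be -1 when streaming, so read until the end of the entry instead
    byte[] bytes = zin.readAllBytes();
}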

This is, of course, untested.
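For reference, here is a self-contained sketch of the same streaming approach that copies the entry to any OutputStream instead of holding it in a byte array. The class and method names (ZipStreamExtract, extractEntry, sampleZip) are made up for illustration, and the demo uses an in-memory archive rather than a real URL so it runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipStreamExtract {

    /** Copies the named entry to out; returns false if the archive has no such entry. */
    static boolean extractEntry(InputStream archive, String entryName, OutputStream out)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry ze;
            while ((ze = zin.getNextEntry()) != null) {
                if (ze.getName().equals(entryName)) {
                    zin.transferTo(out); // reads only up to the end of this entry
                    return true;         // closing the stream abandons the rest of the archive
                }
            }
            return false;
        }
    }

    /** Builds a two-entry archive in memory (hypothetical sample data for the demo). */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(raw)) {
            zout.putNextEntry(new ZipEntry("docs/a.txt"));
            zout.write("first".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/b.txt"));
            zout.write("second".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean found = extractEntry(new ByteArrayInputStream(sampleZip()), "docs/b.txt", out);
        System.out.println(found + " " + out.toString(StandardCharsets.UTF_8));
    }
}
```

For a remote archive, the first argument would be url.openStream() and the third something like Files.newOutputStream(target). Everything stored before the wanted entry is still downloaded; only the part after it is skipped.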

挽梦忆笙歌 2024-09-14 08:14:06

Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the directory and the entry itself. The server would need to support the Range HTTP header for this to work.

The standard Java API only supports reading ZIP files from local files and input streams. As far as I know there's no provision for reading from random access remote files.
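To give an idea of what a hand-rolled Range-based reader would have to do first, here is a rough sketch that locates the central directory from the tail of an archive. It assumes the last bytes of the file have already been fetched (for example with a suffix range such as "Range: bytes=-65557", which covers the 22-byte end record plus the largest possible comment), and it ignores ZIP64 and multi-disk archives. The names ZipTail, findCentralDirectory, rangeHeader and sampleZip are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipTail {
    static final int EOCD_SIGNATURE = 0x06054b50; // "PK\005\006", stored little-endian
    static final int EOCD_MIN_SIZE = 22;          // fixed part of the record, without comment

    /**
     * Scans the last bytes of an archive for the end-of-central-directory record.
     * Returns {directoryOffset, directorySize, entryCount}, or null if none is found.
     */
    static long[] findCentralDirectory(byte[] tail) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = tail.length - EOCD_MIN_SIZE; i >= 0; i--) {
            if (buf.getInt(i) != EOCD_SIGNATURE) continue;
            int commentLength = buf.getShort(i + 20) & 0xFFFF;
            if (i + EOCD_MIN_SIZE + commentLength != tail.length) continue; // false positive
            long entryCount = buf.getShort(i + 10) & 0xFFFF;
            long size = buf.getInt(i + 12) & 0xFFFFFFFFL;
            long offset = buf.getInt(i + 16) & 0xFFFFFFFFL; // absolute offset in the file
            return new long[] {offset, size, entryCount};
        }
        return null;
    }

    /** Formats a Range header value covering length bytes starting at offset. */
    static String rangeHeader(long offset, long length) {
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Builds a two-entry archive in memory (hypothetical sample data for the demo). */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(raw)) {
            zout.putNextEntry(new ZipEntry("docs/a.txt"));
            zout.write("first".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/b.txt"));
            zout.write("second".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip(); // here the "tail" is simply the whole small file
        long[] cd = findCentralDirectory(zip);
        System.out.println("entries: " + cd[2]);
        System.out.println("Range: " + rangeHeader(cd[0], cd[1]));
    }
}
```

A second request with that Range value would fetch the directory itself; parsing it gives the local header offset of the wanted entry, which a third ranged request can then download and inflate.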

Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with that.

This won't provide any advantage for compressed TAR archives since the entire archive is compressed together (beyond just using an InputStream and killing it when you have your entry).
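For the tarball case, "use an InputStream and stop when you have your entry" could look roughly like the sketch below. It uses only the JDK, reading the 512-byte tar headers by hand, and it is deliberately minimal: it assumes plain headers with names under 100 characters and ignores the ustar prefix field, GNU/pax long names and checksums. The names TarGzExtract, extractEntry and sampleTarGz are hypothetical; a real project would more likely use a tar library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class TarGzExtract {
    private static final int BLOCK = 512; // tar headers and data are padded to 512-byte blocks

    /** Streams a .tar.gz and copies the named entry to out; returns false if it is absent. */
    static boolean extractEntry(InputStream gzippedTar, String entryName, OutputStream out)
            throws IOException {
        try (InputStream tar = new GZIPInputStream(gzippedTar)) {
            byte[] header = new byte[BLOCK];
            // A zero-filled block (empty name) marks the end of the archive.
            while (tar.readNBytes(header, 0, BLOCK) == BLOCK && header[0] != 0) {
                String name = field(header, 0, 100);
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal text
                if (name.equals(entryName)) {
                    copy(tar, out, size);
                    return true; // closing the stream abandons the rest of the download
                }
                long padded = (size + BLOCK - 1) / BLOCK * BLOCK;
                copy(tar, OutputStream.nullOutputStream(), padded); // skip data and padding
            }
            return false;
        }
    }

    /** Reads a NUL-terminated ASCII header field. */
    private static String field(byte[] block, int offset, int length) {
        int end = offset;
        while (end < offset + length && block[end] != 0) end++;
        return new String(block, offset, end - offset, StandardCharsets.US_ASCII);
    }

    private static void copy(InputStream in, OutputStream out, long count) throws IOException {
        byte[] buf = new byte[8192];
        while (count > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, count));
            if (n < 0) throw new EOFException("archive truncated");
            out.write(buf, 0, n);
            count -= n;
        }
    }

    /** Demo fixture only: headers carry just name and size, so this is not a general tar writer. */
    static byte[] sampleTarGz() throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            for (String[] e : new String[][] {{"docs/a.txt", "first"}, {"README.TXT", "hello tar"}}) {
                byte[] data = e[1].getBytes(StandardCharsets.US_ASCII);
                byte[] name = e[0].getBytes(StandardCharsets.US_ASCII);
                byte[] size = String.format("%011o", data.length).getBytes(StandardCharsets.US_ASCII);
                byte[] header = new byte[BLOCK];
                System.arraycopy(name, 0, header, 0, name.length);
                System.arraycopy(size, 0, header, 124, size.length);
                gz.write(header);
                gz.write(data);
                gz.write(new byte[(BLOCK - data.length % BLOCK) % BLOCK]); // pad to block size
            }
            gz.write(new byte[2 * BLOCK]); // end-of-archive marker
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean found = extractEntry(new ByteArrayInputStream(sampleTarGz()), "README.TXT", out);
        System.out.println(found + " " + out.toString(StandardCharsets.US_ASCII));
    }
}
```

As with the ZIP stream, everything before the wanted entry still has to be downloaded and decompressed; the only saving is whatever comes after it.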

殤城〤 2024-09-14 08:14:06

Since TrueZIP 7.2, there is a new client API in the TrueZIP Path module. It is an implementation of an NIO.2 FileSystemProvider for Java SE 7. Using this API, you can access an HTTP URI as follows:

Path path = new TPath(new URI("http://acme.com/download/everything.tar.gz/README.TXT"));
try (InputStream in = Files.newInputStream(path)) {
    // Read archive entry contents here.
    ...
}

鸵鸟症 2024-09-14 08:14:06

I'm not sure if there's a way to pull out a single file from a ZIP without downloading the whole thing first. But, if you're the one hosting the ZIP file, you could create a Java servlet which reads the ZIP file and returns the requested file in the response:

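// needs java.io.InputStream, java.util.zip.ZipEntry and java.util.zip.ZipFile besides the servlet imports
public class GetFileFromZIPServlet extends HttpServlet{
  // hypothetical location of the archive on the server; adjust as needed
  private static final String ZIP_PATH = "/path/to/archive.zip";

  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response)
  throws ServletException, IOException{
    String pathToFile = request.getParameter("pathToFile");

    //get the bytes of the file from the ZIP
    byte[] fileBytes;
    try (ZipFile zip = new ZipFile(ZIP_PATH)) {
      ZipEntry entry = pathToFile == null ? null : zip.getEntry(pathToFile);
      if (entry == null) {
        response.sendError(HttpServletResponse.SC_NOT_FOUND);
        return;
      }
      try (InputStream in = zip.getInputStream(entry)) {
        fileBytes = in.readAllBytes();
      }
    }

    //set the appropriate content type, maybe based on the file extension
    response.setContentType("...");

    //write file to the response
    response.getOutputStream().write(fileBytes);
  }
}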