How do I extract a single file from a remote archive file?
Given
- URL of an archive (e.g. a zip file)
- Full name (including path) of a file inside that archive
I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.
From my (limited) understanding it should be possible, though I have no idea how to do that. I've been using TrueZip, since it seems to support a large variety of archive types, but I have doubts about its ability to work in such a way. Does anyone have any experience with that sort of thing?
EDIT: Being able to do this with tarballs and zipped tarballs is also important to me.
Comments (4)
Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() and closeEntry() to iterate through all the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...).
The Java code would look something like this:
This is, of course, untested.
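A minimal sketch of the streaming approach described above, using only java.util.zip and java.net. The class name StreamingZipExtractor and the method name extractEntry are hypothetical, chosen for illustration. The code scans entries in order and stops at the first match, so everything stored before the wanted entry is still downloaded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of any library.
final class StreamingZipExtractor {

    /**
     * Scans a ZIP stream entry by entry and copies the named entry to out.
     * Returns true if the entry was found, false otherwise.
     */
    static boolean extractEntry(InputStream archive, String entryName, OutputStream out)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            // getNextEntry() skips any unread data of the previous entry.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return true; // stop here; later entries are never read by this code
                }
            }
        }
        return false;
    }

    /**
     * Convenience overload: opens the URL and writes the entry to a local file.
     * Note: the target file is created even when the entry is not found.
     */
    static boolean extractEntry(URL archiveUrl, String entryName, Path target)
            throws IOException {
        URLConnection connection = archiveUrl.openConnection();
        try (InputStream in = connection.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            return extractEntry(in, entryName, out);
        }
    }
}
```

The stream-based overload does not care where the bytes come from, so it can be exercised against an in-memory archive before pointing it at a real URL.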
Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the directory and the entry itself. The server would need to support the Range HTTP header for this to work.
The standard Java API only supports reading ZIP files from local files and input streams. As far as I know there's no provision for reading from random access remote files.
Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with that.
This won't provide any advantage for compressed TAR archives since the entire archive is compressed together (beyond just using an InputStream and killing it when you have your entry).
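To illustrate the Range requirement only, here is a rough sketch of fetching a byte range with the JDK's HttpURLConnection rather than Apache HTTP Client. It is not a ReadOnlyFile implementation; a real one would call something like this for each read and keep track of the file position. The class HttpRangeReader and its methods are hypothetical names, and the sketch assumes an http or https URL.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, not part of TrueZIP or any other library.
final class HttpRangeReader {

    /** Builds a Range header value for length bytes starting at offset. */
    static String rangeHeader(long offset, int length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        // Byte ranges are inclusive on both ends.
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    /** Fetches the requested range; fails if the server ignores Range. */
    static byte[] readRange(URL url, long offset, int length) throws IOException {
        // Assumption: url uses http or https, otherwise this cast fails.
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Range", rangeHeader(offset, length));
        try {
            int status = connection.getResponseCode();
            // 206 Partial Content means the server honored the range.
            if (status != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Server did not honor the Range request: HTTP " + status);
            }
            try (InputStream in = connection.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream(length);
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

Because the ZIP directory sits at the end of the file, a random-access reader built on this would typically fetch the tail of the archive first, then the range holding the wanted entry.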
Since TrueZIP 7.2, there is a new client API in the module TrueZIP Path. This is an implementation of an NIO.2 FileSystemProvider for Java SE 7. Using this API, you can access an HTTP URI as follows:
I'm not sure if there's a way to pull out a single file from a ZIP without downloading the whole thing first. But, if you're the one hosting the ZIP file, you could create a Java servlet which reads the ZIP file and returns the requested file in the response:
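As a rough sketch of the server-side part, the following shows the core of such a handler with the servlet plumbing left out: it opens a local ZIP with java.util.zip.ZipFile and copies one entry to an output stream, which in a servlet would be the response's output stream. The class ZipEntryResponder and the method writeEntry are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class; a servlet would call writeEntry from doGet.
final class ZipEntryResponder {

    /**
     * Copies the named entry of a local ZIP file to the given stream.
     * Returns false if the entry does not exist or is a directory,
     * in which case the caller could answer with a 404.
     */
    static boolean writeEntry(File zipPath, String entryName, OutputStream response)
            throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            // ZipFile reads the directory, so lookup by name is direct.
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null || entry.isDirectory()) {
                return false;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    response.write(buffer, 0, n);
                }
            }
            return true;
        }
    }
}
```

The entry name comes from the client in this design, so a real handler would also want to restrict which archives can be opened.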