Downloading a directory over HTTP in Java

Posted on 2024-08-10 13:49:48

I have some files in a directory tree which is being served over HTTP.
Given some sub-directory A, in that directory tree I want to be able to download directory A and all containing subdirectories and files.

It seems likely that a simple/direct/atomic solution exists in some dark corner of Java. Does anyone know how to do this?

A webcrawler will not solve my problem since files in sub-directories may link to directories that are not subdirectories.

==Update==

The directories and files must be hosted in a static manner.

The server statically hosts files in a directory tree; the client runs Java and attempts to copy some branch of that directory tree over HTTP.

VFS is the answer to this. Unfortunately, I answered the question myself and so can't choose it as the answer until two days from now. If someone would write up my answer, I would be happy to mark their write-up as the answer.

==Further Update==

VFS is in fact not the answer. VFS will not list directories over HTTP, as stated here. There do seem to be a few people who are interested in that functionality.

Comments (7)

假面具 2024-08-17 13:49:48

My first suggestion would be to create a servlet/JSP which recursively reads the directory structure (using java.io.File), reads all files, puts them in one zip (java.util.zip), and sends it to the browser for download.
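The zipping step of that suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `DirectoryZipper` and the method `zipDirectory` are made up for this example, it walks the tree with `java.nio.file.Files.walk` rather than the `java.io.File` recursion mentioned above, and it skips empty directories. In a servlet, the `OutputStream` would presumably be the response's output stream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class DirectoryZipper {

    /**
     * Writes every regular file below root into a zip stream, keeping the
     * paths relative to root. Hypothetical helper, not part of any library.
     * Closing the zip stream also closes the supplied output stream.
     */
    public static void zipDirectory(Path root, OutputStream out) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use '/' regardless of the platform separator.
                String name = root.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

The client then downloads a single archive instead of many files, which is about as close to "atomic" as this gets, but it requires code running on the server.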

撩人痒 2024-08-17 13:49:48

I don't know of an atomic solution, but the most straightforward one would be using a URLConnection to fetch the sub-directory (assuming the server lists the directory) and then parse the response, look for contents of that directory and use URLConnection again to fetch each of the files under it.

Based on these answers, now I am wondering if you meant the Java to be on the client side or server side!
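For the client-side reading, the fetch, parse, and recurse loop described above might be sketched as follows. Everything here is an assumption made for illustration: the class `HttpDirectoryMirror` and its methods are hypothetical, the server is assumed to return an HTML listing for every URL ending in `/`, the starting URI is assumed to end in `/`, and hrefs are pulled out with a naive regular expression rather than a real HTML parser. `InputStream.readAllBytes` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HttpDirectoryMirror {

    // Naive href extractor; only handles double-quoted attribute values.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the links in a listing page that point strictly below dirUri. */
    public static List<URI> childLinks(URI dirUri, String html) {
        List<URI> children = new ArrayList<>();
        String prefix = dirUri.toString(); // assumed to end with '/'
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI resolved;
            try {
                resolved = dirUri.resolve(m.group(1)).normalize();
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            String s = resolved.toString();
            // Drops parent links, column-sort links, and anything outside the subtree.
            if (s.startsWith(prefix) && s.length() > prefix.length()
                    && resolved.getQuery() == null && resolved.getFragment() == null) {
                children.add(resolved);
            }
        }
        return children;
    }

    /** Recursively copies everything listed below dirUri into targetDir. */
    public static void mirror(URI dirUri, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        String listing;
        try (InputStream in = open(dirUri)) {
            listing = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        Path base = targetDir.toAbsolutePath().normalize();
        for (URI child : childLinks(dirUri, listing)) {
            String relative = dirUri.relativize(child).getPath();
            Path local = base.resolve(relative).normalize();
            if (!local.startsWith(base) || local.equals(base)) {
                continue; // refuse names that would escape the target directory
            }
            if (child.getPath().endsWith("/")) {
                mirror(child, local); // treat trailing '/' as a sub-directory listing
            } else {
                Files.createDirectories(local.getParent());
                try (InputStream in = open(child)) {
                    Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    private static InputStream open(URI uri) throws IOException {
        URLConnection conn = uri.toURL().openConnection();
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(30_000);
        return conn.getInputStream();
    }
}
```

Only listing pages are parsed for links; downloaded files are saved as-is and never followed, and any link that does not resolve below the starting URL is discarded. That is what keeps this from wandering off like a general web crawler. A call such as `HttpDirectoryMirror.mirror(URI.create("http://example.com/files/A/"), Paths.get("A"))` would be the entry point, with the URL being a placeholder.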

清眉祭 2024-08-17 13:49:48

So you want to retrieve, from the client side, a list of all files and directories under a particular URL on the server, as if it were a local disk file system folder? That's usually not possible when the server doesn't have directory indexing enabled. And even then, you still need to parse the HTML page which represents the directory index and extract all <a> elements representing the files and folders yourself. There's no normal java.io.File approach for this; that would have been a huge security hole. One would, for example, be able to download all source files from http://gmail.com. HTTP is not meant as a file transfer protocol. Use FTP. That's what it is for.

雨落星ぅ辰 2024-08-17 13:49:48

Assuming you have control over both the server and the client, I would write a page (in the technology of your choice: ASP, JSP, PHP, etc.) that reads the server directory structure and dynamically returns a page consisting of links to each file to be downloaded.

Then client side you can trigger a download of each link.

What is the client-side technology? Is the thing doing the downloading an application of some sort, or a web browser? Does it have to have a client interface?

If this is some sort of in-house utility program, maybe you can just use FTP instead? Having FTP access open on a server and downloading a directory would be easy...

Adding another possible answer:

If the server does not have directory listings turned on, then you basically have to make a modification server side. The easiest thing would be to just make a page that returns the dir structure to the client in a known format (see my 1st answer above).
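As one possible "known format", the server-side page could return a plain-text manifest with one relative path per line. The sketch below only builds that list. The class `DirectoryManifest`, the method `list`, and the convention that directories end in `/` are all assumptions made for this example, not something the answer specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryManifest {

    /**
     * Lists everything below root as relative, '/'-separated paths.
     * Directories get a trailing '/', so the client can tell them from files.
     */
    public static List<String> list(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(p -> !p.equals(root))
                    .map(p -> {
                        String rel = root.relativize(p).toString().replace('\\', '/');
                        return Files.isDirectory(p) ? rel + "/" : rel;
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

A servlet or JSP could write `String.join("\n", DirectoryManifest.list(root))` as `text/plain`; the client then requests each non-directory line relative to the base URL, with no HTML parsing involved.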

If you control the server and have directory listings on, and you are always using the same server program (IIS, Tomcat, JBoss, etc) then you might be able to just make the client webcrawl the directory listings. For example, in a directory listing from IIS, you can tell which links are directories and which are files because it always puts a '/' at the end of a directory link, and shows 'dir' instead of a file size:

 Friday, October 16, 2009 03:55 PM        <dir> <A href="Unity/">Unity</A>
 Thursday, July 02, 2009 10:42 AM           95 <A href="Global.asax">Global.asax</A>

You can tell here that the 1st link is a directory, and the 2nd is an actual file.

So if you are using a consistent server app, just take a look at how the directory listing is returned. Maybe you'll get lucky.
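A rough sketch of that classification, assuming the listing looks like the two IIS lines quoted above: every anchor's href is extracted, and a trailing `/` is taken to mean "directory". The class `ListingParser` and its `Entry` type are hypothetical names, and the regular expression only copes with double-quoted hrefs, so other servers' listings may need a different pattern or a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingParser {

    /** One link found in a directory listing page. */
    public static final class Entry {
        public final String href;
        public final boolean directory;

        public Entry(String href, boolean directory) {
            this.href = href;
            this.directory = directory;
        }
    }

    // Matches <a ... href="..."> in any letter case.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+[^>]*?href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    /** Extracts all anchors and flags those whose href ends with '/' as directories. */
    public static List<Entry> parse(String html) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            entries.add(new Entry(href, href.endsWith("/")));
        }
        return entries;
    }
}
```

Fed the two sample lines, this yields `Unity/` flagged as a directory and `Global.asax` flagged as a file.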

时光沙漏 2024-08-17 13:49:48

If I am not terribly mistaken, HTTP does not tell you anything about the "structure" of the server side - if such a thing even exists.

Think about REST where the URI does not really tell you where to find a file on the server, but could merely trigger some action, retrieve data or the like.

So I do not think what you are trying to achieve can be done reliably, be it with Java or any other language. Or maybe I am getting you wrong here?

听闻余生 2024-08-17 13:49:48

Talk about low-hanging fruit ;-) Thanks for the offer, e5!

Commons VFS provides a single API for accessing various different file systems. It presents a uniform view of the files from various different sources, such as the files on local disk, on an HTTP server, or inside a Zip archive.

http://commons.apache.org/vfs/

獨角戲 2024-08-17 13:49:48

For the first time in a while, Google beat Stack Overflow: Apache Commons VFS does exactly what I need.

Commons VFS provides a single API for accessing various different file systems. It presents a uniform view of the files from various different sources, such as the files on local disk, on an HTTP server, or inside a Zip archive.

http://commons.apache.org/vfs/

==Update==

As stated in the question, VFS only pretends to solve this problem, since it doesn't allow the listing of HTTP directories.
