Getting a directory listing over HTTP

Posted 2024-10-08 06:04:14

There is a directory being served over the net which I'm interested in monitoring. Its contents are various versions of software that I'm using, and I'd like to write a script that I could run which checks what's there and downloads anything that is newer than what I've already got.

Is there a way, say with wget or something, to get a directory listing? I've tried using wget on the directory, which gives me HTML. To avoid having to parse the HTML document, is there a way of retrieving a simple listing like ls would give?


Comments (8)

空袭的梦i 2024-10-15 06:04:14

I just figured out a way to do it:

$ wget --spider -r --no-parent http://some.served.dir.ca/

It's quite verbose, so you need to pipe through grep a couple of times depending on what you're after, but the information is all there. It looks like it prints to stderr, so append 2>&1 to let grep at it. I grepped for "\.tar\.gz" to find all of the tarballs the site had to offer.

Note that wget writes temporary files in the working directory, and doesn't clean up its temporary directories. If this is a problem, you can change to a temporary directory:

$ (cd /tmp && wget --spider -r --no-parent http://some.served.dir.ca/)
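
For instance, a minimal sketch of that kind of pipeline (the host is the placeholder above; the grep pattern is an assumption based on the tarballs mentioned):

$ wget --spider -r --no-parent http://some.served.dir.ca/ 2>&1 | grep -o 'http://[^ ]*\.tar\.gz' | sort -u

The sort -u at the end removes duplicates, since the spider can report the same URL more than once.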
墨落画卷 2024-10-15 06:04:14

What you are asking for is best served using FTP, not HTTP.

HTTP has no concept of directory listings; FTP does.

Most HTTP servers do not allow access to directory listings, and those that do are doing so as a feature of the server, not the HTTP protocol. For those HTTP servers, they are deciding to generate and send an HTML page for human consumption, not machine consumption. You have no control over that, and would have no choice but to parse the HTML.

FTP is designed for machine consumption, more so with the introduction of the MLST and MLSD commands that replace the ambiguous LIST command.
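
As a minimal sketch, assuming the same content is also exposed over FTP (ftp.example.com and the path are placeholders), curl can fetch a name-only listing; its --list-only option issues NLST rather than LIST:

$ curl --list-only ftp://ftp.example.com/pub/software/

Note the trailing slash, which tells curl to treat the URL as a directory to list rather than a file to download.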

依 靠 2024-10-15 06:04:14

The following is not recursive, but it worked for me:

$ curl -s https://www.kernel.org/pub/software/scm/git/

The output is HTML and is written to stdout. Unlike with wget, there is nothing written to disk.

-s (--silent) is relevant when piping the output, especially within a script that must not be noisy.

Whenever possible, remember to use https rather than ftp or http.

神仙妹妹 2024-10-15 06:04:14

If it's being served by http then there's no way to get a simple directory listing. The listing you see when you browse there, which is the one wget is retrieving, is generated by the web server as an HTML page. All you can do is parse that page and extract the information.
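
As a rough sketch of that parsing step (reusing the kernel.org URL from another answer; the href pattern assumes a typical auto-generated index page):

$ curl -s https://www.kernel.org/pub/software/scm/git/ | grep -oE 'href="[^"]+"' | sed -e 's/^href="//' -e 's/"$//'

This prints one link target per line, which you can then filter and compare against what you already have.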

寻找一个思念的角度 2024-10-15 06:04:14

AFAIK, there is no way to get a directory listing like that, for security reasons. It is rather lucky that your target directory has an HTML listing, because that does allow you to parse it and discover new downloads.

帅气尐潴 2024-10-15 06:04:14

You can use IDM (Internet Download Manager). It has a utility named "IDM SITE GRABBER": give it the http/https URL, and it will download all the files and folders served over http/https for you.

小ぇ时光︴ 2024-10-15 06:04:14

elinks does a halfway decent job of this. Just elinks <URL> to interact with a directory tree through the terminal.

You can also dump the content to the terminal. In that case, you may want flags like --no-references and --no-numbering.
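
For example, a non-interactive dump using the flags mentioned above (the host is a placeholder):

$ elinks -dump --no-references --no-numbering http://some.served.dir.ca/

This renders the index page as plain text on stdout, roughly one entry per line.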

神经暖 2024-10-15 06:04:14

Use lftp:

LS_COLORS=no lftp -e 'cls -1; exit' 'https://cdn.kernel.org/pub/linux/kernel/v1.0/' 2>/dev/null
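
And since the original goal was to fetch anything newer than what you already have, here is a rough sketch using lftp's mirror command against the same URL (the local target directory ./versions and the glob are assumptions):

$ lftp -e 'mirror --only-newer --include-glob "*.tar.gz" . ./versions; exit' 'https://cdn.kernel.org/pub/linux/kernel/v1.0/'

With --only-newer, mirror skips files whose remote timestamps are not newer than the local copies.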