Get a list of files on a web server
All,
I would like to get a list of files off of a server with the full URL intact. For example, I would like to get all the TIFFs from here.
http://hyperquad.telascience.org/naipsource/Texas/20100801/*
I can download all the .tif files with wget, but what I am looking for is just the full URL to each file, like this.
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_3_20100424.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_1_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif
Any thoughts on how to get all these files into a list using something like curl or wget?
Adam
Comments (5)
You'd need the server to be willing to give you a page with a listing on it. This would normally be an index.html, or you can just request the directory itself.
It looks like you're in luck in this case so, at the risk of upsetting the webmaster, the solution would be to use wget's recursive option. Specify a maximum recursion depth of 1 to keep it constrained to that single directory.
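A rough sketch of that idea, driven from Python. It assumes GNU wget is installed and that its log prints each requested URL on a line of the form "--<timestamp>--  <url>". The function names and the .tif suffix filter are illustrative and not part of the original answer. The flags used are --spider (check links without saving files), -r -l 1 (recurse one level only) and -np (never climb to the parent directory).

```python
import re
import subprocess

# wget logs each request as: --2010-08-01 12:00:00--  http://host/path
REQUEST_LINE = re.compile(r"^--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(\S+)")


def urls_from_wget_log(log_text, suffix=".tif"):
    """Return the unique URLs ending in `suffix`, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = REQUEST_LINE.match(line)
        if match:
            url = match.group(1)
            if url.lower().endswith(suffix) and url not in seen:
                seen.append(url)
    return seen


def spider_directory(url, suffix=".tif"):
    """Run wget in spider mode, one level deep, and list the matching URLs."""
    # Assumption: wget is on PATH. It writes its log to stderr and may exit
    # non-zero (for example on a broken link), so the exit code is ignored.
    result = subprocess.run(
        ["wget", "--spider", "-r", "-l", "1", "-np", url],
        capture_output=True,
        text=True,
        check=False,
    )
    return urls_from_wget_log(result.stderr, suffix)
```

Calling spider_directory("http://hyperquad.telascience.org/naipsource/Texas/20100801/") should then give one list entry per .tif file, provided the server exposes a directory listing.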
I would use the lynx shell web browser to get the list of links, plus the grep and awk shell tools to filter the results, like this:
..where:
http://hyperquad.telascience.org/naipsource/Texas/20100801/ is the start URL
\.tif$ is the regular expression that selects the URLs of interest
Complete example command line to get links to TIF files on this SO page:
..now returns:
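A minimal sketch of the filtering step described in this answer, written in Python. It assumes lynx is installed and that "lynx -dump -listonly <url>" prints a numbered reference list with lines such as "   3. http://host/file". The function names are hypothetical, and taking the second whitespace-separated field is an assumption about what the awk step does.

```python
import re
import subprocess


def filter_link_dump(dump_text, pattern=r"\.tif$"):
    """Pick URLs out of numbered 'N. URL' lines and keep those matching `pattern`."""
    wanted = re.compile(pattern)
    urls = []
    for line in dump_text.splitlines():
        fields = line.split()
        # Link lines look like "   3. http://host/file": field 1 is the
        # number, field 2 is the URL. Headers such as "References" are skipped.
        if len(fields) >= 2 and fields[0].rstrip(".").isdigit():
            if wanted.search(fields[1]):
                urls.append(fields[1])
    return urls


def list_links(url, pattern=r"\.tif$"):
    """Ask lynx for the page's links and filter them with `pattern`."""
    # Assumption: the lynx binary is on PATH.
    result = subprocess.run(
        ["lynx", "-dump", "-listonly", url],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_link_dump(result.stdout, pattern)
```

Only filter_link_dump does the real work; list_links is a thin wrapper, so the filter can be tried on saved lynx output without touching the network.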
If you run
wget http://hyperquad.telascience.org/naipsource/Texas/20100801/
the HTML that is returned contains the list of files. If you don't need this to be general, you could use regexes to extract the links. If you need something more robust, you can use an HTML parser (e.g. BeautifulSoup) and programmatically extract the links on the page (from the actual HTML structure).
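A sketch of the parser-based route. To stay dependency-free it uses Python's standard-library html.parser in place of BeautifulSoup, so it is a swapped-in variant of what the answer suggests, not the same library. The class and function names are made up for illustration, and it assumes the directory page lists files as ordinary <a href="..."> links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect every <a href> on a page as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative hrefs against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url, suffix=".tif"):
    """Return the absolute URLs in `html` whose path ends in `suffix`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if url.lower().endswith(suffix)]


def list_remote_files(base_url, suffix=".tif"):
    """Fetch the directory page and list the matching file URLs."""
    with urlopen(base_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html, base_url, suffix)
```

Because urljoin resolves each href against the page URL, the result is the full URL per file that the question asks for, even when the listing only contains bare file names.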
WinSCP has a Find window that can search for all files in the directories and subdirectories under a directory on your own website. Afterwards you can select all the results and copy them, which gives you the links to all the files as text. You need the username and password to connect over FTP:
https://winscp.net/eng/download.php
I have a client-server system that retrieves the file names from an assigned folder inside the app server's folder, then displays thumbnails in the client.
CLIENT: (slThumbnailNames is a string list)
== on the server side ==
A TIdCmdTCPServer has a CommandHandler GetThumbnailNames (a command handler is a procedure)
Hints: sMFFBServerPictures is generated in the OnCreate method of the app server.
sThumbnailDir is passed to the app server from the client.
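For illustration only, here is a small Python sketch of the server-side step this answer describes: given a base pictures folder and a subfolder name sent by the client, return the file names in it. It is not the Delphi/Indy code from the answer. The function name is hypothetical, the two parameters merely mirror sMFFBServerPictures and sThumbnailDir, the network transport is left out, and the check that rejects paths outside the base folder is an added assumption.

```python
from pathlib import Path


def get_thumbnail_names(server_pictures_dir, thumbnail_dir):
    """Return the sorted file names found in base_dir/thumbnail_dir."""
    base = Path(server_pictures_dir).resolve()
    target = (base / thumbnail_dir).resolve()
    # Added safeguard (not in the original answer): refuse client-supplied
    # paths that resolve to somewhere outside the base folder.
    if target != base and base not in target.parents:
        raise ValueError("thumbnail_dir must stay inside the pictures folder")
    # Keep regular files only; subdirectories are not reported.
    return sorted(entry.name for entry in target.iterdir() if entry.is_file())
```

The client would receive this list (the answer's slThumbnailNames) and request each thumbnail by name.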