Getting a list of files on a web server

Posted on 2024-11-28 07:42:15

All,

I would like to get a list of files from a server with the full URL intact. For example, I would like to get all the TIFFs from here.

http://hyperquad.telascience.org/naipsource/Texas/20100801/*

I can download all the .tif files with wget, but what I am looking for is just the full URL to each file, like this.

http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_3_20100424.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_1_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif

Any thoughts on how to get all these files into a list using something like curl or wget?

Adam

Comments (5)

避讳 2024-12-05 07:42:15

You need the server to be willing to give you a page with a listing on it. That is normally an index.html, or whatever the server returns when you simply request the directory:

http://hyperquad.telascience.org/naipsource/Texas/20100801/

It looks like you're in luck in this case, so, at the risk of upsetting the webmaster, the solution is to use wget's recursive option (-r). Specify a maximum recursion depth of 1 (-l 1) to keep it constrained to that single directory.

嘿看小鸭子会跑 2024-12-05 07:42:15

I would use the lynx text-mode web browser to get the list of links, plus the grep and awk shell tools to filter the results, like this:

lynx -dump -listonly <URL> | grep http | grep <regexp> | awk '{print $2}'

..where:

  • URL - is the start URL, in your case: http://hyperquad.telascience.org/naipsource/Texas/20100801/
  • regexp - is the regular expression that selects only files that interest you, in your case: \.tif$

Complete example command line to get links to TIF files on this SO page:

lynx -dump -listonly http://stackoverflow.com/questions/6989681/getting-a-list-of-files-on-a-web-server | grep http | grep '\.tif$' | awk '{print $2}'

..now returns:

http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_2_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_04_4_20100430.tif
http://hyperquad.telascience.org/naipsource/Texas/20100801/naip10_1m_2597_05_2_20100430.tif
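
For anyone who would rather do the filtering step in a script, the grep and awk part of the pipeline above can be sketched in Python. This is only a rough sketch and not part of the original answer: it assumes the link list has already been captured as text, with each link on a line of the form "   1. http://...", and the function name filter_links and the sample text are hypothetical.

```python
import re


def filter_links(dump_text, pattern=r"\.tif$"):
    """Return the URLs from link-list text that match `pattern`.

    Assumes each link line looks like '   1. http://example/file.tif'
    (a number field followed by the URL), which is what the
    awk '{print $2}' step in the pipeline relies on.
    """
    regex = re.compile(pattern)
    urls = []
    for line in dump_text.splitlines():
        fields = line.split()
        # Keep only lines whose second field is an http(s) URL.
        if len(fields) >= 2 and fields[1].startswith("http"):
            if regex.search(fields[1]):
                urls.append(fields[1])
    return urls


# Hypothetical sample input, for illustration only.
sample = """References

   1. http://example.com/data/
   2. http://example.com/data/a.tif
   3. http://example.com/data/readme.txt
   4. http://example.com/data/b.tif
"""

for url in filter_links(sample):
    print(url)
```

Passing the pattern as a Python string avoids the shell quoting question entirely, and a case-insensitive pattern such as (?i)\.tif$ would also catch .TIF files.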

﹏雨一样淡蓝的深情 2024-12-05 07:42:15

If you run wget on http://hyperquad.telascience.org/naipsource/Texas/20100801/, the HTML that is returned contains the list of files. If you don't need this to be general, you could use regexes to extract the links. If you need something more robust, you can use an HTML parser (e.g. BeautifulSoup) and programmatically extract the links on the page from the actual HTML structure.
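
The parser-based approach might look roughly like the sketch below. To keep it dependency-free it swaps in Python's standard-library html.parser in place of BeautifulSoup, so it is an illustration of the idea rather than the BeautifulSoup code itself. The names LinkCollector, extract_file_urls and list_remote_files, the sample HTML, and the example.com URL are all hypothetical, and it assumes the listing page links to each file with an ordinary a href attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_file_urls(html, base_url, suffix=".tif"):
    """Return absolute URLs of links in `html` that end with `suffix`."""
    parser = LinkCollector()
    parser.feed(html)
    # urljoin turns relative hrefs (typical in directory listings)
    # into full URLs based on the page's own address.
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if href.lower().endswith(suffix)
    ]


def list_remote_files(url, suffix=".tif"):
    """Fetch a listing page and return matching file URLs (needs network)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_file_urls(html, url, suffix)


# Hypothetical listing HTML, for illustration only.
sample_html = (
    '<a href="../">Parent</a> '
    '<a href="a.tif">a.tif</a> '
    '<a href="b.txt">b.txt</a>'
)
print(extract_file_urls(sample_html, "http://example.com/data/"))
```

Calling list_remote_files with the directory URL from the question would be the networked equivalent, provided the server returns an HTML listing for that directory.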

一紙繁鸢 2024-12-05 07:42:15

WinSCP has a Find window that can search for all files in a directory and its subdirectories on your own website. You can then select all the results and copy them, which gives you the links to all the files as text. You need the username and password to connect via FTP:

https://winscp.net/eng/download.php

请远离我 2024-12-05 07:42:15

I have a client-server system that retrieves the file names from an assigned folder under the app server's folder, then displays thumbnails in the client.
CLIENT: slThumbnailNames is a string list, filled by calling funGetThumbnailNames.
SERVER: a TIdCmdTCPServer has a command handler GetThumbnailNames (a command handler is a procedure).

Hints: sMFFBServerPictures is generated in the OnCreate method of the app server.
sThumbnailDir is passed to the app server from the client.

`// Client side: ask the server for the thumbnail names.
slThumbnailNames := funGetThumbnailNames(sThumbNailPath);

function TfMFFBClient.funGetThumbnailNames(sThumbnailPath: string): TStringList;
var
  slThisStringList: TStringList;
begin
  // The caller owns the returned list and must free it.
  slThisStringList := TStringList.Create;
  dmMFFBClient.tcpMFFBClient.SendCmd('GetThumbnailNames,' + sThumbnailPath, 700);
  dmMFFBClient.tcpMFFBClient.IOHandler.Capture(slThisStringList);
  Result := slThisStringList;
end;

// Server side: command handler that lists the matching files.
procedure TfMFFBServer.MFFBCmdTCPServercmdGetThumbnailNames(
  ASender: TIdCommand);
var
  sRec: TSearchRec;
  sThumbnailDir: string;
  iNumFiles: Integer;
begin
  try
    ASender.Response.Clear;
    sThumbnailDir := ASender.Params[0];
    // FindFirst and FindNext return 0 for as long as a match is found.
    iNumFiles := FindFirst(sMFFBServerPictures + sThumbnailDir + '*_t.jpg', faAnyFile, sRec);
    if iNumFiles = 0 then
    try
      while iNumFiles = 0 do
      begin
        // Skip directories; each matching file is added exactly once.
        if (sRec.Attr and faDirectory) <> faDirectory then
          ASender.Response.Add(sRec.Name);
        iNumFiles := FindNext(sRec);
      end;
    finally
      FindClose(sRec);
    end
    else
      ASender.Response.Add('NO THUMBNAILS');
  except
    on e: Exception do
    begin
      MessageDlg('Error in procedure TfMFFBServer.MFFBCmdTCPServercmdGetThumbnailNames' + #13 +
        'Error msg: ' + e.Message, mtError, [mbOK], 0);
    end;
  end;
end;`