Recursively list files from a web server


I am currently trying to implement a simple web downloader that recursively downloads files from a single directory on the server.

This is what I have so far to list the files on the server, in Updater.cs:

    // Downloads patchlist.txt from the server and returns its lines,
    // one remote file URL per entry.
    public static List<string> remote_filecheck()
    {
        List<string> rfiles = new List<string>();
        string url = "http://********/patchlist.txt";

        using (WebClient client = new WebClient())
        {
            client.DownloadFile(url, @"patchlist.txt");
        }

        using (StreamReader reader = new StreamReader("patchlist.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                rfiles.Add(line);
            }
        }

        return rfiles;
    }

At the moment I work with a patch list that contains direct links to all of my HTTP files.

I have tried nearly every snippet I could find on the web about recursive downloading, e.g. regular expressions, WebRequest, and so on.

What I would like to know is whether there is a good way to walk my HTTP server recursively and list all the file names; that is all I need.

Once I have a List<string> of file names, I can do the rest myself.
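
For context, "the rest" I have in mind is roughly the sketch below; the "patch" folder name is only a placeholder, since each patch list line is already a direct link:

    // Rough sketch only: downloads every direct link from the patch list.
    // "patch" is a placeholder folder name, not my real setup.
    // Requires System, System.Collections.Generic, System.IO and System.Net.
    public static void download_patchlist(List<string> rfiles)
    {
        Directory.CreateDirectory("patch");

        using (WebClient client = new WebClient())
        {
            foreach (string fileUrl in rfiles)
            {
                // Each line is a full URL, so the file name is its last segment.
                string fileName = Path.GetFileName(new Uri(fileUrl).LocalPath);
                client.DownloadFile(fileUrl, Path.Combine("patch", fileName));
            }
        }
    }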

呆头 2024-12-16 16:23:07


Does the server that you are trying to get the files from have directory indexing switched on?

If so, then it is probably a matter of scraping the index page that comes back and then visiting each URL one by one.

If not, then I am not sure it can be done very easily.

OK, based on the comments below, I think you'll want to do something like this:

        // Requires a Windows Forms application (System.Windows.Forms).
        string indexUrl = "http://www.stackoverflow.com";

        WebBrowser browser = new WebBrowser();
        browser.Navigate(indexUrl);

        // Pump messages until the page has finished loading.
        do
        {
            Application.DoEvents();
        } while (browser.ReadyState != WebBrowserReadyState.Complete);

        var listOfFilePaths = new List<string>();

        // Collect the target of every anchor tag on the index page.
        foreach (HtmlElement linkElement in browser.Document.GetElementsByTagName("a"))
        {
            var pagePath = linkElement.GetAttribute("href");
            listOfFilePaths.Add(pagePath);
        }

Note that the WebBrowser control needs to be run in a Windows Forms app to get it to work (easily). The indexUrl variable I used should be changed to the path of the index page of your server (I just used stackoverflow as an example).

The foreach loop extracts all anchor (a) tags from the page, gets the path each one points to, and adds it to the listOfFilePaths collection.

Once this code has finished executing, the listOfFilePaths collection will contain an entry for every link on the index page, and hence a link to every file on the server.

From here it is a matter of looping over the listOfFilePaths collection and downloading each file one by one, perhaps applying some rules to skip certain file types that you are not interested in. From what you have said, I believe you should be able to do this.
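
As a rough illustration only (the extension list and the "downloads" folder below are made-up examples, not something from your code), the download loop with a simple filter could look like this:

        // Requires System, System.Collections.Generic, System.IO,
        // System.Linq and System.Net.
        var allowedExtensions = new[] { ".txt", ".zip", ".dll" };  // example filter only
        Directory.CreateDirectory("downloads");                    // example target folder

        using (var client = new WebClient())
        {
            foreach (var filePath in listOfFilePaths)
            {
                // Skip anything that is not an absolute URL.
                Uri fileUri;
                if (!Uri.TryCreate(filePath, UriKind.Absolute, out fileUri))
                    continue;

                // Skip file types you are not interested in.
                var extension = Path.GetExtension(fileUri.LocalPath);
                if (!allowedExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
                    continue;

                var localName = Path.Combine("downloads", Path.GetFileName(fileUri.LocalPath));
                client.DownloadFile(fileUri, localName);
            }
        }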

Hope this helps.
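
For completeness, and only if directory indexing really is switched on: a non-WinForms alternative (this is an extra sketch, not part of the approach above, and the href regular expression is a simplification that may need adjusting for your server's index format) is to fetch each index page with WebClient and recurse into links that end with a slash:

        // Requires System, System.Collections.Generic, System.Net
        // and System.Text.RegularExpressions.
        static List<string> ListFilesRecursive(string baseUrl)
        {
            var files = new List<string>();
            using (var client = new WebClient())
            {
                CollectLinks(client, baseUrl, files);
            }
            return files;
        }

        static void CollectLinks(WebClient client, string pageUrl, List<string> files)
        {
            string html = client.DownloadString(pageUrl);

            // Very naive href extraction; assumes a plain auto-generated index page.
            foreach (Match match in Regex.Matches(html, @"href\s*=\s*""([^""]+)"""))
            {
                string href = match.Groups[1].Value;

                // Skip parent-directory links, sort links and links to other pages.
                if (href.StartsWith("..") || href.StartsWith("?") ||
                    href.StartsWith("http") || href.StartsWith("/"))
                    continue;

                string absolute = new Uri(new Uri(pageUrl), href).ToString();

                if (href.EndsWith("/"))
                    CollectLinks(client, absolute, files);   // subdirectory: recurse
                else
                    files.Add(absolute);                     // file: add to the list
            }
        }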
