What is the fastest way to check the file types of a large list of URLs (and how do I optimize my code)?
I have a large list of URLs (expanding over time) whose types I need to check. This is the code I currently have:
private string[] MIME = new string[] {
    "audio/ogg - ogg",
    "video/ogg - ogg",
    "application/f4v - mp4",
    "application/octet-stream - mp3",
    "audio/aac - mp3",
    "audio/mp3 - mp3",
    "audio/mp4 - mp4",
    "audio/mp4-latm - m4a",
    "audio/mpeg - mp3",
    "audio/mpeg3 - mp3",
    "audio/x-mpeg - mp3",
    "audio/x-ms-wma - wma",
    "video/f4v - mp4",
    "video/mp4 - mp4",
};
private string CheckType(string url) {
    try {
        HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(new Uri(url));
        webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0";
        webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        webRequest.Timeout = 5000;
        // Dispose the response so the connection is returned to the pool.
        using (HttpWebResponse webResponse = (HttpWebResponse) webRequest.GetResponse()) {
            long fileSize = webResponse.ContentLength;
            foreach (string mime_entry in MIME) {
                string sheader = webResponse.Headers.ToString();
                string[] mime = mime_entry.Split(new string[] { " - " }, StringSplitOptions.RemoveEmptyEntries);
                if (sheader.Contains(mime[0])) {
                    return mime[1] + " " + fileSize.ToString();
                }
            }
        }
        return "";
    } catch (Exception) {
        // Timeouts, DNS failures and non-2xx responses all land here.
        return "";
    }
}
- Can I make my requests faster?
- Can I somehow use multi-threading to iterate the list faster (what if one of the threads stalls waiting on an HTTP response)?
- Is there a better way to do this?
1 Answer
Yes, you can make this faster by only issuing HEAD requests, since, after all, you don't use the response body for anything.
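A minimal sketch of that HEAD variant, using the same System.Net setup as in the question (CheckTypeHead is an illustrative name; also note that some servers handle HEAD poorly, so a GET fallback may be worth keeping):

using System;
using System.Net;

private string CheckTypeHead(string url) {
    HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(new Uri(url));
    webRequest.Method = "HEAD";  // ask for headers only; no body is transferred
    webRequest.Timeout = 5000;
    using (HttpWebResponse webResponse = (HttpWebResponse) webRequest.GetResponse()) {
        // ContentType and ContentLength come straight from the response headers.
        return webResponse.ContentType + " " + webResponse.ContentLength;
    }
}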
Yes, it would make good sense to multi-thread this moderately - if the URLs are on different servers, there will be server wait times that can easily be parallelized. Using a synchronized queue and some worker threads processing that queue would be an easy way to parallelize this. You can experiment with the number of threads; I'd try 8 as a starting point.
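A rough sketch of that queue-plus-workers setup, assuming .NET 4's BlockingCollection is available (CheckType is the method from the question; CheckAll and workerCount are names I'm introducing for illustration):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

private void CheckAll(IEnumerable<string> urls) {
    BlockingCollection<string> queue = new BlockingCollection<string>();
    int workerCount = 8;  // starting point; experiment with this number
    Thread[] workers = new Thread[workerCount];
    for (int i = 0; i < workerCount; i++) {
        workers[i] = new Thread(() => {
            // GetConsumingEnumerable blocks until an item arrives, and ends
            // once CompleteAdding has been called and the queue is drained.
            foreach (string url in queue.GetConsumingEnumerable()) {
                string result = CheckType(url);
                // ... record the result somewhere thread-safe ...
            }
        });
        workers[i].Start();
    }
    foreach (string url in urls) {
        queue.Add(url);
    }
    queue.CompleteAdding();  // signal there is no more work
    foreach (Thread t in workers) {
        t.Join();
    }
}

As for a thread halting on a slow response: it only blocks the single worker that issued the request, and your Timeout of 5000 ms caps how long that stall can last; the other workers keep draining the queue.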
See above. And also, your MIME-checking code is suboptimal. You can use a Dictionary<string, string> for the lookup, and in the Headers collection you should only look at Content-Type, not the whole headers collection.
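For instance, a sketch of that lookup built from the same MIME pairs as in the question (MimeToExt and LookupType are illustrative names, not a fixed API):

using System;
using System.Collections.Generic;
using System.Net;

// One O(1) dictionary lookup instead of scanning the full header text
// once per MIME entry.
private static readonly Dictionary<string, string> MimeToExt =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
        { "audio/ogg", "ogg" },       { "video/ogg", "ogg" },
        { "application/f4v", "mp4" }, { "application/octet-stream", "mp3" },
        { "audio/aac", "mp3" },       { "audio/mp3", "mp3" },
        { "audio/mp4", "mp4" },       { "audio/mp4-latm", "m4a" },
        { "audio/mpeg", "mp3" },      { "audio/mpeg3", "mp3" },
        { "audio/x-mpeg", "mp3" },    { "audio/x-ms-wma", "wma" },
        { "video/f4v", "mp4" },       { "video/mp4", "mp4" },
    };

private string LookupType(HttpWebResponse webResponse) {
    // Content-Type can carry parameters, e.g. "audio/mpeg; charset=...",
    // so keep only the media type before the first ';'.
    string contentType = webResponse.ContentType.Split(';')[0].Trim();
    string ext;
    if (MimeToExt.TryGetValue(contentType, out ext)) {
        return ext + " " + webResponse.ContentLength;
    }
    return "";
}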