What is the fastest way to check the file types of a large list of URLs (and how do I optimize my code)?
I have a large list of URLs (expanding over time) whose types I need to check. This is the code I currently have:
private string[] MIME = new string[] {
    "audio/ogg - ogg",
    "video/ogg - ogg",
    "application/f4v - mp4",
    "application/octet-stream - mp3",
    "audio/aac - mp3",
    "audio/mp3 - mp3",
    "audio/mp4 - mp4",
    "audio/mp4-latm - m4a",
    "audio/mpeg - mp3",
    "audio/mpeg3 - mp3",
    "audio/x-mpeg - mp3",
    "audio/x-ms-wma - wma",
    "video/f4v - mp4",
    "video/mp4 - mp4",
};
private string CheckType(string url) {
    try {
        HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(new Uri(url));
        webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0";
        webRequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        webRequest.Timeout = 5000;
        // Dispose the response so the connection is returned to the pool.
        using (HttpWebResponse webResponse = (HttpWebResponse) webRequest.GetResponse()) {
            long fileSize = webResponse.ContentLength;
            foreach (string mime_entry in MIME) {
                string sheader = webResponse.Headers.ToString();
                string[] mime = mime_entry.Split(new string[] { " - " }, StringSplitOptions.RemoveEmptyEntries);
                if (sheader.Contains(mime[0])) {
                    return mime[1] + " " + fileSize.ToString();
                }
            }
        }
        return "";
    } catch (Exception) {
        // Timeouts, DNS failures and non-2xx responses all land here.
        return "";
    }
}
- Can I make my requests faster?
- Can I somehow use multi-threading to iterate the list faster (what if one of the threads stalls waiting on an HTTP response)?
- Is there a better way to do this?
1 Answer
Yes, you can make this faster by only issuing HEAD requests, since, after all, you don't use the response body for anything.
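A minimal sketch of that HEAD variant, using the same System.Net setup as in the question (CheckTypeHead is an illustrative name; also note that some servers handle HEAD poorly, so a GET fallback may be worth keeping):

using System;
using System.Net;

private string CheckTypeHead(string url) {
    HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(new Uri(url));
    webRequest.Method = "HEAD";  // ask for headers only; no body is transferred
    webRequest.Timeout = 5000;
    using (HttpWebResponse webResponse = (HttpWebResponse) webRequest.GetResponse()) {
        // ContentType and ContentLength come straight from the response headers.
        return webResponse.ContentType + " " + webResponse.ContentLength;
    }
}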
Yes, it would make good sense to multi-thread this moderately - if the URLs are on different servers, there will be server wait times that can easily be parallelized. Using a synchronized queue and some worker threads processing that queue would be an easy way to parallelize this. You can experiment with the number of threads; I'd try 8 as a starting point.
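A rough sketch of that queue-plus-workers setup, assuming .NET 4's BlockingCollection is available (CheckType is the method from the question; CheckAll and workerCount are names I'm introducing for illustration):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

private void CheckAll(IEnumerable<string> urls) {
    BlockingCollection<string> queue = new BlockingCollection<string>();
    int workerCount = 8;  // starting point; experiment with this number
    Thread[] workers = new Thread[workerCount];
    for (int i = 0; i < workerCount; i++) {
        workers[i] = new Thread(() => {
            // GetConsumingEnumerable blocks until an item arrives, and ends
            // once CompleteAdding has been called and the queue is drained.
            foreach (string url in queue.GetConsumingEnumerable()) {
                string result = CheckType(url);
                // ... record the result somewhere thread-safe ...
            }
        });
        workers[i].Start();
    }
    foreach (string url in urls) {
        queue.Add(url);
    }
    queue.CompleteAdding();  // signal there is no more work
    foreach (Thread t in workers) {
        t.Join();
    }
}

As for a thread halting on a slow response: it only blocks the single worker that issued the request, and your Timeout of 5000 ms caps how long that stall can last; the other workers keep draining the queue.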
See above. And also, your MIME-checking code is suboptimal. You can use a Dictionary<string, string> for the lookup, and in the Headers collection you should only look at Content-Type, not the whole headers collection.
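For instance, a sketch of that lookup built from the same MIME pairs as in the question (MimeToExt and LookupType are illustrative names, not a fixed API):

using System;
using System.Collections.Generic;
using System.Net;

// One O(1) dictionary lookup instead of scanning the full header text
// once per MIME entry.
private static readonly Dictionary<string, string> MimeToExt =
    new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
        { "audio/ogg", "ogg" },       { "video/ogg", "ogg" },
        { "application/f4v", "mp4" }, { "application/octet-stream", "mp3" },
        { "audio/aac", "mp3" },       { "audio/mp3", "mp3" },
        { "audio/mp4", "mp4" },       { "audio/mp4-latm", "m4a" },
        { "audio/mpeg", "mp3" },      { "audio/mpeg3", "mp3" },
        { "audio/x-mpeg", "mp3" },    { "audio/x-ms-wma", "wma" },
        { "video/f4v", "mp4" },       { "video/mp4", "mp4" },
    };

private string LookupType(HttpWebResponse webResponse) {
    // Content-Type can carry parameters, e.g. "audio/mpeg; charset=...",
    // so keep only the media type before the first ';'.
    string contentType = webResponse.ContentType.Split(';')[0].Trim();
    string ext;
    if (MimeToExt.TryGetValue(contentType, out ext)) {
        return ext + " " + webResponse.ContentLength;
    }
    return "";
}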