How to get a website's title from C#
I'm revisiting some old code of mine and have stumbled upon a method for getting the title of a website based on its URL. It's not really what you would call a stable method, as it often fails to produce a result and sometimes even produces incorrect results. It also sometimes fails to show some of the characters from the title, because they use a different encoding.
Does anyone have suggestions for improvements over this old version?
public static string SuggestTitle(string url, int timeout)
{
    WebResponse response = null;
    string line = string.Empty;

    try
    {
        WebRequest request = WebRequest.Create(url);
        request.Timeout = timeout;

        response = request.GetResponse();
        Stream streamReceive = response.GetResponseStream();
        Encoding encoding = System.Text.Encoding.GetEncoding("utf-8");
        StreamReader streamRead = new System.IO.StreamReader(streamReceive, encoding);

        while (streamRead.EndOfStream != true)
        {
            line = streamRead.ReadLine();

            if (line.Contains("<title>"))
            {
                line = line.Split(new char[] { '<', '>' })[2];
                break;
            }
        }
    }
    catch (Exception) { }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }

    return line;
}
One final note - I would like the code to run faster as well, since it blocks until the page has been fetched, so if I could get only the site header and not the entire page, that would be great.
A simpler way to get the content:
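(The code that originally followed this line was lost when the page was scraped. A sketch of what it presumably showed, downloading the page source with WebClient - an assumption, not the answerer's exact code; url is the address passed to your method:)

using System.Net;

WebClient client = new WebClient();
string source = client.DownloadString(url);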
A simpler, more reliable way to get the title:
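(This snippet was also lost. A sketch of extracting the title from the downloaded source with a regular expression; the pattern here is my own reconstruction:)

using System.Text.RegularExpressions;

// Take the text between <title ...> and </title>, case-insensitively,
// allowing the tag to span multiple lines.
string title = Regex.Match(source, @"<title[^>]*>\s*(.+?)\s*</title>",
    RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;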
Perhaps this suggestion opens up a new world for you.
I also had this question and came across this solution.
Download "Html Agility Pack" from http://html-agility-pack.net/?z=codeplex
Or go to nuget: https://www.nuget.org/packages/HtmlAgilityPack/
And add a reference to it.
Add the following using directive to your code file:
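(The snippet that followed was lost in extraction; presumably it was simply the library's namespace:)

using HtmlAgilityPack;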
Write the following code in your method:
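(This snippet was also lost. A minimal sketch of loading a page with HtmlAgilityPack's HtmlWeb and reading the title and meta description; the exact XPath expressions are my assumption, not necessarily what the answer showed:)

var web = new HtmlWeb();
var doc = web.Load(url);

// Text of the <title> element in the document head
string title = doc.DocumentNode.SelectSingleNode("//head/title")?.InnerText;

// content attribute of <meta name="description">, if present
string metaDescription = doc.DocumentNode
    .SelectSingleNode("//meta[@name='description']")
    ?.GetAttributeValue("content", string.Empty);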
Sources:
https://codeshare.co.uk/blog/how-to-scrape-meta-data-from-a-url-using-htmlagilitypack-in-c/
HtmlAgilityPack obtain Title and meta
In order to accomplish this, you are going to need to do a couple of things.
I have done this before with SEO bots, and I have been able to handle almost 10,000 requests at a time. You just need to make sure that each web request is self-contained in its own thread.
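(The answer gives no code. A rough sketch of the idea under my own assumptions - here using HttpClient and tasks rather than raw threads - where each title fetch is fully self-contained, so many URLs can be processed concurrently:)

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

static readonly HttpClient http = new HttpClient();

static async Task<string> FetchTitleAsync(string url)
{
    // Each request is self-contained: no shared state other than the HttpClient.
    string html = await http.GetStringAsync(url);
    return Regex.Match(html, @"<title[^>]*>\s*(.+?)\s*</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline).Groups[1].Value;
}

static Task<string[]> FetchTitlesAsync(IEnumerable<string> urls)
{
    // Fire all requests concurrently and wait for every title.
    return Task.WhenAll(urls.Select(FetchTitleAsync));
}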