How to get a website's title from C#

Posted 2024-07-09 17:59:34

I'm revisiting some old code of mine and have stumbled upon a method for getting the title of a website based on its URL. It's not what you would call a stable method: it often fails to produce a result and sometimes even produces incorrect results. It also sometimes fails to show some of the characters in the title because they are in a different encoding.

Does anyone have suggestions for improvements over this old version?

public static string SuggestTitle(string url, int timeout)
{
    WebResponse response = null;
    string line = string.Empty;

    try
    {
        WebRequest request = WebRequest.Create(url);
        request.Timeout = timeout;

        response = request.GetResponse();
        Stream streamReceive = response.GetResponseStream();
        Encoding encoding = System.Text.Encoding.GetEncoding("utf-8");
        StreamReader streamRead = new System.IO.StreamReader(streamReceive, encoding);

        while(streamRead.EndOfStream != true)
        {
            line = streamRead.ReadLine();
            if (line.Contains("<title>"))
            {
                line = line.Split(new char[] { '<', '>' })[2];
                break;
            }
        }
    }
    catch (Exception) { }
    finally
    {
        if (response != null)
        {
            response.Close();
        }
    }

    return line;
}

One final note: I would like the code to run faster as well. It blocks until the page has been fetched, so it would be great if I could get only the site header rather than the entire page.

╭⌒浅淡时光〆 2024-07-16 17:59:34

A simpler way to get the content:

// Requires: using System.Net;
string source;
using (var client = new WebClient())
    source = client.DownloadString("http://www.singingeels.com/");

A simpler, more reliable way to get the title:

// Requires: using System.Text.RegularExpressions;
string title = Regex.Match(source, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title\>",
    RegexOptions.IgnoreCase).Groups["Title"].Value;
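
As a rough sketch, the regex approach can be wrapped in a small helper that also decodes HTML entities and reports a missing title explicitly. The names TitleParser and ExtractTitle are made up for illustration, and the pattern uses RegexOptions.Singleline in place of [\s\S]; treat this as one possible shape under those assumptions, not the definitive way to do it.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: the class and method names are illustrative only.
public static class TitleParser
{
    // Matches <title ...>text</title>, case-insensitively, across line breaks.
    private static readonly Regex TitleRegex = new Regex(
        @"<title\b[^>]*>(?<Title>.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded, whitespace-normalized title, or null if none is found.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        Match match = TitleRegex.Match(html);
        if (!match.Success) return null;

        // Decode entities such as &amp; and collapse runs of whitespace.
        string decoded = WebUtility.HtmlDecode(match.Groups["Title"].Value);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

For example, TitleParser.ExtractTitle("<head><TITLE>\n Fish &amp; Chips </TITLE></head>") should give "Fish & Chips", and input without a title element gives null instead of an empty string.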

爱要勇敢去追 2024-07-16 17:59:34

This suggestion may open up a new world for you.
I had the same question and arrived at the following.

Download "Html Agility Pack" from http://html-agility-pack.net/?z=codeplex

Or get it from NuGet: https://www.nuget.org/packages/HtmlAgilityPack/
and add it to your project as a reference.

Add the following using directive to the code file:

using HtmlAgilityPack;

Write the following code in your method:

var webGet = new HtmlWeb();
var document = webGet.Load(url);
// SelectSingleNode returns null when the page has no <title> element.
var title = document.DocumentNode.SelectSingleNode("html/head/title")?.InnerText;

Sources:

https://codeshare.co.uk/blog/how-to-scrape-meta-data-from-a-url-using-htmlagilitypack-in-c/
HtmlAgilityPack obtain Title and meta

不知所踪 2024-07-16 17:59:34

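To accomplish this, you will need to do a few things.

Make your application threaded, so that you can process multiple requests at the same time and maximize the number of HTTP requests in flight.
During the asynchronous request, download only as much data as you need to pull back. You can probably parse the data as it arrives, looking for the title.
You will probably want to use a regular expression to pull out the title text.

I have done this before with SEO bots and was able to handle almost 10,000 requests at one time. You just need to make sure that each web request can be self-contained in a thread.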
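
A minimal sketch of the partial-download idea from this answer, assuming HttpClient is available (.NET 4.5 or later). TitleFetcher, ReadTitleAsync, FetchTitleAsync and the 64 KB cap are hypothetical choices made for illustration, not something the answer specifies. The body is read in small chunks and reading stops as soon as a complete title element has been seen.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper: names and limits are assumptions for illustration.
public static class TitleFetcher
{
    private static readonly HttpClient Client = new HttpClient();

    private static readonly Regex TitleRegex = new Regex(
        @"<title\b[^>]*>(?<Title>.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Reads the stream chunk by chunk and stops once a complete <title>
    // element has been seen or roughly maxChars characters have been read.
    public static async Task<string> ReadTitleAsync(
        Stream stream, Encoding encoding, int maxChars = 64 * 1024)
    {
        var buffer = new char[2048];
        var seen = new StringBuilder();
        using (var reader = new StreamReader(stream, encoding))
        {
            int read;
            while (seen.Length < maxChars &&
                   (read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                seen.Append(buffer, 0, read);
                // Re-scanning the accumulated text is simple; maxChars bounds the cost.
                Match match = TitleRegex.Match(seen.ToString());
                if (match.Success)
                    return match.Groups["Title"].Value.Trim();
            }
        }
        return null; // no title found within the limit
    }

    public static async Task<string> FetchTitleAsync(string url)
    {
        // ResponseHeadersRead completes once the headers arrive,
        // so the body is streamed on demand instead of buffered in full.
        using (HttpResponseMessage response =
            await Client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();

            // Prefer the charset declared in Content-Type; fall back to UTF-8.
            Encoding encoding = Encoding.UTF8;
            string charset = response.Content.Headers.ContentType?.CharSet;
            if (!string.IsNullOrEmpty(charset))
            {
                try { encoding = Encoding.GetEncoding(charset.Trim('"')); }
                catch (ArgumentException) { /* unknown charset: keep UTF-8 */ }
            }

            using (Stream body = await response.Content.ReadAsStreamAsync())
                return await ReadTitleAsync(body, encoding);
        }
    }
}
```

Because FetchTitleAsync returns a Task, many URLs can be processed concurrently by starting one call per URL and awaiting them together with Task.WhenAll, rather than dedicating a thread to each request. Note that on .NET Core and later, legacy code pages are only available to Encoding.GetEncoding after a code page provider has been registered; otherwise this sketch falls back to UTF-8.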

About the author: 萌化
