Regular expression to extract the favicon URL from a web page
Please help me find the favicon URL in the sample HTML below using a regular expression. It should also check for the file extension ".ico". I am developing a personal bookmarking site and I want to save the favicons of the links I bookmark. I have already written the C# code to convert the icon to GIF and save it, but I have very limited knowledge of regex, so I am unable to select this tag because the closing syntax differs between sites, for example "/>" versus "</link>".
My programming language is C#.
<meta name="description" content="Create 360 degree rotation product presentation online with 3Dbin. 360 product pics, object rotationg presentation can be created for your website at 3DBin.com web service." />
<meta name="robots" content="index, follow" />
<meta name="verify-v1" content="x42ckCSDiernwyVbSdBDlxN0x9AgHmZz312zpWWtMf4=" />
<link rel="shortcut icon" href="http://3dbin.com/favicon.ico" type="image/x-icon" />
<link rel="stylesheet" type="text/css" href="http://3dbin.com/css/1261391049/style.min.css" />
<!--[if lt IE 8]>
<script src="http://3dbin.com/js/1261039165/IE8.js" type="text/javascript"></script>
<![endif]-->
Solution: one more way to do this.
Download HtmlAgilityPack and add a reference to its DLL. Thanks for helping me. I really love this site :)
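using System;
using HtmlAgilityPack;

// readcontent holds the page HTML; faviconurl is a string declared elsewhere.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(readcontent);

// SelectNodes returns null (not an empty collection) when nothing matches,
// so check the result before iterating.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string href = link.GetAttributeValue("href", "");
        // Case-insensitive check so "FAVICON.ICO" is accepted as well.
        if (href.EndsWith(".ico", StringComparison.OrdinalIgnoreCase))
        {
            faviconurl = href;
        }
    }
}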
Comments (4)
This should match the whole link tag that contains href=http://3dbin.com/favicon.ico
Correction based on your comment:
I see you have a C# solution. Excellent! But just in case you are still wondering whether it can be done with regular expressions, the following expression would do what you want. Group 1 of the match will contain only the URL.
Simple C# snippet that makes use of it:
which prints the following to the console:
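The expression and snippet from this answer are not reproduced here. As a minimal sketch of the same idea (not the answerer's original code), the pattern below is an assumed one: it looks for a <link> tag whose href value ends in ".ico", ignores how the tag is closed, and puts the URL in group 1. The names FaviconRegex and FindFavicon are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FaviconRegex
{
    // Assumed pattern, not the answerer's original expression.
    //   <link\b[^>]*?          opening of a link tag, any attributes before href
    //   \shref\s*=\s*[""']?    the href attribute, quoted or unquoted
    //   ([^""'\s>]+\.ico)\b    group 1: the URL, which must end in ".ico"
    //   [^>]*>                 rest of the tag, so "/>" and "></link>" both work
    private static readonly Regex LinkIco = new Regex(
        @"<link\b[^>]*?\shref\s*=\s*[""']?([^""'\s>]+\.ico)\b[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the first matching URL, or null when no tag matches.
    public static string FindFavicon(string html)
    {
        Match m = LinkIco.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html =
            "<link rel=\"stylesheet\" type=\"text/css\" href=\"http://3dbin.com/css/1261391049/style.min.css\" />\n" +
            "<link rel=\"shortcut icon\" href=\"http://3dbin.com/favicon.ico\" type=\"image/x-icon\" />";
        Console.WriteLine(FindFavicon(html));
    }
}
```

Run against the two link tags from the question, Main prints http://3dbin.com/favicon.ico. The sketch does not check the rel attribute, and it drops anything after ".ico", such as a query string, so it is only as robust as the pages it is fed.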
Maybe... it is not robust, but it could work. (I used a Perl regex.)
This is not a job for a regular expression, as you'll see if you spend 2 minutes on StackOverflow looking for how to parse HTML.
Use an HTML parser instead!
Here's a trivial example in Python (I'm sure this is equally do-able in C#):
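Since the answer notes that the same thing is doable in C#, here is a sketch of what a C# counterpart could look like. It is not a port of the answerer's Python; it assumes the HtmlAgilityPack package is referenced, and the names FaviconParser and FindIconHref are hypothetical.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class FaviconParser
{
    // Returns the href of the first <link> whose rel attribute contains
    // the token "icon" (e.g. "icon" or "shortcut icon"), or null if none.
    public static string FindIconHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when no node matches the XPath.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@rel and @href]");
        if (links == null) return null;

        foreach (HtmlNode link in links)
        {
            // rel is a space-separated list of tokens.
            string[] relTokens = link.GetAttributeValue("rel", "")
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (relTokens.Contains("icon", StringComparer.OrdinalIgnoreCase))
            {
                return link.GetAttributeValue("href", "");
            }
        }
        return null;
    }
}
```

Because the parser handles the tag syntax, it does not matter whether a site closes the tag with "/>" or "</link>", which was the difficulty raised in the question.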
I had a go at this a wee while back, so here is something that is pretty simple. First it attempts to find the /favicon.ico file. If that fails, I load the page using Html Agility Pack and then use XPath to find any <link> tags. I loop through the link tags to see if they have a rel='icon' attribute. If they do, I grab the href attribute and, if it exists, expand it into an absolute URL for that site.
Please feel free to play around with this and offer any improvements.
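The code this answer refers to is not shown here. The steps it describes could be sketched roughly as follows, assuming HtmlAgilityPack for parsing and HttpClient for the requests. This is an illustration of the described flow, not the answerer's code, and the names FaviconLocator, DefaultFavicon, FromLinkTags and LocateAsync are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class FaviconLocator
{
    private static readonly HttpClient Client = new HttpClient();

    // Step 1 candidate: /favicon.ico at the root of the page's site.
    public static Uri DefaultFavicon(Uri pageUri)
    {
        return new Uri(pageUri, "/favicon.ico");
    }

    // Step 2: first <link> whose rel mentions "icon", with its href
    // expanded into an absolute URL relative to the page.
    public static Uri FromLinkTags(Uri pageUri, string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@rel and @href]");
        if (links == null) return null;

        foreach (HtmlNode link in links)
        {
            string rel = link.GetAttributeValue("rel", "");
            if (rel.IndexOf("icon", StringComparison.OrdinalIgnoreCase) < 0) continue;

            string href = link.GetAttributeValue("href", "");
            if (Uri.TryCreate(pageUri, href, out Uri absolute)) return absolute;
        }
        return null;
    }

    // Tries /favicon.ico first, then falls back to the page's link tags.
    // Network errors are not handled in this sketch.
    public static async Task<Uri> LocateAsync(Uri pageUri)
    {
        Uri candidate = DefaultFavicon(pageUri);
        using (var head = new HttpRequestMessage(HttpMethod.Head, candidate))
        using (HttpResponseMessage response = await Client.SendAsync(head))
        {
            if (response.IsSuccessStatusCode) return candidate;
        }

        string html = await Client.GetStringAsync(pageUri);
        return FromLinkTags(pageUri, html);
    }
}
```

The substring check on rel is deliberately loose, so it also accepts values such as "shortcut icon". Some servers reject HEAD requests, in which case a GET on the candidate URL may be a safer existence check.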