Creating a summary from a link

Posted on 2024-12-01 08:23:23

Many sites (Facebook, Google+, etc.) have a feature that creates a summary with a header, an image and some text from a link. I have tried to find out whether there are any libraries or guidelines for building this kind of feature, but my search results haven't been helpful at all.

I know that I can parse the HTML of a page and extract the elements I'd like, but I think there should be some kind of standard for how to do this (and perhaps also for how to create pages that are friendly to this kind of functionality).

Does anyone have a good link that points me in the right direction? JavaScript or .NET is my preferred choice, but I can also implement it myself.

Comments (2)

归属感 2024-12-08 08:23:23

For the "perhaps also how to create pages that are friendly to this kind of functionality" part:
You are probably looking for the Open Graph protocol:

<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
...
</head>
...
</html>

I think this is the first place Facebook looks. However, Facebook seems to have its own algorithms for detecting the most relevant part of a page when these tags are missing.
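
As a rough illustration of the consuming side, here is a minimal sketch in Python that reads such tags back out of a page. It only uses the standard library `html.parser` module; the names `OpenGraphParser` and `extract_open_graph` are made up for this example, and this is certainly not how Facebook itself does it.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        # Keep only the first value seen for each og:* property.
        if prop.startswith("og:") and content is not None:
            self.properties.setdefault(prop, content)


def extract_open_graph(html):
    """Hypothetical helper: returns a dict of og:* properties found in html."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


sample = """<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
</head>
</html>"""

og = extract_open_graph(sample)
print(og["og:title"], og["og:image"])
```

The resulting dict gives you the header (`og:title`), the image (`og:image`) and the canonical link (`og:url`) for the summary box. When the dict comes back empty you would need some fallback of your own.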

疯狂的代价 2024-12-08 08:23:23

Many sites (Facebook, Google+, etc.) have a feature that creates a
summary with a header, an image and some text from a link. I have tried to
find out whether there are any libraries or guidelines for building this
kind of feature, but my search results haven't been helpful at all.

Such a function is usually built using some kind of "crawling", meaning your script opens the link and looks at its data, just as you suggest yourself.

I know that I can parse the HTML of a page and extract the elements
I'd like, but I think there should be some kind of standard for how to
do this (and perhaps also for how to create pages that are friendly to this
kind of functionality).

The standard way is what most search engines, such as Google, do: take the title from the page's title element and the description from the description meta tag, if there is one. Most search engines nowadays ignore the description metadata and try to build their own summary instead.

This is usually done by looking for headings (h1, h2, etc.) and then the paragraphs.

And to make a website "friendly" to this kind of crawling, you build it according to web standards (W3C).
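
The lookup described above (title element, description meta tag, then headings and paragraphs) can be sketched roughly as follows in Python. This is only an illustration using the standard library `html.parser`; the names `FallbackSummaryParser` and `summarize` are invented for the example, and the order of preference (description before first paragraph, title before first h1) is an assumption, not something any search engine documents.

```python
from html.parser import HTMLParser


class FallbackSummaryParser(HTMLParser):
    """Remembers the first non-empty <title>, <h1> and <p>, plus meta description."""

    CAPTURED = ("title", "h1", "p")

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current = None  # tag whose text is being collected right now
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "description":
                self.found.setdefault("description", attr_map.get("content") or "")
        elif tag in self.CAPTURED and tag not in self.found and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.found[tag] = text
            self._current = None


def summarize(html):
    """Hypothetical helper: picks a title and a short text for a link preview."""
    parser = FallbackSummaryParser()
    parser.feed(html)
    parser.close()
    found = parser.found
    return {
        "title": found.get("title") or found.get("h1") or "",
        "text": found.get("description") or found.get("p") or "",
    }


page = """<html><head><title>Example page</title></head>
<body><h1>Example heading</h1>
<p>First paragraph of the <a href="#">article</a>.</p>
<p>Second paragraph.</p></body></html>"""

result = summarize(page)
print(result)
```

Real pages are much messier than this sample (navigation text inside paragraphs, missing end tags, and so on), so treat it as a starting point rather than a finished extractor.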

Does anyone have a good link that points me in the right
direction? JavaScript or .NET is my preferred choice, but I can
also implement it myself.

The language really doesn't matter, as long as it is capable of doing a basic HTTP GET.
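
To show how little is needed for the HTTP GET part, here is a minimal sketch using Python's standard `urllib.request`. The function name `fetch_html`, the User-Agent string, the timeout and the size limit are all arbitrary choices made for this example.

```python
import urllib.request


def fetch_html(url, timeout=10, max_bytes=1_000_000):
    """Hypothetical helper: GETs a URL and returns the body decoded as text."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "summary-sketch/0.1"},  # made-up identifier
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        # Read at most max_bytes so a huge page cannot exhaust memory.
        return response.read(max_bytes).decode(charset, errors="replace")
```

Calling `fetch_html("http://www.imdb.com/title/tt0117500/")` would return the page source as a string, which you can then hand to whatever parsing code you write. Any language with an HTTP client and an HTML parser can follow the same two steps.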
