Facebook-like on-demand meta content scraper
Have you ever noticed that Facebook scrapes a link you post (in a status, message, etc.) live, right after you paste it into the link field, and displays various metadata: a thumbnail of the image, various images from the linked page, or a video thumbnail for a video link (such as YouTube)?
Any ideas how one would replicate this feature? I'm thinking about a couple of Gearman workers or, even better, just JavaScript that makes an XHR request and parses the content with regexes or something similar. Any ideas? Any links? Has someone already tried to do the same and wrapped it in a nice class? Anything? :)
Thanks!
Comments (3)
Facebook looks at various meta information in the HTML of the page whose URL you paste into the link field. The title and description are two obvious ones, but a developer can also use <link rel="image_src" href="thumbnail.jpg" /> to provide a preferred screengrab. I guess you could check for these things. If this tag is missing, you could always use a website thumbnail generation service.
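To make the checks described above concrete, here is a minimal sketch in Python using only the standard library. It assumes the page HTML has already been fetched as a string; the names LinkPreviewParser and extract_preview are hypothetical, and this is not a description of how Facebook actually implements it.

```python
from html.parser import HTMLParser


class LinkPreviewParser(HTMLParser):
    """Collects the <title>, the meta description and the image_src hint."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.description = None
        self.image_src = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # Keep only the first description found.
            if self.description is None:
                self.description = attrs.get("content")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "image_src":
            # Keep only the first image_src hint found.
            if self.image_src is None:
                self.image_src = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_preview(html):
    """Return the preview fields; a field is None when the tag is absent."""
    parser = LinkPreviewParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title,
        "description": parser.description,
        "image_src": parser.image_src,
    }
```

A caller would fall back to something else (for example a thumbnail service, as suggested above) whenever image_src comes back as None.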
I am developing a project like that, and it is not as easy as it seems. Encoding issues, content rendered with JavaScript, and the sheer number of non-semantic websites are among the big problems I encountered. Extracting video info and getting auto-play behavior is especially tricky, and sometimes impossible. You can see a demo at http://www.embedify.me. It is written in .NET, but it has a service interface, so you can call it from JavaScript, and there is also a JavaScript API that gives the same UI/behavior as on Facebook.
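As an illustration of the encoding issue mentioned above, here is a rough Python sketch of one way to decode a fetched page. It assumes the raw bytes and the Content-Type header value are already available; decode_html is a hypothetical helper, and the order of the fallbacks (header, then meta tag, then UTF-8) is my own assumption, not something taken from embedify.me.

```python
import re


def decode_html(body, content_type=None):
    """Decode raw page bytes, trying the declared charsets before UTF-8."""
    candidates = []

    # 1. Charset declared in the HTTP Content-Type header, if any.
    if content_type:
        match = re.search(r"charset=[\"']?([\w.-]+)", content_type, re.I)
        if match:
            candidates.append(match.group(1))

    # 2. Charset declared in a <meta> tag near the top of the document.
    match = re.search(rb"<meta[^>]+charset=[\"']?([\w.-]+)", body[:2048], re.I)
    if match:
        candidates.append(match.group(1).decode("ascii"))

    # 3. UTF-8 as the default guess.
    candidates.append("utf-8")

    for encoding in candidates:
        try:
            return body.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name or wrong guess: try the next one

    # Last resort: never fail, replace undecodable bytes instead.
    return body.decode("utf-8", errors="replace")
```

Pages whose content is rendered by JavaScript are a separate problem that this does not address.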
FB scrapes the meta tags from the HTML.
That is, when you enter a URL, FB displays the page title, followed by the (truncated) URL, and then the contents of the <meta name="description"> element.
As for the selection of thumbnails, I think FB may choose only those that exceed certain dimensions, skipping button graphics, 1px spacers, etc.
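The size filter guessed at above could look something like the following Python sketch. It only reads the width and height attributes declared on each img tag, which is an assumption made to keep the example self-contained; a real scraper would probably have to download the images to measure them. The 50px thresholds and the names ImageCollector and thumbnail_candidates are hypothetical.

```python
from html.parser import HTMLParser

# Assumed thresholds; whatever Facebook really uses is not known here.
MIN_WIDTH = 50
MIN_HEIGHT = 50


def _to_int(value):
    """Parse a dimension attribute, returning None if missing or not a number."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


class ImageCollector(HTMLParser):
    """Collects (src, width, height) for every <img> that has a src."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if src:
            width = _to_int(attrs.get("width"))
            height = _to_int(attrs.get("height"))
            self.images.append((src, width, height))


def thumbnail_candidates(html, min_width=MIN_WIDTH, min_height=MIN_HEIGHT):
    """Return image URLs whose declared size meets both minimums."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return [
        src
        for src, width, height in collector.images
        if width is not None and height is not None
        and width >= min_width and height >= min_height
    ]
```

Images without declared dimensions are skipped in this sketch, since their size cannot be judged from the markup alone.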
Edit: I don't know exactly what you're looking for, but here's a function in PHP for scraping the relevant data from pages.
This uses the simple HTML DOM library from http://simplehtmldom.sourceforge.net/
I've had a look at how FB does it, and it looks like the scraping is done at server side.