Facebook's BigPipe and SEO: Cloaking?
I'm quite interested in Facebook's BigPipe technique for improving user experience when displaying web pages. The downside is that it is heavily JavaScript-based, and not at all search-engine friendly.
When developing a similar technique on my own website, I designed it so it can very easily be disabled server-side to serve more standard pages, without BigPipe enabled. Now, I'm looking for a way to make it crawler-friendly.
The easy way would be to serve non-BigPipe content to search engine crawlers / bots, and pipelined content to everyone else. This should not be considered cloaking: the content is exactly the same, and the layout is the same (after BigPipe's JavaScript has been executed). The only thing that changes is the way it is delivered, to make it more crawler-friendly. But will Google see this as legitimate?
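To make this first approach concrete, here is a minimal sketch that picks the delivery mode from the User-Agent header. The function names and the pattern list are hypothetical, and the list is far from exhaustive; it only illustrates the idea and says nothing about how Google would judge it.

```javascript
// Hypothetical, non-exhaustive list of crawler User-Agent substrings.
const BOT_PATTERN = /googlebot|bingbot|yandex|baiduspider|duckduckbot/i;

// Returns true when the User-Agent header looks like a known crawler.
function isCrawler(userAgent) {
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Same content either way; only the transport changes.
function chooseDelivery(userAgent) {
  return isCrawler(userAgent) ? 'static' : 'bigpipe';
}
```

A request with a missing User-Agent header falls through to 'bigpipe' here; defaulting to 'static' instead would be the more conservative choice.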
The second way would be to use another piece of JavaScript to solve this problem. On the first request, send a non-BigPipe page that includes some JavaScript which sets a cookie. On subsequent requests, send BigPipe content only if the cookie is present. The very first page load will not be optimized, but the others will. Looks like a great solution, but I don't really like multiplying cookies.
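A rough sketch of the server-side decision for this cookie approach could look as follows. The cookie name `js_ok` and the function names are assumptions made for illustration; the first page would set the cookie with something like `document.cookie = 'js_ok=1; path=/'`.

```javascript
// Hypothetical cookie name, set by a small script on the first (static) page.
const JS_COOKIE = 'js_ok';

// Parses a raw Cookie request header into a { name: value } object.
function parseCookies(header) {
  const cookies = {};
  for (const part of (header || '').split(';')) {
    const eq = part.indexOf('=');
    if (eq > 0) {
      cookies[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
    }
  }
  return cookies;
}

// BigPipe only when the cookie shows JavaScript ran on an earlier request.
function deliveryForCookies(header) {
  return parseCookies(header)[JS_COOKIE] === '1' ? 'bigpipe' : 'static';
}
```

Crawlers that do not execute JavaScript or keep cookies would always receive the static page, with no User-Agent check involved.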
The third way would be to stream BigPipe content not using HTML comments as Facebook does, but using <noscript> tags. This would make a pagelet look like:
<noscript id="pagelet_payload_foo">Some content to be indexed here</noscript> <script>onPageletArrive({id:'foo', [...]})</script>
instead of Facebook's approach:
<code id="pagelet_payload_foo"><!-- Some content to be indexed here --></code> <script>onPageletArrive({id:'foo', [...]})</script>
This looks great and simple, and is both crawler-friendly and user-friendly. But it seems a little hackish to me, and it does not work in IE 7/8 because the contents of the noscript tag are ignored in the DOM. That would involve some dirty special-casing for these browsers.
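On the client side, the two payload formats differ only in whether the markup is wrapped in an HTML comment. The sketch below shows a hypothetical helper that a handler such as onPageletArrive might call to normalise either form. It assumes the raw string has already been read from the wrapper element, and it leaves out both that DOM access and the IE 7/8 special case.

```javascript
// Hypothetical helper: returns the markup to inject for a pagelet, given the
// raw string read from its wrapper element (<code> or <noscript>).
function unwrapPayload(raw) {
  const text = raw.trim();
  // Comment-wrapped form: the payload sits inside a single HTML comment.
  const match = /^<!--([\s\S]*)-->$/.exec(text);
  // noscript form: the raw string is already the payload.
  return match ? match[1].trim() : text;
}
```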
Then, I looked more closely at what Facebook does. Seems like they are doing the same. Pages are optimized in my browser, but are not in Google's cache. I tried to clear all my browser cache and cookies, and requested the page again. No matter what, I keep getting the content through BigPipe. They are not using any cookie-based technique.
So the question is simple: how does Facebook do that? Would the first method be considered cloaking, or does it only work for Facebook because it is Facebook? Or did I miss something else?
Thanks.
Comments (1)
The easy answer is that Facebook distinguishes search bots from other visitors and serves them different content. That can be done via the user agent (as I think you're implying in your question) or by looking up the IP address to see if it matches a Google address range.
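As an illustration of the IP-based variant, here is a minimal sketch of an IPv4 range check. The ranges below are placeholders taken from the documentation address space, not Google's real ranges, and all function names are hypothetical. The code assumes well-formed IPv4 input and does no validation.

```javascript
// Placeholder ranges from the documentation address space (RFC 5737).
// These are NOT Google's real ranges, which must be obtained separately.
const CRAWLER_RANGES = ['192.0.2.0/24', '198.51.100.0/24'];

// Converts a dotted-quad IPv4 string to an unsigned 32-bit number.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Returns true when `ip` falls inside the CIDR block `cidr`.
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const size = 2 ** (32 - Number(bitsText)); // number of addresses in the block
  const start = Math.floor(ipv4ToInt(base) / size) * size;
  const value = ipv4ToInt(ip);
  return value >= start && value < start + size;
}

// Returns true when `ip` matches any of the configured crawler ranges.
function isCrawlerIp(ip) {
  return CRAWLER_RANGES.some((range) => inCidr(ip, range));
}
```

Plain arithmetic is used instead of bitwise operators because JavaScript's bitwise operators work on signed 32-bit integers, which gets awkward for addresses above 127.255.255.255.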
The fully static version would be my preference, because it also lets you optimise for speed, which Google (and perhaps other search engines) takes into account in its indexing.