How can I make my AJAX content crawlable by Google?
I've been working on a site that uses jQuery heavily and loads in content via AJAX like so:
$('#newPageWrapper').load(newPath + ' .pageWrapper', function() {
    // on load logic
});
It has now come to my attention that Google won't index any content that is loaded dynamically via JavaScript, so I've been looking for a solution to the problem.
I've read through Google's Making AJAX Applications Crawlable document what seems like 100 times, and I still don't understand how to implement it (due for the most part to my limited knowledge of servers).
So my first question would be:
- Do you know of a decent step-by-step tutorial that documents this from start to finish? I've tried Googling it and haven't found anything useful.
And secondly, if there isn't anything out there yet, would anyone be able to explain:
- How to 'Set up my server to handle requests for URLs that contain _escaped_fragment_'
- How to implement HtmlUnit on my server to create an 'HTML snapshot' of the page to show to the crawler.
I would be incredibly grateful if someone could shed some light on this for me, thanks in advance!
-Ben
Comments (3)
The best solution is to make a site that works both with and without JavaScript. Read up on progressive enhancement.
I couldn't find an alternative, so I took epascarello's advice, and now I'm generating the content with PHP if the URL includes '_escaped_fragment_' (the URL will include that if a crawler visits).
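A minimal sketch of that check, written in JavaScript (Node.js style) to match the question's code rather than the PHP used above. Under Google's scheme the crawler requests `page?_escaped_fragment_=state` in place of `page#!state`, so the server only has to spot that parameter and decode it. The helper names `escapedFragmentOf` and `prettyUrlFor` and the `http://localhost` base are hypothetical, chosen for illustration, and the sketch assumes any other query parameters can be ignored.

```javascript
// Sketch only: these helpers are hypothetical, not part of any library.

// Returns the decoded value of the _escaped_fragment_ parameter, or null
// when the request has no such parameter (that is, a normal visitor).
function escapedFragmentOf(requestUrl) {
  // The base is only there so that relative request paths can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null;
  }
  // searchParams.get() percent-decodes the value for us.
  return url.searchParams.get('_escaped_fragment_');
}

// Rebuilds the #! URL that a crawler request stands for, or returns null
// for normal requests. Assumption: other query parameters are dropped.
function prettyUrlFor(requestUrl) {
  const fragment = escapedFragmentOf(requestUrl);
  if (fragment === null) {
    return null;
  }
  const path = new URL(requestUrl, 'http://localhost').pathname;
  // An empty fragment means the crawler wants the page's default state.
  return fragment === '' ? path : path + '#!' + fragment;
}
```

For example, `prettyUrlFor('/index.html?_escaped_fragment_=%2Fabout')` gives `/index.html#!/about`, while a request without the parameter gives `null`. A request handler would serve the pre-rendered HTML for that state in the first case and the normal AJAX page in the second.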
For those searching:
These days this problem is typically solved by using a service that plugs in an implementation of Google's Making AJAX Applications Crawlable scheme at the web server level. You don't have to do it yourself anymore.
I work for one of these companies: https://ajaxsnapshots.com (there are others)