Avoiding extra page loads when using #! AJAX navigation

I'm writing a web site which is basically a succession of sequential pages. The unqualified URL points to the last page, and qualified URLs point to specific pages. So we have:

  • http://example.com/ points to the last page
  • http://example.com/3 points to a specific page (page 3)

I have the following requirements:

  • I want pages to be dynamically loaded by AJAX. Users may sequentially browse lots of successive pages in a short time, and I want to avoid reloads and redraws (and add some kind of JavaScript prefetching of adjacent pages to speed up navigation; a prefetch sketch follows this list)
  • I want the pages to be bookmarkable
  • I want the pages to be crawlable
  • I want it to work for most users.
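To make the prefetching idea concrete, here is a minimal client-side sketch. fetchPageJSON is a hypothetical function, not part of the scheme itself, assumed to fetch a page's content as JSON and store it in a client-side cache:

    // Hypothetical prefetch of the pages adjacent to the current one.
    // fetchPageJSON(n) is assumed to fetch page n as JSON and cache it,
    // so navigating to a neighbouring page needs no extra round trip.
    function prefetchNeighbours(page, lastPage) {
      [page - 1, page + 1].forEach(function (n) {
        if (n >= 1 && n <= lastPage) {
          fetchPageJSON(n); // fire and forget; the result is only cached
        }
      });
    }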

So, I came up with the following scheme:

  • The standard, canonical URL for the archive pages is http://example.com/3. Pages accessed via this URL are fully readable, whether or not JavaScript is available. If JavaScript is not available, links to other pages are normal links, and everything works in the good old-fashioned way. This also works for crawlers.
  • If a browser with both JavaScript and history manipulation (pushState and replaceState) available visits such a URL, content is dynamically updated using AJAX when links to other pages are followed. The browser history is then updated by JavaScript (a routing sketch follows this list).
  • If a browser where JavaScript is available but history manipulation is not visits the site, the user is redirected by JavaScript from http://example.com/3 to http://example.com/#!3. No redirection occurs when visiting the main page http://example.com. Navigation is then done using hashchange events.
  • If a crawler tries to visit http://example.com/#!3 through an external link, it will actually request http://example.com/?_escaped_fragment_=3 (per Google's AJAX crawling scheme). For such URLs, the web server will issue a permanent redirect to http://example.com/3, which will then be correctly crawled.
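Here is a minimal sketch of the client-side routing the scheme implies. loadPage is a hypothetical function that fetches a page via AJAX and swaps its content into the document:

    // Navigate to a page, using pushState where available and falling
    // back to hash-bang navigation otherwise.
    function navigateTo(page) {
      if (window.history && history.pushState) {
        // Modern browser: load via AJAX and push the canonical URL.
        loadPage(page);
        history.pushState({ page: page }, '', '/' + page);
      } else {
        // Old browser: switch to the #! form; the content is then
        // loaded by the hashchange handler.
        location.hash = '#!' + page;
      }
    }

    // Back/forward support for the pushState branch.
    window.onpopstate = function (e) {
      if (e.state) loadPage(e.state.page);
    };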

Everything looks fine so far, but let's look at all the possible cases:


1. No JavaScript browser

1.1. No JavaScript browser visits root URL

Works as expected.

1.2. No JavaScript browser visits canonical URL

Works as expected.

1.3. No JavaScript browser visits shebang URL

(http://example.com/#!3)

Silently fails! The user gets the last page instead of page 3.

2. Crawler

2.1. Crawler visits root URL

Works as expected.

2.2. Crawler visits canonical URL

Works as expected.

2.3. Crawler visits shebang URL

http://example.com/#!3

The crawler actually requests http://example.com/?_escaped_fragment_=3, and is issued a redirect to the canonical URL. This is one additional request for the web server, but this is no big deal (no extra page is fetched from the database and returned to the user).
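Server-side, the _escaped_fragment_ redirect could look like this minimal Node/Express sketch (an assumed stack, purely for illustration; any web server can issue the same permanent redirect):

    // Redirect ?_escaped_fragment_=3 permanently to the canonical /3,
    // so the crawler indexes the canonical URL.
    var express = require('express');
    var app = express();

    app.get('/', function (req, res, next) {
      var fragment = req.query._escaped_fragment_;
      if (fragment !== undefined) {
        // 301: the canonical URL is the permanent home of the content.
        return res.redirect(301, '/' + encodeURIComponent(fragment));
      }
      next(); // no fragment: serve the last page as usual
    });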

3. Old browser

3.1. Old browser visits root URL

Links to other pages are replaced by their shebang versions. Navigation is then done smoothly by AJAX.
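A sketch of that rewriting, assuming pagination links point at canonical numeric paths such as /3 (the URL pattern is taken from the scheme above; loadPage is the same hypothetical AJAX loader):

    // Old browser (JavaScript, no pushState): rewrite canonical
    // pagination links to their #! form, and navigate on hashchange.
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
      var href = links[i].getAttribute('href') || '';
      var m = href.match(/^\/(\d+)$/);
      if (m) links[i].href = '/#!' + m[1];
    }

    window.onhashchange = function () {
      var m = location.hash.match(/^#!(\d+)$/);
      if (m) loadPage(m[1]); // hypothetical AJAX fetch-and-render
    };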

3.2. Old browser visits canonical URL

The user is issued a redirect to the shebang URL. Problem: the server will actually fetch and serve the page 3 times! (A sketch of the redirect follows the list below.)

  1. http://example.com/3 <- served fully by the server
  2. JavaScript issues a redirect to http://example.com/#!3
  3. The server only sees http://example.com/, and serves the last page
  4. JSON is used to fetch (again) page 3 and replace the content on the page.
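Step 2 is the one line of client-side code that triggers the cascade; a sketch:

    // Old browser on a canonical URL such as /3: no pushState, so
    // redirect once to the #! form. location.replace avoids an extra
    // history entry, but the server will still serve the last page
    // for "/" (step 3 above).
    if (!(window.history && history.pushState)) {
      var m = location.pathname.match(/^\/(\d+)$/);
      if (m) location.replace('/#!' + m[1]);
    }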

3.3. Old browser visits shebang URL

No redirect is necessary, but there is an extra load for the last page!

  1. The user goes to http://example.com/#!3
  2. The server only sees http://example.com/, and serves the last page
  3. JSON is used to fetch page 3 and replace the content on the page.

4. Modern browser

4.1. Modern browser visits root URL

4.2. Modern browser visits canonical URL

No problem in these two cases. Navigation is done by AJAX, and canonical URLs are pushed to the history.

4.3. Modern browser visits shebang URL

The URL is modified to the canonical URL. The content is loaded by AJAX. The problem, as before, is that it causes an extra load, from the server, of the last page.
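A sketch of that normalization, with the same hypothetical loadPage as before:

    // Modern browser arriving at /#!3: the server has already served
    // the last page for "/" (the extra load). Normalize the URL
    // without adding a history entry, then load the requested page.
    var m = location.hash.match(/^#!(\d+)$/);
    if (m && window.history && history.replaceState) {
      history.replaceState({ page: m[1] }, '', '/' + m[1]);
      loadPage(m[1]); // hypothetical AJAX fetch-and-render
    }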


So, to sum it up:

  • This solution would basically work
  • But when shebang URLs are used, it will, in most cases, cause an unnecessary load of the last page, because the server can never know whether it needs to serve the last page, or whether the request is actually part of a shebang request.

My question is: does somebody have a good idea on how to avoid those extra page loads? Is there any good practice to solve such a problem?
