Optimizing an asynchronous lookup algorithm

Posted 2024-12-10 18:32:27


I have a series of consecutively-named pages (URLs, like: http://example.com/book/1, http://example.com/book/2, etc.) but I have no way of knowing in advance how many pages there are. I need to retrieve (a particular part of) each page, keep the obtained info in order, miss no page, and request as few null pages as possible.

Currently, I have a recursive asynchronous function which is a bit like this:

pages = []

getPage = (page = 1) ->
  xhr.get "http://example.com/book/#{page}", (response) ->
    if isValid response
      pages.push response
      # Fetch the next page only after this one arrives,
      # so requests run strictly one at a time.
      getPage page + 1
    else
      event.trigger "haveallpages"

getPage()

xhr.get and event.trigger are pseudo-code and are currently jQuery methods (but that may change). isValid is also pseudo-code; in reality the test is defined within the function, but it's complex and not relevant to the question.

This works well but is slow, as only one request is processed at a time. What I'm looking for is a way to make better use of the asynchronous nature of XHRs and retrieve the complete list in less time. Is there a pattern that could help me here? Or a better algorithm?
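
For reference, a minimal sketch of what those pseudo-code stand-ins might look like on top of jQuery; the $.get wiring, the null-on-failure convention, and the trivial isValid test are all assumptions for illustration, not the actual code:

# Hypothetical jQuery-backed stand-ins (assumed, not from the question).
xhr =
  get: (url, callback) ->
    # $.get only fires its callback on success, so route failures
    # (e.g. a 404 past the last page) through as null.
    $.get(url).done(callback).fail -> callback null

event =
  trigger: (name) -> $(document).trigger name
  on: (name, handler) -> $(document).on name, handler

isValid = (response) ->
  # Placeholder test; the real check is complex and app-specific.
  response?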


Comments (1)

ゞ花落谁相伴 2024-12-17 18:32:27


Just fire simultaneous requests while keeping count of them. There's no need to guess an upper bound; simply stop launching new requests once they start to fail, as in your original code.

Beyond the one null request needed to detect the end, this wastes at most concurrency - 1 requests:

pages        = []
concurrency  = 5
currentPage  = 0
inFlight     = 0
haveAllPages = false

getPage = (p) ->
  inFlight++
  xhr.get "http://example.com/book/#{p}", (response) ->
    inFlight--
    if isValid response
      # Store by index rather than push: with concurrent requests,
      # responses can arrive out of order but pages must stay in sequence.
      pages[p - 1] = response
      getPage ++currentPage unless haveAllPages
    else
      haveAllPages = true
    # Signal completion once the end is found and every request has settled.
    event.trigger "haveallpages" if haveAllPages and inFlight is 0

# Prime the window with `concurrency` simultaneous requests.
while concurrency--
  getPage ++currentPage
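
A hypothetical consumer of the result, assuming the event.on counterpart sketched under the question:

# Hypothetical consumer; relies on the event stand-in sketched above.
event.on "haveallpages", ->
  console.log "Fetched #{pages.length} pages, in order"

Capping the window at a fixed concurrency bounds how many look-ahead requests can run past the final page while still letting the browser overlap round-trips; raising concurrency trades more wasted requests for more overlap.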