Best practices for processing large amounts of data while the user waits (in Rails)?

Posted 2024-10-02 01:43:32


I have a bookmarklet that, when used, submits all of the URLs on the current browser page to a Rails 3 app for processing. Behind the scenes I'm using Typhoeus to check that each URL returns a 2XX status code. Currently I initiate this process via an AJAX request to the Rails server and simply wait while it processes and returns the results. For a small set, this is very quick, but when the number of URLs is quite large, the user can be waiting for up to, say, 10-15 seconds.
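The status check itself parallelizes well. Here is a minimal sketch using only the Ruby standard library (`net/http` standing in for Typhoeus; `ok_status?`, `check_url`, and `check_all` are hypothetical helper names, not from the original app):

```ruby
require "net/http"
require "uri"

# True when an HTTP status code is in the 2XX range.
def ok_status?(code)
  (200..299).cover?(code.to_i)
end

# HEAD-check a single URL; false on any error or non-2XX status.
# (Stdlib stand-in for the app's Typhoeus-based check.)
def check_url(url)
  uri = URI.parse(url)
  res = Net::HTTP.start(uri.host, uri.port,
                        use_ssl: uri.scheme == "https",
                        open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.request_uri)
  end
  ok_status?(res.code)
rescue StandardError
  false
end

# Check many URLs concurrently with a bounded pool of worker threads.
def check_all(urls, workers: 10)
  queue = Queue.new
  urls.each { |u| queue << u }
  results = {}
  lock = Mutex.new
  threads = Array.new([workers, urls.size].min) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break
        end
        ok = check_url(url)
        lock.synchronize { results[url] = ok }
      end
    end
  end
  threads.each(&:join)
  results
end
```

Typhoeus (via its hydra) does essentially this with libcurl instead of threads, which is why a large batch can still dominate the request time when run inline.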

I've considered using Delayed Job to process this outside the user's thread, but this doesn't seem like quite the right use-case. Since the user needs to wait until the processing is finished to see the results and Delayed Job may take up to five seconds before the job is even started, I can't guarantee that the processing will happen as soon as possible. This wait time isn't acceptable in this case unfortunately.

Ideally, what I think should happen is this:

  • User hits bookmarklet
  • Data is sent to the server for processing
  • A waiting page is instantly returned while spinning off a thread to do the processing
  • The waiting page periodically polls via ajax for the results of the processing and updates the waiting page (ex: "4 of 567 URLs processed...")
  • The waiting page is updated with the results once they are ready
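The progress bookkeeping behind that waiting page can be sketched as a small tracker that the polling endpoint reads from (a hypothetical `UrlBatch` class; in the real app this state would live in the database so any web process can serve the poll):

```ruby
# In-memory progress tracker for one batch of submitted URLs.
# (Hypothetical sketch; on Heroku this state belongs in the DB,
# keyed by a batch id, so it survives across dynos.)
class UrlBatch
  attr_reader :total

  def initialize(urls)
    @total   = urls.size
    @done    = 0
    @results = {}
    @mutex   = Mutex.new
  end

  # Called by a worker as each URL check finishes.
  def record(url, ok)
    @mutex.synchronize do
      @results[url] = ok
      @done += 1
    end
  end

  # What the polled AJAX endpoint would render, e.g. as JSON.
  def progress
    @mutex.synchronize do
      { done: @done, total: @total, finished: @done == @total }
    end
  end

  def results
    @mutex.synchronize { @results.dup }
  end
end
```

The polling action then just renders `batch.progress` until `finished` is true, at which point the page fetches `batch.results`.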

Some extra details:

  • I'm using Heroku (long running processes are killed after 30 seconds)
  • Both logged in and anonymous users can use this feature

Is this a typical way to do this, or is there a better way? Should I just roll my own off-thread processing that updates the DB during processing or is there something like Delayed Job that I can use for this (and that works on Heroku)? Any pushes in the right direction would be much appreciated.


1 comment

來不及說愛妳 2024-10-09 01:43:32


I think your latter idea makes the most sense. I would just offload the processing of each url-check to its own thread (so all the url checks run concurrently -- which should be a lot faster than sequential checks anyway). As each finishes, it updates the database (making sure the threads don't step on each other's writes). An AJAX endpoint -- which, as you said, you poll regularly on the client side -- will grab and return the count of completed processes from the database. This is a simple enough method that I don't really see the need for any extra components.
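A minimal sketch of this answer's approach, with a `Mutex` guarding the shared results so the threads don't step on each other's writes (a stand-in predicate replaces the real HTTP check, and the hash stands in for database rows):

```ruby
require "json"

# Hypothetical example URLs; the "/404" one stands in for a dead link.
urls = ["http://a.example/", "http://b.example/404", "http://c.example/"]

results = {}
lock    = Mutex.new

# One thread per URL; each records its result under the lock.
threads = urls.map do |url|
  Thread.new do
    ok = !url.end_with?("404") # stand-in for the real status check
    lock.synchronize { results[url] = ok }
  end
end

# What the polled AJAX endpoint would return at any moment mid-run:
snapshot = lock.synchronize { { completed: results.size, total: urls.size } }
puts JSON.generate(snapshot)

threads.each(&:join)
puts "#{results.size} of #{urls.size} URLs processed"
```

With database-backed rows instead of the hash, the polling endpoint is just a `COUNT` over the batch's completed rows, which also works for anonymous users if the batch is keyed by a token in the session.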
