How do I limit the rate of requests to a web service in Python?

Posted 2024-07-10 05:17:58


I'm working on a Python library that interfaces with a web service API. Like many web services I've encountered, this one asks clients to limit the rate of requests. I would like to provide an optional parameter, limit, to the class instantiation that, if provided, will hold outgoing requests until the specified number of seconds has passed.

I understand that the general scenario is the following: an instance of the class makes a request via a method. When it does, the method emits some signal that sets a lock variable somewhere, and begins a countdown timer for the number of seconds in limit. (In all likelihood, the lock is the countdown timer itself.) If another request is made within this time frame, it must be queued until the countdown timer reaches zero and the lock is disengaged; at this point, the oldest request on the queue is sent, and the countdown timer is reset and the lock is re-engaged.

Is this a case for threading? Is there another approach I'm not seeing?

Should the countdown timer and lock be instance variables, or should they belong to the class, such that all instances of the class hold requests?

Also, is it generally a bad idea to provide rate-limiting functionality within a library? I reason that since, by default, the countdown is zero seconds, the library still allows developers to use it and provide their own rate-limiting schemes. Given that any developer using the service will need to rate-limit requests anyway, however, I figure it would be a convenience for the library to provide a means of rate limiting.

Regardless of placing a rate-limiting scheme in the library or not, I'll want to write an application using the library, so suggested techniques will come in handy.
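For concreteness, here is a minimal thread-based sketch of the scheme I have in mind; the names are illustrative, and a lock plus a last-request timestamp stands in for the countdown timer:

import threading
import time

class RateLimitedClient:
    def __init__(self, limit=0.0):
        self.limit = limit                  # minimum seconds between requests; 0 disables limiting
        self._lock = threading.Lock()
        self._last_sent = float('-inf')     # so the first request never waits

    def _send(self, request):
        print('sending', request)           # placeholder for the actual web-service call

    def request(self, request):
        with self._lock:                    # concurrent callers block here until the lock is free
            wait = self.limit - (time.monotonic() - self._last_sent)
            if wait > 0:
                time.sleep(wait)            # hold the outgoing request until limit seconds pass
            self._last_sent = time.monotonic()
            return self._send(request)

Note that callers blocked on the lock are not guaranteed to be released oldest-first, which is part of why I wonder whether an explicit queue is warranted.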


7 Answers

北座城市 2024-07-17 05:17:59


Don't reinvent the wheel unless it's called for. Check out the awesome library ratelimit. It's perfect if you just want to rate limit your calls to a REST API for whatever reason and get on with your life.

from datetime import timedelta
from ratelimit import limits, sleep_and_retry
import requests

@sleep_and_retry   # sleep until the period elapses instead of raising an exception
@limits(calls=1, period=timedelta(seconds=60).total_seconds())   # at most 1 call per 60 s
def get_foobar():
    response = requests.get('https://httpbin.org/get')
    response.raise_for_status()
    return response.json()

This will block the thread if more than one request per minute is issued.

烟花易冷人易散 2024-07-17 05:17:59


This works out better with a queue and a dispatcher.

You split your processing into two sides: source and dispatch. These can be separate threads (or separate processes if that's easier).

The Source side creates and enqueues requests at whatever rate makes them happy.

The Dispatch side does the following (a runnable sketch follows below):

  1. Get the request start time, s.

  2. Dequeue a request and process it through the remote service.

  3. Get the current time, t. Sleep for rate - (t - s) seconds.

If you want to run the Source side connected directly to the remote service, you can do that, and bypass rate limiting. This is good for internal testing with a mock version of the remote service.

The hard part about this is creating some representation for each request that you can enqueue. Since the Python Queue will handle almost anything, you don't have to do much.

If you're using multiprocessing, you'll have to pickle your objects to put them into a pipe.
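A minimal sketch of this source/dispatch split, assuming plain threads and a queue.Queue; print stands in for processing a request through the remote service:

import queue
import threading
import time

def dispatcher(work_queue, rate, process):
    # Dispatch side: pull requests off the queue, at most one per rate seconds.
    while True:
        request = work_queue.get()
        if request is None:                 # sentinel: shut the dispatcher down
            break
        s = time.monotonic()                # 1. get the request start time, s
        process(request)                    # 2. process the request through the remote service
        t = time.monotonic()                # 3. get the current time, t ...
        if t - s < rate:
            time.sleep(rate - (t - s))      # ... and sleep for rate - (t - s) seconds

# Source side: create and enqueue requests at whatever rate makes it happy.
work = queue.Queue()
worker = threading.Thread(target=dispatcher, args=(work, 2.0, print))
worker.start()
for i in range(5):
    work.put('request %d' % i)
work.put(None)
worker.join()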

黯淡〆 2024-07-17 05:17:59


Queuing may be overly complicated. A simpler solution is to give your class a variable for the time the service was last called. Whenever the service is called (!1), set waitTime to delay - Now + lastcalltime, where delay is the minimum allowable time between requests. If this number is positive, sleep for that long before making the call (!2). The disadvantage (or advantage) of this approach is that it treats the web service requests as synchronous. The advantage is that it is absurdly simple and easy to implement.

  • (!1): Should happen right after receiving a response from the service, inside the wrapper (probably at the bottom of the wrapper).
  • (!2): Should happen when the python wrapper around the web service is called, at the top of the wrapper.

S.Lott's solution is more elegant, of course.
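A minimal sketch of this approach, assuming time.monotonic() supplies Now; _do_request is a hypothetical placeholder for the real web-service call:

import time

class ServiceWrapper:
    def __init__(self, delay=0.0):
        self.delay = delay                  # minimum allowable time between requests
        self.lastcalltime = float('-inf')   # so the first call never waits

    def call_service(self, *args):
        # (!2) top of the wrapper: sleep off any remaining wait time
        waitTime = self.delay - (time.monotonic() - self.lastcalltime)
        if waitTime > 0:
            time.sleep(waitTime)
        result = self._do_request(*args)
        # (!1) bottom of the wrapper, right after receiving the response
        self.lastcalltime = time.monotonic()
        return result

    def _do_request(self, *args):
        raise NotImplementedError           # replace with the actual request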

小兔几 2024-07-17 05:17:59


In the docs of the yfinance package they show a nice and concise way to do rate limiting and response caching at the same time, which is handy since during development and debugging I often end up doing the same requests over and over again.

from requests import Session
from requests_cache import CacheMixin, SQLiteCache
from requests_ratelimiter import LimiterMixin, MemoryQueueBucket
from pyrate_limiter import Duration, RequestRate, Limiter

class CachedLimiterSession(CacheMixin, LimiterMixin, Session):
    pass

session = CachedLimiterSession(
    limiter=Limiter(RequestRate(2, Duration.SECOND * 5)),  # max 2 requests per 5 seconds
    bucket_class=MemoryQueueBucket,
    backend=SQLiteCache("yfinance.cache"),
)
response = session.get('https://httpbin.org/get')  # rate limited and cached
response.raise_for_status()
response.json()

默嘫て 2024-07-17 05:17:59


Your rate-limiting scheme should be heavily influenced by the calling conventions of the underlying code (synchronous or async), as well as the scope (thread, process, machine, cluster?) at which this rate limiting will operate.

I would suggest keeping all the variables within the instance, so you can easily implement multiple periods/rates of control.

Lastly, it sounds like you want to be a middleware component. Don't try to be an application and introduce threads on your own. Just block/sleep if you are synchronous, and use the async dispatching framework if you are being called by one.

热鲨 2024-07-17 05:17:59


If your library is designed to be synchronous, then I'd recommend leaving out the limit enforcement (although you could track rates and at least help the caller decide how to honor limits).

I use twisted to interface with pretty much everything nowadays. It makes it easy to do this type of thing, since its model separates request submission from response handling. If you don't want your API users to have to use twisted, you'd at least be better off understanding its API for deferred execution.

For example, I have a twitter interface that pushes a rather absurd number of requests through on behalf of xmpp users. I don't rate limit, but I did have to do a bit of work to prevent all of the requests from happening at the same time.
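As a hedged sketch (not the code from that twitter interface), one way to keep deferred requests from all happening at the same time is a DeferredSemaphore combined with deferLater; fetch is a placeholder for any callable that returns a Deferred:

from twisted.internet import defer, reactor, task

sem = defer.DeferredSemaphore(1)            # at most one request in flight at a time

def spaced(fetch, delay=1.0):
    # Run fetch() under the semaphore, then keep holding the slot for
    # delay seconds before releasing it, so requests are spaced out.
    def run():
        d = defer.maybeDeferred(fetch)
        d.addCallback(lambda result: task.deferLater(reactor, delay, lambda: result))
        return d
    return sem.run(run)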

但可醉心 2024-07-17 05:17:59


Add a 2 second pause between requests using time.sleep() like this:

import time
import requests

for i in range(10):
    requests.get('http://example.com')
    time.sleep(2)