Stop the running instance when max_instances is reached

I'm using apscheduler-django and I created a task that loops every 10 seconds.

This function will make a request to an API and save the content to my database (PostgreSQL).

This is my task:

scheduler.add_job(
  SaveAPI,
  trigger=CronTrigger(second="*/10"), 
  id="SaveAPI", 
  max_instances=1,
  replace_existing=True,
)

and my SaveAPI is:

def SaveAPI():
    SPORT = 3
    print('API Queue Started')
    AllMatches = GetAllMatches(SPORT)
    for Match in AllMatches:
        AddToDatabase(Match, SPORT)
    print('API Queue Ended')

GetAllMatches and AddToDatabase are quite big, and I don't think their implementations are relevant to my question.

My problem is that sometimes I get this error:

Run time of job "SaveAPI (trigger: cron[second='*/10'], next run at: 2022-03-05 23:21:00 +0330)" was missed by 0:00:11.445357

When this happens, the job does not get replaced with a new instance, because my SaveAPI function never ends, and apscheduler keeps missing the new runs.

I did many tests, and the function itself does not have any problem.

How can I make apscheduler stop the last running instance if a new instance is going to be missed?

So if my last instance takes more than 10 seconds, I want to just terminate the instance and create a new one.

1 Answer

何以畏孤独:

apscheduler and apscheduler-django don't directly support that.
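
As an aside, apscheduler's own knobs only control what happens to the next run, never the one already in flight. For example, the misfire_grace_time job option (a real apscheduler parameter, sketched below only for contrast) makes late runs execute instead of being logged as missed, but it will not terminate anything:

scheduler.add_job(
    SaveAPI,
    trigger=CronTrigger(second="*/10"),
    id="SaveAPI",
    max_instances=1,
    misfire_grace_time=None,  # None = run the job no matter how late it is
    replace_existing=True,
)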

You can implement and use a custom executor that keeps track of the process running each job and kills that process when a new run of the same job is submitted while the previous one is still going.

Here's a MaxInstancesCancelEarliestProcessPoolExecutor that uses pebble.ProcessPool (the lines marked # + are the changes relative to apscheduler's stock pool executor code).

# Imports needed to make this self-contained (pebble must be installed):
import threading
from collections import defaultdict
from concurrent.futures import CancelledError, TimeoutError
from concurrent.futures.process import BrokenProcessPool

from pebble import ProcessPool

from apscheduler.executors.base import MaxInstancesReachedError, run_job
from apscheduler.executors.pool import BasePoolExecutor


class MaxInstancesCancelEarliestProcessPoolExecutor(BasePoolExecutor):
    def __init__(self):
        pool = ProcessPool()
        # Adapt pebble's schedule() to the submit() interface BasePoolExecutor expects.
        pool.submit = lambda function, *args: pool.schedule(function, args=args)
        super().__init__(pool)
        self._futures = defaultdict(list)  # job.id -> futures of running instances

    def submit_job(self, job, run_times):
        assert self._lock is not None, 'This executor has not been started yet'
        with self._lock:
            if self._instances[job.id] >= job.max_instances:
                f = self._futures[job.id][0]                      # + earliest still-running instance
                f.cancel()                                        # + pebble kills its worker process
                try:                                              # +
                    self._pool._pool_manager.update_status()      # + make pebble reap the cancellation now
                except RuntimeError:                              # +
                    pass                                          # +
                if self._instances[job.id] >= job.max_instances:  # + re-check: did the slot free up?
                    raise MaxInstancesReachedError(job)

            self._do_submit_job(job, run_times)
            self._instances[job.id] += 1

    def _do_submit_job(self, job, run_times):
        def callback(f):
            with self._lock:                        # +
                self._futures[job.id].remove(f)     # +
                try:                                # +
                    exc, tb = (f.exception_info() if hasattr(f, 'exception_info') else
                               (f.exception(), getattr(f.exception(), '__traceback__', None)))
                except CancelledError:              # + the future was cancelled in submit_job
                    exc, tb = TimeoutError(), None  # + surface the kill as a timeout
                if exc:
                    self._run_job_error(job.id, exc, tb)
                else:
                    self._run_job_success(job.id, f.result())

        try:
            f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name)
        except BrokenProcessPool:  # carried over from apscheduler's stock executor
            self._logger.warning('Process pool is broken; replacing pool with a fresh instance')
            self._pool = self._pool.__class__(self._pool._max_workers)
            f = self._pool.submit(run_job, job, job._jobstore_alias, run_times, self._logger.name)

        f.add_done_callback(callback)
        self._futures[job.id].append(f)  # + track the future so it can be cancelled later

    def shutdown(self, wait=True):
        self._pool.close()  # stop accepting new tasks
        if wait:
            self._pool.join()
        else:
            threading.Thread(target=self._pool.join).start()
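
A note on the design: this works with pebble but not with a stock concurrent.futures pool, because pebble's ProcessFuture.cancel() terminates a task that is already running (by killing its worker process), while concurrent.futures can only cancel tasks that have not started yet. The update_status() call on pebble's internal pool manager is there to process the cancellation immediately, so the done callback can decrement the instance count (apscheduler's executor lock is reentrant, so the callback can run under it) before the capacity re-check.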

Usage:

scheduler.add_executor(MaxInstancesCancelEarliestProcessPoolExecutor(), alias='max_instances_cancel_earliest')
scheduler.add_job(
    SaveAPI,
    trigger=CronTrigger(second="*/10"),
    id="SaveAPI",
    max_instances=1,
    executor='max_instances_cancel_earliest',  # +
    replace_existing=True,
)
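
For completeness, here is a minimal standalone sketch of the behavior. It assumes a plain BackgroundScheduler instead of apscheduler-django, and a hypothetical slow_job that deliberately overruns the interval; the job function must live at module level so pebble can pickle it:

import time
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.cron import CronTrigger

def slow_job():
    print('started')
    time.sleep(25)       # deliberately longer than the 10-second interval
    print('finished')    # never printed: the next run kills this process

if __name__ == '__main__':
    scheduler = BackgroundScheduler()
    scheduler.add_executor(MaxInstancesCancelEarliestProcessPoolExecutor(),
                           alias='max_instances_cancel_earliest')
    scheduler.add_job(slow_job, trigger=CronTrigger(second='*/10'),
                      id='slow_job', max_instances=1,
                      executor='max_instances_cancel_earliest')
    scheduler.start()
    time.sleep(35)       # watch a couple of runs get cancelled
    scheduler.shutdown(wait=False)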