How do I schedule jobs in Python?
I am trying to schedule a few jobs inside my Python app. Supposedly, the logging text from the jobs.py file should appear every 1 minute and every 5 minutes inside my Docker container. However, the text is appearing every 2 minutes inside the Docker container. Is there a clash between the Python schedule and the cron jobs?
Current output inside the Docker container:
13:05:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:05:00] "GET /reminder/send_reminders HTTP/1.1" 200 -
13:06:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:06:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:07:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:07:00 [I] jobs job_feeds_update
13:07:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:07:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:08:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:08:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:09:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:09:00 [I] jobs job_feeds_update
13:09:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:09:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:10:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:10:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
13:10:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:10:00] "GET /reminder/send_reminders HTTP/1.1" 200 -
13:11:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
13:11:00 [I] jobs job_feeds_update
13:11:00 [D] schedule Running job Job(interval=5, unit=minutes, do=job_send_reminders, args=(), kwargs={})
13:11:00 [I] jobs job_send_reminders
server.py
    # Cron jobs hit these routes
    @app.route('/feeds/update_feeds')
    def update_feeds():
        schedule.run_pending()
        return 'OK UPDATED FEED!'

    @app.route('/reminder/send_reminders')
    def send_reminders():
        schedule.run_pending()
        return 'OK UPDATED STATUS!'
jobs.py
    def job_feeds_update():
        update_feed()
        update_feed_eng()
        logger.info("job_feeds_update")

    schedule.every(1).minutes.do(job_feeds_update)

    # send email reminders
    def job_send_reminders():
        send_reminders()
        logger.info("job_send_reminders")

    schedule.every(5).minutes.do(job_send_reminders)
Dockerfile
FROM alpine:latest
# Install curl
RUN apk add --no-cache curl
# Copy Scripts to Docker Image
COPY reminders.sh /usr/local/bin/reminders.sh
COPY feeds.sh /usr/local/bin/feeds.sh
RUN echo ' */5 * * * * /usr/local/bin/reminders.sh' >> /etc/crontabs/root
RUN echo ' * * * * * /usr/local/bin/feeds.sh' >> /etc/crontabs/root
# Run crond -f for Foreground
CMD ["/usr/sbin/crond", "-f"]
1 Answer
I think you're running into a couple of issues:

1. schedule is on a different schedule/interval than your cron job. They're out of sync (and, for the next reason, you can't ever expect them to be in sync). The moment your jobs.py script is executed is the starting point from which schedule counts its intervals. That is, if you run something every minute but the jobs.py script starts 30 seconds into the current minute (i.e. at 01:00:30, 30 seconds past 1:00am), then the scheduler will run the job at 01:01:30, then 01:02:30, then 01:03:30, and so on.

2. schedule doesn't guarantee precise execution frequency. When the scheduler runs a job, the job's execution time is not taken into account. So if you schedule something like your feeds/reminders jobs, they can take a little while to process, and once a job finishes running, the scheduler decides that the next run will only happen 1 minute after the end of the previous one. This means your execution time can throw the schedule off.

Try running this example in a Python script to see what I'm talking about.
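Here is a minimal sketch of such a script (reconstructed to match the description below; the function name geeks and the 5-second sleep come from that description):

    import time
    from datetime import datetime

    import schedule

    def geeks():
        # pretend there is a blocking API call here that takes 5 seconds
        time.sleep(5)
        print("Ran at:", datetime.now().strftime("%H:%M:%S"))

    # ask the scheduler to run geeks every second
    schedule.every(1).seconds.do(geeks)

    while True:
        schedule.run_pending()
        time.sleep(0.2)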
We've scheduled the geeks function to run every second. But if you look at the geeks function, I've added a time.sleep(5) to pretend that there may be some blocking API call here that can take 5 seconds. Then observe the timestamps logged - you'll notice they're not always consistent with the schedule we originally wanted!

Now, on to how your cron job and scheduler are out of sync.
Look at the following logs (the relevant lines excerpted from your output above):
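    13:07:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
    13:07:00 [I] jobs job_feeds_update
    13:07:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:07:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
    13:08:00 [I] werkzeug 172.20.0.2 - - [08/May/2022 13:08:00] "GET /feeds/update_feeds HTTP/1.1" 200 -
    13:09:00 [D] schedule Running job Job(interval=1, unit=minutes, do=job_feeds_update, args=(), kwargs={})
    13:09:00 [I] jobs job_feeds_update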
What's likely happening here is as follows:

- At 13:07:00, your cron job sends the request to the feed items route.
- At 13:07:00, the schedule has a pending job for feed items, so it runs.
- At roughly 13:07:01, the job finishes, and schedule decides the next job can only run 1 minute from now, which is roughly ~13:08:01 (note the 01: this accounts for the milliseconds/timing of job execution; let's assume the feed items update took about 1 second to run).
- At 13:08:00, your cron job triggers the request asking schedule to run_pending jobs.
- At 13:08:00, however, there are no pending jobs to run, because the next time the feed job can run is 13:08:01, which is not right now.
- At 13:09:00, your crontab triggers the request again.
- At 13:09:00, there is a pending job available that should've run at 13:08:01, so it gets executed now.
I hope this illustrates the issue you're running into: cron and schedule being out of sync. This issue will get worse in a production environment. You can read more about Parallel execution for schedule as a means of keeping things off the main thread, but that will only go so far. Let's talk about...

Possible Solutions
1. Use run_all from schedule instead of run_pending to force jobs to trigger, regardless of when they're actually scheduled for. But if you think about it, this is no different from simply calling job_feeds_update straight from your API route itself. That isn't a bad idea by itself, but it's still not super clean, because it blocks the main thread of your API server until job_feeds_update is complete, which might not be ideal if you have other routes that users need. You could combine this with the next suggestion:

2. Check out the second example on the Parallel Execution page of schedule's docs. It shows you how to use a job queue and threads to offload jobs. Because you run schedule.run_pending(), the main thread of your server is blocked until the jobs finish. By using threads (plus the job queue), you can keep scheduling jobs in the queue and avoid blocking the main server with your jobs. This should optimize things a little further for you by letting jobs continue to be scheduled. (A sketch of this pattern follows the list.)

3. Use ischedule instead, since it takes job execution time into account and provides precise schedules: https://pypi.org/project/ischedule/. This might be the simplest solution for you in case 1+2 end up being a headache!

4. Don't use schedule at all, and simply have your cron jobs hit a route that runs the actual function directly (so basically the opposite of the advice in 1+2 above). The problem with this is that if your feed updates take longer than a minute to run, you may end up with multiple overlapping cron jobs doing feed updates at the same time. So I'd recommend not doing this, and relying instead on a mechanism that queues/schedules your requests with threads and jobs. I'm only mentioning it as a potential scenario of what else you could do.
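For suggestion 2, here is a sketch adapted from the job-queue example in schedule's Parallel Execution docs, with your job_feeds_update swapped in (assuming it is importable from jobs.py):

    import queue
    import threading
    import time

    import schedule

    from jobs import job_feeds_update  # assumed importable from your jobs.py

    jobqueue = queue.Queue()

    def worker_main():
        # pull queued jobs off the queue and run them, one at a time
        while True:
            job_func = jobqueue.get()
            job_func()
            jobqueue.task_done()

    # the scheduler only enqueues the job; the worker thread executes it,
    # so a slow job never blocks the scheduling loop
    schedule.every(1).minutes.do(jobqueue.put, job_feeds_update)

    threading.Thread(target=worker_main, daemon=True).start()

    while True:
        schedule.run_pending()
        time.sleep(1)

And for suggestion 3, usage of ischedule looks roughly like this, going by its project page (treat the exact API as something to verify against its docs):

    from ischedule import schedule, run_loop

    def job_feeds_update():
        print("job_feeds_update")

    # ischedule accounts for execution time, aiming for a precise 60s period
    schedule(job_feeds_update, interval=60)
    run_loop()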