Issue with Django handling a long request (50 sec) triggered from another request (3 sec).
I have a POST request that returns some information to the user. Within this request, another API in the same app is called; it queries the database, generates a PDF report, and uploads it to S3, which takes about 50 seconds.
How can I have the first request return its information to the user while the PDF API runs in the background?
I've done some research and found that Celery may be able to handle this task. Is it recommended, or does anyone have other advice?
Thanks in advance!!!
Yes, this is where you'd bring in a solution like Celery, RQ, or Huey.
On the backend you'll use a server like Redis, which stores the state of the jobs you've scheduled (and whether they errored).
Of the three above, I highly recommend Celery. It's been around longer and has better telemetry on services like Sentry and Scout APM.
To get started, here's First steps with Django on the Celery documentation site, and its sample Django project on GitHub.
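For orientation, a minimal version of that first-steps setup looks roughly like this (a sketch, not the full guide; `proj` is a placeholder for your project package, and the Redis URL assumes the broker mentioned above):

```python
# proj/celery.py - minimal sketch following Celery's "First steps with Django" guide.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
# Pull CELERY_*-prefixed settings from Django's settings.py, e.g.
# CELERY_BROKER_URL = "redis://localhost:6379/0"
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in all installed Django apps.
app.autodiscover_tasks()
```

A worker then runs alongside your Django process, e.g. `celery -A proj worker -l info`.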
Getting into the job mindset
Data is serialized
This means that to transport it, the data will be pickled or encoded as JSON.
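In Celery this is a configuration choice; assuming the Django-namespaced settings style from the first-steps guide, a sketch:

```python
# settings.py - JSON is Celery's default serializer and is safer than
# pickle, which can execute arbitrary code when deserializing.
CELERY_TASK_SERIALIZER = "json"
CELERY_RESULT_SERIALIZER = "json"
CELERY_ACCEPT_CONTENT = ["json"]
```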
Pass object IDs / naïve data to scheduled functions
Good: pass `book_id` as a `str`, then look up `Book.objects.get(pk=book_id)` inside the scheduled function. Even if it means making a redundant query, it's at least fresh data you can rely on.
Dangerous: passing an instance of a model (e.g. `book` of the `Book` model) in the job params. The task may simply error due to it not being serializable, and even if it serializes, your data may be outdated or stale by the time the function runs.
Save the task IDs so you can look up the state of the job: this makes it possible to pin down whether a job is already underway.
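Putting those rules together for your PDF case, a sketch might look like this (the app layout, `Book` model, and `render_and_upload_to_s3` helper are hypothetical placeholders):

```python
# reports/tasks.py
from celery import shared_task

from .models import Book  # hypothetical model


def render_and_upload_to_s3(book):
    """Placeholder for the ~50s PDF generation + S3 upload."""
    ...


@shared_task
def generate_pdf_report(book_id):
    # Good: pass the ID and re-fetch inside the task. A redundant query,
    # but the data is fresh and the argument is trivially serializable.
    book = Book.objects.get(pk=book_id)
    return render_and_upload_to_s3(book)


# reports/views.py
from django.http import JsonResponse


def create_report(request):
    # The POST returns in ~3s; the ~50s job runs in a Celery worker.
    result = generate_pdf_report.delay(request.POST["book_id"])
    # Save the task ID so a later request can check whether the job is
    # underway, e.g. generate_pdf_report.AsyncResult(task_id).state.
    return JsonResponse({"task_id": result.id})
```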