delayed_jobs with mongomapper is slow
I'm using delayed_jobs with mongomapper, but fetching jobs is slow when the delayed_jobs collection holds around 500k records.
I created the index { locked_by: -1, priority: 1, run_at: 1 }, but it doesn't help.
I don't know which indexes would improve the query. Each fetch takes around 2 seconds.
Here is the mongodb log:
Tue Dec 13 09:52:38 [conn497] query api_production.$cmd ntoreturn:1 command: {
findandmodify: "delayed_jobs",
query: { run_at: { $lte: new Date(1323769957289) }, failed_at: null,
$or: [ { locked_by: "host:ip-10-128-145-246 pid:26157" }, { locked_at: null },
{ locked_at: { $lt: new Date(1323769057289) } } ] },
sort: { locked_by: -1, priority: -1, run_at: 1 },
update: { $set: { locked_at: new Date(1323769957289),
locked_by: "host:ip-10-128-145-246 pid:26157" } } } reslen:699 1486ms
Comments (1)
Your index doesn't match the query. The query first eliminates candidates based on run_at, so run_at should be the first field of the index, but it's not. Then comes a rather inelegant $or clause. It will be hard to choose an appropriate index for it, because two of its criteria are on locked_at while one is on locked_by. To make matters worse, there are three sort criteria, but they are exactly the reverse of the direction of the query constraints. Also, you're sorting on a rather lengthy string.
Basically, I think the query is not very well designed: it tries to accomplish too much in a single query. I don't know if delayed_jobs is some kind of module, but it would be much easier if the rules were simpler. Why does a worker lock so many jobs, for instance? In fact, I think it's best if you only lock the job you're currently working on and have different workers fetch different job types for scaling. The workers might also want to use uuids instead of their ip address and pid (with a prefix that adds no entropy and no selectivity), etc.
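To make the "simpler rules" idea concrete, here is a rough sketch of what a single-job reservation could look like. It is not what delayed_jobs actually sends. The helper buildReserveCommand, the name reserveIndex, and the rule "a job is available when it is due, not failed, and not locked" are all assumptions made for illustration, and stale-lock recovery (the locked_at < some cutoff branch of the original $or) is deliberately left out and would need a separate step.

```javascript
// Hypothetical helper (not part of delayed_jobs): builds a simplified
// "reserve one job" document without the $or clause and without sorting
// on the long locked_by string.
// Assumption: a job is available when it is due, has not failed, and is unlocked.
function buildReserveCommand(workerId, now) {
  return {
    query: { failed_at: null, locked_at: null, run_at: { $lte: now } },
    sort: { priority: -1, run_at: 1 },
    update: { $set: { locked_at: now, locked_by: workerId } },
    new: true // return the document as it looks after the update
  };
}

// One candidate index for that query (an assumption, to be verified with
// explain() on real data): the equality fields, followed by the sort keys
// in the same order and direction as the sort document.
const reserveIndex = { failed_at: 1, locked_at: 1, priority: -1, run_at: 1 };

// In the mongo shell this could be used roughly as follows:
//   db.delayed_jobs.createIndex(reserveIndex)   // ensureIndex on old servers
//   db.delayed_jobs.findAndModify(buildReserveCommand(workerId, new Date()))
```

Whether the index should lead with run_at, as suggested above, or with the equality fields as in this sketch depends on the data. Running the equivalent find(...).sort(...).limit(1).explain() against both variants is the way to decide. The workerId passed in can be any short unique string, such as a uuid generated once per worker.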