如何通过连接正确计算 SUM?
因此,我试图计算零件数量、任务数量、每项工作的数量以及制造每项工作所需的时间,但我得到了一些奇怪的结果。如果我运行这个:
SELECT
j.id,
mf.special_instructions,
count(distinct p.id) as number_of_different_parts,
count(distinct t.id) as number_of_tasks,
SUM(distinct j.quantity) as number_of_assemblies,
SUM(l.time_elapsed) as time_elapsed
FROM
sugarcrm2.mf_job mf
INNER JOIN ramses.jobs j on
mf.id = j.mf_job_id
INNER JOIN ramses.parts p on
j.id = p.job_id
INNER JOIN ramses.tasks t on
p.id = t.part_id
INNER JOIN ramses.batch_log l on
t.batch_id = l.batch_id
WHERE
mf.job_description LIKE "%BACKBLAZE%" OR
mf.customer_name LIKE "%BACKBLAZE%" OR
mf.customer_ref LIKE "%BACKBLAZE%" OR
mf.technical_company_name LIKE "%BACKBLAZE%" OR
mf.description LIKE "%BACKBLAZE%" OR
mf.name LIKE "%BACKBLAZE%" OR
mf.enclosure_style LIKE "%BACKBLAZE%" OR
mf.special_instructions LIKE "%BACKBLAZE%"
Group by j.id
我现在得到准确的零件和任务编号,但 time_elapsed 总和不正确。问题可能是什么?
当我尝试使用 distinct
时,我得到了一个非常低的数字(就像当我寻找接近 10,000 的数字时,数字在 1 到 30 之间。)
更新:这是创建代码:
http://pastebin.com/Zq0bKG2L< /a>
关系如下:
- mf_job 信息链接到一个作业
- 作业有零件
- 零件有任务
- 任务
- 是批量的。批量任务的启动和停止,它有一个 start_time 和一个 stop_time 以及一个 time_elapsed 。
我试图从每个 mf_job 的 batch_log 中获取所有 time_elapsed ,其中一个字段中包含单词 backblaze 以及零件、任务和组件的数量。这一切都需要按 job.id 或 mf_job.id 分组
So I'm trying to count the number of parts, number of tasks, the quantity in each job and the time that it took to manufacture each job but I'm getting some funky results. If I run this:
SELECT
j.id,
mf.special_instructions,
count(distinct p.id) as number_of_different_parts,
count(distinct t.id) as number_of_tasks,
SUM(distinct j.quantity) as number_of_assemblies,
SUM(l.time_elapsed) as time_elapsed
FROM
sugarcrm2.mf_job mf
INNER JOIN ramses.jobs j on
mf.id = j.mf_job_id
INNER JOIN ramses.parts p on
j.id = p.job_id
INNER JOIN ramses.tasks t on
p.id = t.part_id
INNER JOIN ramses.batch_log l on
t.batch_id = l.batch_id
WHERE
mf.job_description LIKE "%BACKBLAZE%" OR
mf.customer_name LIKE "%BACKBLAZE%" OR
mf.customer_ref LIKE "%BACKBLAZE%" OR
mf.technical_company_name LIKE "%BACKBLAZE%" OR
mf.description LIKE "%BACKBLAZE%" OR
mf.name LIKE "%BACKBLAZE%" OR
mf.enclosure_style LIKE "%BACKBLAZE%" OR
mf.special_instructions LIKE "%BACKBLAZE%"
Group by j.id
and I now get accurate parts and tasks numbers but the time_elapsed sum isn't correct. What could the problem be?
When I try it with distinct
I get a veeeeery low number (like something between 1 and 30 when I'm looking for something closer to 10,000.)
UPDATE: here is the create code:
The relationships are like this:
- mf_job info is linked to a job
- jobs have parts
- parts have tasks
- tasks are in batches
- batch_log is a table with all of the starts and stops for the batches of tasks, it has a start_time and a stop_time and a time_elapsed.
I am trying to get all of the time_elapsed from the batch_log for each mf_job with the word backblaze in one of it's fields along with the number of parts, tasks and assemblies. This all needs to be grouped by job.id or mf_job.id
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
尝试将查询重写为:
Try and rewrite the query to:
您需要将查询更改为:
另外,
LIKE '%...'
行将使查询变得非常慢,因为无法使用此查询。如果您能够使用 MyISAM,则可以在这些列上使用全文索引并使用如下代码:
请参阅:
http://dev.mysql.com/doc/refman/5.5 /en/fulltext-search.html
http://www.petefreitag.com/item/477.cfm
http://dev .mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_time-to-sec
You need to change the query to:
Also, the line of
LIKE '%...'
will make the query uber slow, because no indexes on this can be used.If you are able to use MyISAM, you can use a fulltext index on those columns and use code like:
See:
http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
http://www.petefreitag.com/item/477.cfm
http://dev.mysql.com/doc/refman/5.0/en/date-and-time-functions.html#function_time-to-sec
听起来问题在于多个任务可以在同一批次中,和/或多个部分可以在同一任务中。举例来说,您的工作有 3 个部分,每个部分都有一个任务,并且所有 3 个任务都在同一个批次中。您需要将该批次的时间添加三次。但不同的也不起作用,因为如果您有 5 个不同的批次,并且全部花费了 300 秒,那么它们将不会被视为不同。
在这种情况下,通常可以使用子查询。您可以使用选择不同
j.id
(或p.job_id
)、的子查询来连接,而不是直接使用
和batch_log
来连接l.batch_idl.time_elapsed
(第一个用于连接,第二个用于正确计算不同的值,第三个是要使用的实际值)。然后您可以从那里求和l.time_elapsed
。这样每个批次都会被精确计数一次。It sounds like the problem is that multiple tasks can be in the same batch, and/or multiple parts can be in the same task. Say for example that your job has 3 parts, and each part has a task, and all 3 tasks are in the same batch. You'll add the time for that batch three times. But distinct won't work either, since if you have 5 different batches that all took 300 seconds they won't be considered distinct.
In a situation like this, a subquery is usually the way to go. Instead of joining directly with
batch_log
, you would join with a subquery that selected distinctj.id
(orp.job_id
),l.batch_id
, andl.time_elapsed
(the first is for joining, the second just for proper calculation of distinct, and the third the actual value to use). Then you can suml.time_elapsed
from there. This way each batch is counted exactly once.batches(l) 表没有名为 time_elapsed 的字段
任务这样做,所以它是
- 或 -
batches(l) table doesn't have a field called time_elapsed
tasks does so it's either
-or-