更快地运行 SQL 查询
SELECT projectID, urlID, COUNT(1) AS totalClicks, projectPage,
(SELECT COUNT(1)
FROM tblStatSessionRoutes, tblStatSessions
WHERE tblStatSessionRoutes.statSessionID = tblStatSessions.ID
AND tblStatSessions.projectID = tblAdClicks.projectID
AND (tblStatSessionRoutes.leftPageID = tblAdClicks.projectPage OR
tblStatSessionRoutes.rightPageID = tblAdClicks.projectPage)) AS totalViews
FROM tblAdClicks
WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)
GROUP BY projectID, urlID, projectPage
ORDER BY CASE projectID
WHEN 170 THEN
1
ELSE
0
END, projectID
这绝不是一个特别复杂的查询,但由于数据库已规范化到良好的水平,并且我们正在处理大量数据,因此该查询对于用户来说可能非常慢。
有人有关于如何提高速度的建议吗?如果我策略性地对数据库的某些部分进行非规范化,这会有帮助吗?在存储过程中运行它会带来显着的改进吗?
我处理数据的方式在我的代码中是有效的,瓶颈确实在于这个查询。
谢谢!
SELECT projectID, urlID, COUNT(1) AS totalClicks, projectPage,
(SELECT COUNT(1)
FROM tblStatSessionRoutes, tblStatSessions
WHERE tblStatSessionRoutes.statSessionID = tblStatSessions.ID
AND tblStatSessions.projectID = tblAdClicks.projectID
AND (tblStatSessionRoutes.leftPageID = tblAdClicks.projectPage OR
tblStatSessionRoutes.rightPageID = tblAdClicks.projectPage)) AS totalViews
FROM tblAdClicks
WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)
GROUP BY projectID, urlID, projectPage
ORDER BY CASE projectID
WHEN 170 THEN
1
ELSE
0
END, projectID
This is by no means an especially complex query, but because the database is normalised to a good level, and we are dealing with a significant amount of data, this query can be quite slow for the user.
Does anyone have tips on how to improve the speed of it? If I strategically denormalise parts of the database would this help? Will running it in a stored proc offer significant improvements?
The way I handle the data is efficient in my code, the bottleneck really is with this query.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
对数据库进行非规范化应该是最后的手段,因为(仅选择一个原因)您不想鼓励非规范化所允许的数据不一致。
首先是看看能否从查询执行计划中得到一些线索。例如,可能是您的子选择成本过高,最好先将其放入临时表中,然后将其加入到主查询中。
此外,如果您看到大量表扫描,您可以从改进的索引中受益。
如果您还没有这样做,您应该花几分钟重新格式化您的查询以提高可读性。令人惊奇的是,在执行此操作时,明显的优化经常会突然出现在您面前。
De-normalising your database should be a last resort since (to choose just one reason) you don't want to encourage data inconsistencies which de-normalisation will allow.
First thing is to see if you can get some clues from the query execution plan. It could be, for example, that your sub-selects are costing too much, and would be better done first into temp tables which you then JOIN in your main query.
Also, if you see lots of table-scans, you could benefit from improved indexes.
If you haven't already, you should spend a few minutes re-formatting your query for readability. It's amazing how often the obvious optimisation will jump out at you while doing this.
我会尝试打破它
并使用 JOIN 代替:
不确定这会有多大帮助 - 我希望应该有所帮助!
除此之外,我会检查您是否在相关列上有索引,例如在
a.ProjectID
(以帮助 JOIN)上,也许在a.urlID
上以及a.ProjectPage
(帮助使用GROUP BY
)I would try to break up that
and use a JOIN instead:
Not sure just how much this will help - should help a bit, I hope!
Other than that, I would check if you have indices on the relevant columns, e.g. on
a.ProjectID
(to help with the JOIN), and maybe ona.urlID
anda.ProjectPage
(to help with theGROUP BY
)如果您的 dbms 有一个可以解释其查询计划的工具,请首先使用该工具。 (您的第一个相关子查询可能每行运行一次。)然后确保 WHERE 子句中引用的每个列都有一个索引。
这个子查询——WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)——肯定可以从被剪切并实现为视图中受益。然后加入到视图中。
将点击流数据视为数据仓库应用程序并不罕见。如果您需要走这条路,我通常会实现一个单独的数据仓库,而不是对设计良好的 OLTP 数据库进行非规范化。
我怀疑将其作为存储过程运行会对您有所帮助。
If your dbms has a tool that explains its query plan, use that first. (Your first correlated subquery might be running once per row.) Than make sure every column referenced in a WHERE clause has an index.
This subquery--WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)--can surely benefit from being cut and implemented as a view. Then join to the view.
It's not unusual to treat clickstream data as a data warehouse application. If you need to go that route, I'd usually implement a separate data warehouse rather than denormalize a well-designed OLTP database.
I doubt that running it as a stored proc will help you.
我会尝试删除相关子查询(内部
(SELECT COUNT(1) ...)
)。必须加入左侧页面或右侧页面匹配的会话路由会使事情变得有点棘手。大致如下(但我还没有测试过这一点):如果我要添加一些缓存表来帮助实现这一点,正如我所指出的,我会尝试将左页和右页对 tblStatSessionRoutes 的两个查询减少到单个查询询问。例如,如果您知道 leftPageID 永远不会等于 rightPageID,则应该可以简单地使用触发器来填充另一个表,其中左视图和右视图位于不同的行中。
I would try to remove the correlated subquery (the inner
(SELECT COUNT(1) ...)
). Having to join against your session routes where either the left page or the right page matches makes things a bit tricky. Something along the lines of (but I haven't tested this):If I were to add some cache tables to help this, as I indicated I'd try to reduce the two queries against tblStatSessionRoutes for both left and right page to a single query. If you know that leftPageID will never be equal to rightPageID, it should be possible to simply use a trigger to populate an additional table with the left and right views in separate rows, for example.