SQL query to calculate visit duration from a log table
I have a MySQL table LOGIN_LOG with fields ID, PLAYER, TIMESTAMP and ACTION. ACTION can be either 'login' or 'logout'. Only around 20% of the logins have an accompanying logout row. For those that do, I want to calculate the average duration.
I'm thinking of something like
select avg(LL2.TIMESTAMP - LL1.TIMESTAMP)
from LOGIN_LOG LL1
inner join LOGIN_LOG LL2 on LL1.PLAYER = LL2.PLAYER and LL2.TIMESTAMP > LL1.TIMESTAMP
left join LOGIN_LOG LL3 on LL3.PLAYER = LL1.PLAYER
and LL3.TIMESTAMP between LL1.TIMESTAMP + 1 and LL2.TIMESTAMP - 1
and LL3.ACTION = 'login'
where LL1.ACTION = 'login' and LL2.ACTION = 'logout' and isnull(LL3.ID)
Is this the best way to do it, or is there a more efficient one?
Comments (4)
Given the data you have, there probably isn't anything much faster you can do because you have to look at a LOGIN and a LOGOUT record, and ensure there is no other LOGIN (or LOGOUT?) record for the same user between the two.
Alternatively, find a way to ensure that a disconnect records a logout, so that the data is complete (instead of 20% complete). However, the query probably still has to ensure that the criteria are all met, so it won't help the query all that much.
If you can get the data into a format where the LOGIN and corresponding LOGOUT times are both in the same record, then you can simplify the query immensely. I'm not clear if the SessionManager does that for you.
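To illustrate the same-record idea, here is a minimal sketch. It assumes a hypothetical SESSIONS table (not part of the original schema) with one row per session and LOGIN_TS / LOGOUT_TS stored as epoch seconds. Python's built-in sqlite3 module stands in for MySQL only so the example is self-contained; the query itself is plain SQL.

```python
import sqlite3


def average_session_seconds(conn):
    """Average duration of the sessions that have a recorded logout."""
    row = conn.execute(
        "SELECT AVG(LOGOUT_TS - LOGIN_TS) FROM SESSIONS "
        "WHERE LOGOUT_TS IS NOT NULL"
    ).fetchone()
    return row[0]  # None when no session has a logout


# Hypothetical schema: one row per session instead of one row per event.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SESSIONS ("
    " ID INTEGER PRIMARY KEY,"
    " PLAYER TEXT NOT NULL,"
    " LOGIN_TS INTEGER NOT NULL,"
    " LOGOUT_TS INTEGER)"  # NULL when the player never logged out
)
conn.executemany(
    "INSERT INTO SESSIONS (PLAYER, LOGIN_TS, LOGOUT_TS) VALUES (?, ?, ?)",
    [
        ("alice", 1000, 1600),  # 600 seconds
        ("alice", 5000, None),  # no logout recorded, ignored by the query
        ("bob", 2000, 2300),    # 300 seconds
    ],
)

print(average_session_seconds(conn))
```

With this layout no self-join is needed, because the pairing of login and logout is done once, when the row is written, rather than on every query.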
Do you have a SessionManager type object that can time out sessions? If so, the timeout could be logged there, and you could work out the session length from the last activity time and the timeout period.
Alternatively, if you log all activity on the website/service, you can query the visit duration directly and see which activities users performed. For a website, Apache log analysers can probably generate the required stats.
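As a rough sketch of the timeout idea, the function below splits one user's activity timestamps into sessions whenever the gap between two consecutive activities exceeds a timeout. The function name, the epoch-second timestamps and the 30-minute timeout are all assumptions made for illustration; nothing here comes from a real SessionManager API.

```python
from typing import List


def session_durations(activity_times: List[int], timeout: int) -> List[int]:
    """Split one user's activity timestamps (epoch seconds) into sessions.

    A gap longer than `timeout` closes the current session at the last
    activity seen before the gap. Each duration is measured from the
    first to the last activity of that session.
    """
    times = sorted(activity_times)
    durations: List[int] = []
    if not times:
        return durations

    start = last = times[0]
    for t in times[1:]:
        if t - last > timeout:
            # The session timed out: close it at the last activity.
            durations.append(last - start)
            start = t
        last = t
    durations.append(last - start)  # close the final session
    return durations


# Hypothetical activity log for one user, with a 30-minute timeout.
example = [0, 60, 200, 5000, 5100]
print(session_durations(example, timeout=1800))
```

Measured this way, every session gets a duration, not only those that end with an explicit logout.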
I agree with JeeBee, but another advantage to a SessionManager type object is that you can handle the sessionEnd event and write a logout row with the active time in it. This way you would likely go from 20% accompanying logout rows to 100% accompanying logout rows. Querying for the activity time would then be trivial and consistent for all sessions.
If only 20% of your users actually log out, this search will not give you a very accurate time for each session. A better way to gauge the length of an average user session is to take the average time between actions, i.e. the average time per page. This can then be multiplied by the average number of pages/actions per visit to give a more accurate time.
Additionally, you can determine the average time for each page, and then estimate each session's end time as: session time up to the last page view + average time spent on that last page. This will give you a much more fine-grained (and accurate) measure of the time spent per session.
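A minimal sketch of that estimate follows, assuming each session is available as a time-ordered list of (page, timestamp) page views with timestamps in epoch seconds. The data layout and the function name are hypothetical. Time on a page is taken as the gap until the next page view, and the last page of a session, whose dwell time is unknown, is assigned that page's average.

```python
from collections import defaultdict


def estimate_durations(sessions):
    """Estimate each session's length as observed time + average time on its last page."""
    totals = defaultdict(int)  # page -> total seconds observed on that page
    counts = defaultdict(int)  # page -> number of observations

    # Time on a page = gap until the next page view in the same session.
    for views in sessions:
        for (page, ts), (_, next_ts) in zip(views, views[1:]):
            totals[page] += next_ts - ts
            counts[page] += 1

    all_count = sum(counts.values())
    overall_avg = sum(totals.values()) / all_count if all_count else 0.0

    estimates = []
    for views in sessions:
        if not views:
            estimates.append(0.0)
            continue
        observed = views[-1][1] - views[0][1]
        last_page = views[-1][0]
        n = counts.get(last_page, 0)
        # Fall back to the overall average if this page was never followed by another view.
        last_avg = totals[last_page] / n if n else overall_avg
        estimates.append(observed + last_avg)
    return estimates


# Hypothetical page-view data for two sessions.
sessions = [
    [("home", 0), ("shop", 30), ("cart", 90)],
    [("home", 100), ("cart", 120), ("shop", 160)],
]
print(estimate_durations(sessions))
```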
Regarding the given SQL, it seems to be more complicated than you really need. This sort of statistical operation is often easier to handle and maintain in code outside the database, where you have the full power of whichever language you choose rather than just the rather convoluted abilities of SQL for statistical calculations.
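As one possible way to do this outside the database, here is a sketch in Python. It assumes the LOGIN_LOG rows have already been fetched as (player, timestamp, action) tuples with timestamps in epoch seconds; the function name is made up for illustration. A login is paired with the next logout by the same player only if no other login by that player occurs in between. Unlike the SQL in the question, a second logout with no new login before it is ignored.

```python
def average_duration(rows):
    """Average seconds between a login and the logout that follows it.

    rows: iterable of (player, timestamp, action) tuples.
    Returns None when no login/logout pair is found.
    """
    pending = {}  # player -> timestamp of the most recent unmatched login
    total = 0
    count = 0

    for player, ts, action in sorted(rows, key=lambda r: r[1]):
        if action == "login":
            # A new login replaces an earlier one that never got a logout.
            pending[player] = ts
        elif action == "logout" and player in pending:
            total += ts - pending.pop(player)
            count += 1

    return total / count if count else None


# Hypothetical sample data.
rows = [
    ("alice", 1000, "login"),
    ("alice", 1600, "logout"),  # 600 seconds
    ("bob", 1200, "login"),     # no logout, superseded by the next login
    ("bob", 2000, "login"),
    ("bob", 2300, "logout"),    # 300 seconds
    ("carol", 3000, "login"),   # never logged out
]
print(average_duration(rows))
```

This is a single pass over the sorted rows, so it avoids the triple self-join, at the cost of pulling the rows out of the database first.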