SQL 确定最小连续访问天数?

发布于 2024-07-28 23:20:06 字数 649 浏览 13 评论 0原文

以下用户历史记录表包含给定用户访问网站的每一天的一条记录(在 24 小时 UTC 时间段内)。 它有数千条记录,但每个用户每天只有一条记录。 如果用户当天没有访问该网站,则不会生成任何记录。

Id      UserId   CreationDate
------  ------   ------------
750997      12   2009-07-07 18:42:20.723
750998      15   2009-07-07 18:42:20.927
751000      19   2009-07-07 18:42:22.283

我正在寻找的是对该表的 SQL 查询具有良好的性能,它告诉我哪些用户 ID 连续 (n) 天访问该网站而没有错过一天。

换句话说,有多少用户在此表中拥有 (n) 条具有连续日期(前天或后天)日期的记录? 如果序列中缺少任何一天,序列就会被破坏,并应从 1 重新开始; 我们正在寻找在此处连续停留天数且没有间断的用户。

当然,此查询与特定 Stack Overflow 徽章之间的任何相似之处纯属巧合。.:)

The following User History table contains one record for every day a given user has accessed a website (in a 24 hour UTC period). It has many thousands of records, but only one record per day per user. If the user has not accessed the website for that day, no record will be generated.

Id      UserId   CreationDate
------  ------   ------------
750997      12   2009-07-07 18:42:20.723
750998      15   2009-07-07 18:42:20.927
751000      19   2009-07-07 18:42:22.283

What I'm looking for is a SQL query on this table with good performance, that tells me which userids have accessed the website for (n) continuous days without missing a day.

In other words, how many users have (n) records in this table with sequential (day-before, or day-after) dates? If any day is missing from the sequence, the sequence is broken and should restart again at 1; we're looking for users who have achieved a continuous number of days here with no gaps.

Any resemblance between this query and a particular Stack Overflow badge is purely coincidental, of course.. :)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(19

凤舞天涯 2024-08-04 23:20:06

怎么样(请确保前面的语句以分号结尾):

WITH numberedrows
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID 
                                       ORDER BY CreationDate)
                - DATEDIFF(day,'19000101',CreationDate) AS TheOffset,
                CreationDate,
                UserID
         FROM   tablename)
SELECT MIN(CreationDate),
       MAX(CreationDate),
       COUNT(*) AS NumConsecutiveDays,
       UserID
FROM   numberedrows
GROUP  BY UserID,
          TheOffset  

这个想法是,如果我们有天数列表(作为数字)和 row_number,那么错过的天数会使这两个列表之间的偏移量稍微增加大。 所以我们正在寻找一个具有一致偏移的范围。

你可以在最后使用“ORDER BY NumConsecutiveDays DESC”,或者说“HAVING count(*) > 14”作为阈值......

我还没有测试过这个——只是把它写在我的脑海里。 希望能在 SQL2005 及更高版本中工作。

...并且对表名(UserID,CreationDate)上的索引会有很大帮助

编辑:原来Offset是一个保留字,所以我使用了TheOffset。

编辑:使用 COUNT(*) 的建议非常有效 - 我应该首先这样做,但并没有真正考虑。 以前它使用 datediff(day, min(CreationDate), max(CreationDate)) 代替。

How about (and please make sure the previous statement ended with a semi-colon):

WITH numberedrows
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY UserID 
                                       ORDER BY CreationDate)
                - DATEDIFF(day,'19000101',CreationDate) AS TheOffset,
                CreationDate,
                UserID
         FROM   tablename)
SELECT MIN(CreationDate),
       MAX(CreationDate),
       COUNT(*) AS NumConsecutiveDays,
       UserID
FROM   numberedrows
GROUP  BY UserID,
          TheOffset  

The idea being that if we have list of the days (as a number), and a row_number, then missed days make the offset between these two lists slightly bigger. So we're looking for a range that has a consistent offset.

You could use "ORDER BY NumConsecutiveDays DESC" at the end of this, or say "HAVING count(*) > 14" for a threshold...

I haven't tested this though - just writing it off the top of my head. Hopefully works in SQL2005 and on.

...and would be very much helped by an index on tablename(UserID, CreationDate)

Edited: Turns out Offset is a reserved word, so I used TheOffset instead.

Edited: The suggestion to use COUNT(*) is very valid - I should've done that in the first place but wasn't really thinking. Previously it was using datediff(day, min(CreationDate), max(CreationDate)) instead.

Rob

手长情犹 2024-08-04 23:20:06

答案显然是:

SELECT DISTINCT UserId
FROM UserHistory uh1
WHERE (
       SELECT COUNT(*) 
       FROM UserHistory uh2 
       WHERE uh2.CreationDate 
       BETWEEN uh1.CreationDate AND DATEADD(d, @days, uh1.CreationDate)
      ) = @days OR UserId = 52551

编辑:

好吧,这是我认真的答案:

DECLARE @days int
DECLARE @seconds bigint
SET @days = 30
SET @seconds = (@days * 24 * 60 * 60) - 1
SELECT DISTINCT UserId
FROM (
    SELECT uh1.UserId, Count(uh1.Id) as Conseq
    FROM UserHistory uh1
    INNER JOIN UserHistory uh2 ON uh2.CreationDate 
        BETWEEN uh1.CreationDate AND 
            DATEADD(s, @seconds, DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate), 0))
        AND uh1.UserId = uh2.UserId
    GROUP BY uh1.Id, uh1.UserId
    ) as Tbl
WHERE Conseq >= @days

编辑:

[Jeff Atwood]这是一个很棒的快速解决方案,值得被接受,但是Rob Farley 的解决方案也非常出色,而且可以说更快(! )。 请您也检查一下!

The answer is obviously:

SELECT DISTINCT UserId
FROM UserHistory uh1
WHERE (
       SELECT COUNT(*) 
       FROM UserHistory uh2 
       WHERE uh2.CreationDate 
       BETWEEN uh1.CreationDate AND DATEADD(d, @days, uh1.CreationDate)
      ) = @days OR UserId = 52551

EDIT:

Okay here's my serious answer:

DECLARE @days int
DECLARE @seconds bigint
SET @days = 30
SET @seconds = (@days * 24 * 60 * 60) - 1
SELECT DISTINCT UserId
FROM (
    SELECT uh1.UserId, Count(uh1.Id) as Conseq
    FROM UserHistory uh1
    INNER JOIN UserHistory uh2 ON uh2.CreationDate 
        BETWEEN uh1.CreationDate AND 
            DATEADD(s, @seconds, DATEADD(dd, DATEDIFF(dd, 0, uh1.CreationDate), 0))
        AND uh1.UserId = uh2.UserId
    GROUP BY uh1.Id, uh1.UserId
    ) as Tbl
WHERE Conseq >= @days

EDIT:

[Jeff Atwood] This is a great fast solution and deserves to be accepted, but Rob Farley's solution is also excellent and arguably even faster (!). Please check it out too!

如果您可以更改表架构,我建议向表中添加一列 LongestStreak,并将其设置为以 CreationDate 结尾的连续天数。 在登录时更新表很容易(类似于您已经在做的事情,如果当天不存在任何行,您将检查前一天是否存在任何行。如果为 true,您将增加 LongestStreak 在新行中,否则,您将其设置为 1。)

添加此列后,查询将显而易见:

if exists(select * from table
          where LongestStreak >= 30 and UserId = @UserId)
   -- award the Woot badge.

If you can change the table schema, I'd suggest adding a column LongestStreak to the table which you'd set to the number of sequential days ending to the CreationDate. It's easy to update the table at login time (similar to what you are doing already, if no rows exist of the current day, you'll check if any row exists for the previous day. If true, you'll increment the LongestStreak in the new row, otherwise, you'll set it to 1.)

The query will be obvious after adding this column:

if exists(select * from table
          where LongestStreak >= 30 and UserId = @UserId)
   -- award the Woot badge.
°如果伤别离去 2024-08-04 23:20:06

一些很好表达的 SQL 代码如下:

select
        userId,
    dbo.MaxConsecutiveDates(CreationDate) as blah
from
    dbo.Logins
group by
    userId

假设您有一个 用户定义的聚合函数< /a> 类似的内容(注意这是错误的):

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Runtime.InteropServices;

namespace SqlServerProject1
{
    [StructLayout(LayoutKind.Sequential)]
    [Serializable]
    internal struct MaxConsecutiveState
    {
        public int CurrentSequentialDays;
        public int MaxSequentialDays;
        public SqlDateTime LastDate;
    }

    [Serializable]
    [SqlUserDefinedAggregate(
        Format.Native,
        IsInvariantToNulls = true, //optimizer property
        IsInvariantToDuplicates = false, //optimizer property
        IsInvariantToOrder = false) //optimizer property
    ]
    [StructLayout(LayoutKind.Sequential)]
    public class MaxConsecutiveDates
    {
        /// <summary>
        /// The variable that holds the intermediate result of the concatenation
        /// </summary>
        private MaxConsecutiveState _intermediateResult;

        /// <summary>
        /// Initialize the internal data structures
        /// </summary>
        public void Init()
        {
            _intermediateResult = new MaxConsecutiveState { LastDate = SqlDateTime.MinValue, CurrentSequentialDays = 0, MaxSequentialDays = 0 };
        }

        /// <summary>
        /// Accumulate the next value, not if the value is null
        /// </summary>
        /// <param name="value"></param>
        public void Accumulate(SqlDateTime value)
        {
            if (value.IsNull)
            {
                return;
            }
            int sequentialDays = _intermediateResult.CurrentSequentialDays;
            int maxSequentialDays = _intermediateResult.MaxSequentialDays;
            DateTime currentDate = value.Value.Date;
            if (currentDate.AddDays(-1).Equals(new DateTime(_intermediateResult.LastDate.TimeTicks)))
                sequentialDays++;
            else
            {
                maxSequentialDays = Math.Max(sequentialDays, maxSequentialDays);
                sequentialDays = 1;
            }
            _intermediateResult = new MaxConsecutiveState
                                      {
                                          CurrentSequentialDays = sequentialDays,
                                          LastDate = currentDate,
                                          MaxSequentialDays = maxSequentialDays
                                      };
        }

        /// <summary>
        /// Merge the partially computed aggregate with this aggregate.
        /// </summary>
        /// <param name="other"></param>
        public void Merge(MaxConsecutiveDates other)
        {
            // add stuff for two separate calculations
        }

        /// <summary>
        /// Called at the end of aggregation, to return the results of the aggregation.
        /// </summary>
        /// <returns></returns>
        public SqlInt32 Terminate()
        {
            int max = Math.Max((int) ((sbyte) _intermediateResult.CurrentSequentialDays), (sbyte) _intermediateResult.MaxSequentialDays);
            return new SqlInt32(max);
        }
    }
}

Some nicely expressive SQL along the lines of:

select
        userId,
    dbo.MaxConsecutiveDates(CreationDate) as blah
from
    dbo.Logins
group by
    userId

Assuming you have a user defined aggregate function something along the lines of (beware this is buggy):

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Runtime.InteropServices;

namespace SqlServerProject1
{
    [StructLayout(LayoutKind.Sequential)]
    [Serializable]
    internal struct MaxConsecutiveState
    {
        public int CurrentSequentialDays;
        public int MaxSequentialDays;
        public SqlDateTime LastDate;
    }

    [Serializable]
    [SqlUserDefinedAggregate(
        Format.Native,
        IsInvariantToNulls = true, //optimizer property
        IsInvariantToDuplicates = false, //optimizer property
        IsInvariantToOrder = false) //optimizer property
    ]
    [StructLayout(LayoutKind.Sequential)]
    public class MaxConsecutiveDates
    {
        /// <summary>
        /// The variable that holds the intermediate result of the concatenation
        /// </summary>
        private MaxConsecutiveState _intermediateResult;

        /// <summary>
        /// Initialize the internal data structures
        /// </summary>
        public void Init()
        {
            _intermediateResult = new MaxConsecutiveState { LastDate = SqlDateTime.MinValue, CurrentSequentialDays = 0, MaxSequentialDays = 0 };
        }

        /// <summary>
        /// Accumulate the next value, not if the value is null
        /// </summary>
        /// <param name="value"></param>
        public void Accumulate(SqlDateTime value)
        {
            if (value.IsNull)
            {
                return;
            }
            int sequentialDays = _intermediateResult.CurrentSequentialDays;
            int maxSequentialDays = _intermediateResult.MaxSequentialDays;
            DateTime currentDate = value.Value.Date;
            if (currentDate.AddDays(-1).Equals(new DateTime(_intermediateResult.LastDate.TimeTicks)))
                sequentialDays++;
            else
            {
                maxSequentialDays = Math.Max(sequentialDays, maxSequentialDays);
                sequentialDays = 1;
            }
            _intermediateResult = new MaxConsecutiveState
                                      {
                                          CurrentSequentialDays = sequentialDays,
                                          LastDate = currentDate,
                                          MaxSequentialDays = maxSequentialDays
                                      };
        }

        /// <summary>
        /// Merge the partially computed aggregate with this aggregate.
        /// </summary>
        /// <param name="other"></param>
        public void Merge(MaxConsecutiveDates other)
        {
            // add stuff for two separate calculations
        }

        /// <summary>
        /// Called at the end of aggregation, to return the results of the aggregation.
        /// </summary>
        /// <returns></returns>
        public SqlInt32 Terminate()
        {
            int max = Math.Max((int) ((sbyte) _intermediateResult.CurrentSequentialDays), (sbyte) _intermediateResult.MaxSequentialDays);
            return new SqlInt32(max);
        }
    }
}
枯寂 2024-08-04 23:20:06

似乎您可以利用这样一个事实:要连续 n 天需要有 n 行。

所以像这样:

SELECT users.UserId, count(1) as cnt
FROM users
WHERE users.CreationDate > now() - INTERVAL 30 DAY
GROUP BY UserId
HAVING cnt = 30

Seems like you could take advantage of the fact that to be continuous over n days would require there to be n rows.

So something like:

SELECT users.UserId, count(1) as cnt
FROM users
WHERE users.CreationDate > now() - INTERVAL 30 DAY
GROUP BY UserId
HAVING cnt = 30
被你宠の有点坏 2024-08-04 23:20:06

对我来说,使用单个 SQL 查询来完成此操作似乎过于复杂。 让我将这个答案分为两部分。

  1. 到目前为止您应该做什么以及现在应该开始做什么:
    运行每日 cron 作业,检查每个用户今天是否已登录,如果登录则增加计数器,如果未登录则将其设置为 0。

  2. 你现在应该做什么:
    - 将此表导出到不运行您的网站并且暂时不需要的服务器。 ;)
    - 按用户排序,然后按日期排序。
    - 按顺序浏览它,保留计数器......

Doing this with a single SQL query seems overly complicated to me. Let me break this answer down in two parts.

  1. What you should have done until now and should start doing now:
    Run a daily cron job that checks for every user wether he has logged in today and then increments a counter if he has or sets it to 0 if he hasn't.

  2. What you should do now:
    - Export this table to a server that doesn't run your website and won't be needed for a while. ;)
    - Sort it by user, then date.
    - go through it sequentially, keep a counter...
荒人说梦 2024-08-04 23:20:06

如果这对您来说非常重要,请获取此事件并驱动一个表来为您提供此信息。 不需要用那些疯狂的查询来杀死机器。

If this is so important to you, source this event and drive a table to give you this info. No need to kill the machine with all those crazy queries.

毁虫ゝ 2024-08-04 23:20:06

您可以使用递归 CTE (SQL Server 2005+):

WITH recur_date AS (
        SELECT t.userid,
               t.creationDate,
               DATEADD(day, 1, t.created) 'nextDay',
               1 'level' 
          FROM TABLE t
         UNION ALL
        SELECT t.userid,
               t.creationDate,
               DATEADD(day, 1, t.created) 'nextDay',
               rd.level + 1 'level'
          FROM TABLE t
          JOIN recur_date rd on t.creationDate = rd.nextDay AND t.userid = rd.userid)
   SELECT t.*
    FROM recur_date t
   WHERE t.level = @numDays
ORDER BY t.userid

You could use a recursive CTE (SQL Server 2005+):

WITH recur_date AS (
        SELECT t.userid,
               t.creationDate,
               DATEADD(day, 1, t.created) 'nextDay',
               1 'level' 
          FROM TABLE t
         UNION ALL
        SELECT t.userid,
               t.creationDate,
               DATEADD(day, 1, t.created) 'nextDay',
               rd.level + 1 'level'
          FROM TABLE t
          JOIN recur_date rd on t.creationDate = rd.nextDay AND t.userid = rd.userid)
   SELECT t.*
    FROM recur_date t
   WHERE t.level = @numDays
ORDER BY t.userid
你的往事 2024-08-04 23:20:06

Joe Celko 在 SQL for Smarties 中对此有一个完整的章节(称为运行和序列)。 我家里没有那本书,所以当我开始工作时......我实际上会回答这个问题。 (假设历史表名为 dbo.UserHistory 并且天数为 @Days)

另一个线索来自 SQL 团队关于运行的博客

我有另一个想法,但没有方便的 SQL 服务器可以在这里使用,那就是使用带有分区 ROW_NUMBER 的 CTE,例如这:

WITH Runs
AS
  (SELECT UserID
         , CreationDate
         , ROW_NUMBER() OVER(PARTITION BY UserId
                             ORDER BY CreationDate)
           - ROW_NUMBER() OVER(PARTITION BY UserId, NoBreak
                               ORDER BY CreationDate) AS RunNumber
  FROM
     (SELECT UH.UserID
           , UH.CreationDate
           , ISNULL((SELECT TOP 1 1 
              FROM dbo.UserHistory AS Prior 
              WHERE Prior.UserId = UH.UserId 
              AND Prior.CreationDate
                  BETWEEN DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), -1)
                  AND DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), 0)), 0) AS NoBreak
      FROM dbo.UserHistory AS UH) AS Consecutive
)
SELECT UserID, MIN(CreationDate) AS RunStart, MAX(CreationDate) AS RunEnd
FROM Runs
GROUP BY UserID, RunNumber
HAVING DATEDIFF(dd, MIN(CreationDate), MAX(CreationDate)) >= @Days

上面的内容可能比实际情况困难,但是当你对“跑步”有其他定义而不仅仅是日期时,你会觉得很头疼。

Joe Celko has a complete chapter on this in SQL for Smarties (calling it Runs and Sequences). I don't have that book at home, so when I get to work... I'll actually answer this. (assuming history table is called dbo.UserHistory and the number of days is @Days)

Another lead is from SQL Team's blog on runs

The other idea I've had, but don't have a SQL server handy to work on here is to use a CTE with a partitioned ROW_NUMBER like this:

WITH Runs
AS
  (SELECT UserID
         , CreationDate
         , ROW_NUMBER() OVER(PARTITION BY UserId
                             ORDER BY CreationDate)
           - ROW_NUMBER() OVER(PARTITION BY UserId, NoBreak
                               ORDER BY CreationDate) AS RunNumber
  FROM
     (SELECT UH.UserID
           , UH.CreationDate
           , ISNULL((SELECT TOP 1 1 
              FROM dbo.UserHistory AS Prior 
              WHERE Prior.UserId = UH.UserId 
              AND Prior.CreationDate
                  BETWEEN DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), -1)
                  AND DATEADD(dd, DATEDIFF(dd, 0, UH.CreationDate), 0)), 0) AS NoBreak
      FROM dbo.UserHistory AS UH) AS Consecutive
)
SELECT UserID, MIN(CreationDate) AS RunStart, MAX(CreationDate) AS RunEnd
FROM Runs
GROUP BY UserID, RunNumber
HAVING DATEDIFF(dd, MIN(CreationDate), MAX(CreationDate)) >= @Days

The above is likely WAY HARDER than it has to be, but left as an a brain tickle for when you have some other definition of "a run" than just dates.

遗忘曾经 2024-08-04 23:20:06

几个 SQL Server 2012 选项 (下面假设 N=100)。

;WITH T(UserID, NRowsPrevious)
     AS (SELECT UserID,
                DATEDIFF(DAY, 
                        LAG(CreationDate, 100) 
                            OVER 
                                (PARTITION BY UserID 
                                     ORDER BY CreationDate), 
                         CreationDate)
         FROM   UserHistory)
SELECT DISTINCT UserID
FROM   T
WHERE  NRowsPrevious = 100 

尽管使用我的示例数据,以下结果更有效,

;WITH U
         AS (SELECT DISTINCT UserId
             FROM   UserHistory) /*Ideally replace with Users table*/
    SELECT UserId
    FROM   U
           CROSS APPLY (SELECT TOP 1 *
                        FROM   (SELECT 
                                       DATEDIFF(DAY, 
                                                LAG(CreationDate, 100) 
                                                  OVER 
                                                   (ORDER BY CreationDate), 
                                                 CreationDate)
                                FROM   UserHistory UH
                                WHERE  U.UserId = UH.UserID) T(NRowsPrevious)
                        WHERE  NRowsPrevious = 100) O

但两者都依赖于问题中规定的约束,即每个用户每天最多有一条记录。

A couple of SQL Server 2012 options (assuming N=100 below).

;WITH T(UserID, NRowsPrevious)
     AS (SELECT UserID,
                DATEDIFF(DAY, 
                        LAG(CreationDate, 100) 
                            OVER 
                                (PARTITION BY UserID 
                                     ORDER BY CreationDate), 
                         CreationDate)
         FROM   UserHistory)
SELECT DISTINCT UserID
FROM   T
WHERE  NRowsPrevious = 100 

Though with my sample data the following worked out more efficient

;WITH U
         AS (SELECT DISTINCT UserId
             FROM   UserHistory) /*Ideally replace with Users table*/
    SELECT UserId
    FROM   U
           CROSS APPLY (SELECT TOP 1 *
                        FROM   (SELECT 
                                       DATEDIFF(DAY, 
                                                LAG(CreationDate, 100) 
                                                  OVER 
                                                   (ORDER BY CreationDate), 
                                                 CreationDate)
                                FROM   UserHistory UH
                                WHERE  U.UserId = UH.UserID) T(NRowsPrevious)
                        WHERE  NRowsPrevious = 100) O

Both rely on the constraint stated in the question that there is at most one record per day per user.

感情旳空白 2024-08-04 23:20:06

像这样的东西吗?

select distinct userid
from table t1, table t2
where t1.UserId = t2.UserId 
  AND trunc(t1.CreationDate) = trunc(t2.CreationDate) + n
  AND (
    select count(*)
    from table t3
    where t1.UserId  = t3.UserId
      and CreationDate between trunc(t1.CreationDate) and trunc(t1.CreationDate)+n
   ) = n

Something like this?

select distinct userid
from table t1, table t2
where t1.UserId = t2.UserId 
  AND trunc(t1.CreationDate) = trunc(t2.CreationDate) + n
  AND (
    select count(*)
    from table t3
    where t1.UserId  = t3.UserId
      and CreationDate between trunc(t1.CreationDate) and trunc(t1.CreationDate)+n
   ) = n
无声静候 2024-08-04 23:20:06

我使用一个简单的数学属性来识别谁连续访问了该网站。 这个属性是你第一次访问和最后一次访问之间的天差应该等于访问表日志中的记录数。

以下是我在 Oracle DB 中测试的 SQL 脚本(它也应该在其他数据库中工作):

-- show basic understand of the math properties 
  select    ceil(max (creation_date) - min (creation_date))
              max_min_days_diff,
           count ( * ) real_day_count
    from   user_access_log
group by   user_id;


-- select all users that have consecutively accessed the site 
  select   user_id
    from   user_access_log
group by   user_id
  having       ceil(max (creation_date) - min (creation_date))
           / count ( * ) = 1;



-- get the count of all users that have consecutively accessed the site 
  select   count(user_id) user_count
    from   user_access_log
group by   user_id
  having   ceil(max (creation_date) - min (creation_date))
           / count ( * ) = 1;

表准备脚本:

-- create table 
create table user_access_log (id           number, user_id      number, creation_date date);


-- insert seed data 
insert into user_access_log (id, user_id, creation_date)
  values   (1, 12, sysdate);

insert into user_access_log (id, user_id, creation_date)
  values   (2, 12, sysdate + 1);

insert into user_access_log (id, user_id, creation_date)
  values   (3, 12, sysdate + 2);

insert into user_access_log (id, user_id, creation_date)
  values   (4, 16, sysdate);

insert into user_access_log (id, user_id, creation_date)
  values   (5, 16, sysdate + 1);

insert into user_access_log (id, user_id, creation_date)
  values   (6, 16, sysdate + 5);

I used a simple math property to identify who consecutively accessed the site. This property is that you should have the day difference between the first time access and last time equal to number of records in your access table log.

Here are SQL script that I tested in Oracle DB (it should work in other DBs as well):

-- show basic understand of the math properties 
  select    ceil(max (creation_date) - min (creation_date))
              max_min_days_diff,
           count ( * ) real_day_count
    from   user_access_log
group by   user_id;


-- select all users that have consecutively accessed the site 
  select   user_id
    from   user_access_log
group by   user_id
  having       ceil(max (creation_date) - min (creation_date))
           / count ( * ) = 1;



-- get the count of all users that have consecutively accessed the site 
  select   count(user_id) user_count
    from   user_access_log
group by   user_id
  having   ceil(max (creation_date) - min (creation_date))
           / count ( * ) = 1;

Table prep script:

-- create table 
create table user_access_log (id           number, user_id      number, creation_date date);


-- insert seed data 
insert into user_access_log (id, user_id, creation_date)
  values   (1, 12, sysdate);

insert into user_access_log (id, user_id, creation_date)
  values   (2, 12, sysdate + 1);

insert into user_access_log (id, user_id, creation_date)
  values   (3, 12, sysdate + 2);

insert into user_access_log (id, user_id, creation_date)
  values   (4, 16, sysdate);

insert into user_access_log (id, user_id, creation_date)
  values   (5, 16, sysdate + 1);

insert into user_access_log (id, user_id, creation_date)
  values   (6, 16, sysdate + 5);
‘画卷フ 2024-08-04 23:20:06
declare @startdate as datetime, @days as int
set @startdate = cast('11 Jan 2009' as datetime) -- The startdate
set @days = 5 -- The number of consecutive days

SELECT userid
      ,count(1) as [Number of Consecutive Days]
FROM UserHistory
WHERE creationdate >= @startdate
AND creationdate < dateadd(dd, @days, cast(convert(char(11), @startdate, 113)  as datetime))
GROUP BY userid
HAVING count(1) >= @days

语句 cast(convert(char(11), @startdate, 113) as datetime) 删除了日期的时间部分,因此我们从午夜开始。

我还假设 creationdateuserid 列已建立索引。

我刚刚意识到这不会告诉您所有用户及其连续总天数。 但会告诉您哪些用户将从您选择的日期起在设定的天数内访问过。

修改后的解决方案:

declare @days as int
set @days = 30
select t1.userid
from UserHistory t1
where (select count(1) 
       from UserHistory t3 
       where t3.userid = t1.userid
       and t3.creationdate >= DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate), 0) 
       and t3.creationdate < DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate) + @days, 0) 
       group by t3.userid
) >= @days
group by t1.userid

我已经检查过这一点,它将查询所有用户和所有日期。 它基于 Spencer 的第一个(笑话?)解决方案,但我的有效。

更新:改进了第二个解决方案中的日期处理。

declare @startdate as datetime, @days as int
set @startdate = cast('11 Jan 2009' as datetime) -- The startdate
set @days = 5 -- The number of consecutive days

SELECT userid
      ,count(1) as [Number of Consecutive Days]
FROM UserHistory
WHERE creationdate >= @startdate
AND creationdate < dateadd(dd, @days, cast(convert(char(11), @startdate, 113)  as datetime))
GROUP BY userid
HAVING count(1) >= @days

The statement cast(convert(char(11), @startdate, 113) as datetime) removes the time part of the date so we start at midnight.

I would assume also that the creationdate and userid columns are indexed.

I just realized that this won't tell you all the users and their total consecutive days. But will tell you which users will have been visiting a set number of days from a date of your choosing.

Revised solution:

declare @days as int
set @days = 30
select t1.userid
from UserHistory t1
where (select count(1) 
       from UserHistory t3 
       where t3.userid = t1.userid
       and t3.creationdate >= DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate), 0) 
       and t3.creationdate < DATEADD(dd, DATEDIFF(dd, 0, t1.creationdate) + @days, 0) 
       group by t3.userid
) >= @days
group by t1.userid

I've checked this and it will query for all users and all dates. It is based on Spencer's 1st (joke?) solution, but mine works.

Update: improved the date handling in the second solution.

玻璃人 2024-08-04 23:20:06

这应该可以满足您的要求,但我没有足够的数据来测试效率。 复杂的 CONVERT/FLOOR 内容是从日期时间字段中删除时间部分。 如果您使用的是 SQL Server 2008,则可以使用 CAST(x.CreationDate AS DATE)。

DECLARE @Range as INT
SET @Range = 10

SELECT DISTINCT UserId, CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))
  FROM tblUserLogin a
WHERE EXISTS
   (SELECT 1 
      FROM tblUserLogin b 
     WHERE a.userId = b.userId 
       AND (SELECT COUNT(DISTINCT(CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, CreationDate))))) 
              FROM tblUserLogin c 
             WHERE c.userid = b.userid 
               AND CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, c.CreationDate))) BETWEEN CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate))) and CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))+@Range-1) = @Range)

创建脚本

CREATE TABLE [dbo].[tblUserLogin](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [UserId] [int] NULL,
    [CreationDate] [datetime] NULL
) ON [PRIMARY]

This should do what you want but I don't have enough data to test efficiency. The convoluted CONVERT/FLOOR stuff is to strip the time portion off the datetime field. If you're using SQL Server 2008 then you could use CAST(x.CreationDate AS DATE).

DECLARE @Range as INT
SET @Range = 10

SELECT DISTINCT UserId, CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))
  FROM tblUserLogin a
WHERE EXISTS
   (SELECT 1 
      FROM tblUserLogin b 
     WHERE a.userId = b.userId 
       AND (SELECT COUNT(DISTINCT(CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, CreationDate))))) 
              FROM tblUserLogin c 
             WHERE c.userid = b.userid 
               AND CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, c.CreationDate))) BETWEEN CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate))) and CONVERT(DATETIME, FLOOR(CONVERT(FLOAT, a.CreationDate)))+@Range-1) = @Range)

Creation script

CREATE TABLE [dbo].[tblUserLogin](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [UserId] [int] NULL,
    [CreationDate] [datetime] NULL
) ON [PRIMARY]
深居我梦 2024-08-04 23:20:06

斯宾塞几乎做到了,但这应该是工作代码:

SELECT DISTINCT UserId
FROM History h1
WHERE (
    SELECT COUNT(*) 
    FROM History
    WHERE UserId = h1.UserId AND CreationDate BETWEEN h1.CreationDate AND DATEADD(d, @n-1, h1.CreationDate)
) >= @n

Spencer almost did it, but this should be the working code:

SELECT DISTINCT UserId
FROM History h1
WHERE (
    SELECT COUNT(*) 
    FROM History
    WHERE UserId = h1.UserId AND CreationDate BETWEEN h1.CreationDate AND DATEADD(d, @n-1, h1.CreationDate)
) >= @n
强者自强 2024-08-04 23:20:06

我突然想到 MySQLish:

SELECT start.UserId
FROM UserHistory AS start
  LEFT OUTER JOIN UserHistory AS pre_start ON pre_start.UserId=start.UserId
    AND DATE(pre_start.CreationDate)=DATE_SUB(DATE(start.CreationDate), INTERVAL 1 DAY)
  LEFT OUTER JOIN UserHistory AS subsequent ON subsequent.UserId=start.UserId
    AND DATE(subsequent.CreationDate)<=DATE_ADD(DATE(start.CreationDate), INTERVAL 30 DAY)
WHERE pre_start.Id IS NULL
GROUP BY start.Id
HAVING COUNT(subsequent.Id)=30

未经测试,几乎肯定需要对 MSSQL 进行一些转换,但我认为这可以提供一些想法。

Off the top of my head, MySQLish:

SELECT start.UserId
FROM UserHistory AS start
  LEFT OUTER JOIN UserHistory AS pre_start ON pre_start.UserId=start.UserId
    AND DATE(pre_start.CreationDate)=DATE_SUB(DATE(start.CreationDate), INTERVAL 1 DAY)
  LEFT OUTER JOIN UserHistory AS subsequent ON subsequent.UserId=start.UserId
    AND DATE(subsequent.CreationDate)<=DATE_ADD(DATE(start.CreationDate), INTERVAL 30 DAY)
WHERE pre_start.Id IS NULL
GROUP BY start.Id
HAVING COUNT(subsequent.Id)=30

Untested, and almost certainly needs some conversion for MSSQL, but I think that give some ideas.

你的背包 2024-08-04 23:20:06

使用理货表怎么样? 它遵循更加算法化的方法,执行计划变得轻而易举。 用您想要扫描表的从 1 到“MaxDaysBehind”的数字填充tallyTable(即 90 将查找 3 个月后的数据等)。

declare @ContinousDays int
set @ContinousDays = 30  -- select those that have 30 consecutive days

create table #tallyTable (Tally int)
insert into #tallyTable values (1)
...
insert into #tallyTable values (90) -- insert numbers for as many days behind as you want to scan

select [UserId],count(*),t.Tally from HistoryTable 
join #tallyTable as t on t.Tally>0
where [CreationDate]> getdate()[email protected] and 
      [CreationDate]<getdate()-t.Tally 
group by [UserId],t.Tally 
having count(*)>=@ContinousDays

delete #tallyTable

How about one using Tally tables? It follows a more algorithmic approach, and execution plan is a breeze. Populate the tallyTable with numbers from 1 to 'MaxDaysBehind' that you want to scan the table (ie. 90 will look for 3 months behind,etc).

declare @ContinousDays int
set @ContinousDays = 30  -- select those that have 30 consecutive days

create table #tallyTable (Tally int)
insert into #tallyTable values (1)
...
insert into #tallyTable values (90) -- insert numbers for as many days behind as you want to scan

select [UserId],count(*),t.Tally from HistoryTable 
join #tallyTable as t on t.Tally>0
where [CreationDate]> getdate()[email protected] and 
      [CreationDate]<getdate()-t.Tally 
group by [UserId],t.Tally 
having count(*)>=@ContinousDays

delete #tallyTable
〆一缕阳光ご 2024-08-04 23:20:06

稍微调整一下比尔的查询。 您可能必须在分组之前截断日期以仅计算每天一次登录...

SELECT UserId from History 
WHERE CreationDate > ( now() - n )
GROUP BY UserId, 
DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) AS TruncatedCreationDate  
HAVING COUNT(TruncatedCreationDate) >= n

已编辑以使用 DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) 而不是 Convert( char(10) , CreationDate, 101 ) 。

@IDisposable
我本来想早点使用 datepart 但我懒得查找语法所以我想 id 使用 Convert 代替。 我不知道它有重大影响谢谢! 现在我明白了。

Tweaking Bill's query a bit. You might have to truncate the date before grouping to count only one login per day...

SELECT UserId from History 
WHERE CreationDate > ( now() - n )
GROUP BY UserId, 
DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) AS TruncatedCreationDate  
HAVING COUNT(TruncatedCreationDate) >= n

EDITED to use DATEADD(dd, DATEDIFF(dd, 0, CreationDate), 0) instead of convert( char(10) , CreationDate, 101 ).

@IDisposable
I was looking to use datepart earlier but i was too lazy to look up the syntax so i figured i d use convert instead. I dint know it had a significant impact Thanks! now i know.

走过海棠暮 2024-08-04 23:20:06

假设一个模式如下:

create table dba.visits
(
    id  integer not null,
    user_id integer not null,
    creation_date date not null
);

这将从带有间隙的日期序列中提取连续的范围。

select l.creation_date  as start_d, -- Get first date in contiguous range
    (
        select min(a.creation_date ) as creation_date 
        from "DBA"."visits" a 
            left outer join "DBA"."visits" b on 
                   a.creation_date = dateadd(day, -1, b.creation_date ) and 
                   a.user_id  = b.user_id 
            where b.creation_date  is null and
                  a.creation_date  >= l.creation_date  and
                  a.user_id  = l.user_id 
    ) as end_d -- Get last date in contiguous range
from  "DBA"."visits" l
    left outer join "DBA"."visits" r on 
        r.creation_date  = dateadd(day, -1, l.creation_date ) and 
        r.user_id  = l.user_id 
    where r.creation_date  is null

assuming a schema that goes like:

create table dba.visits
(
    id  integer not null,
    user_id integer not null,
    creation_date date not null
);

this will extract contiguous ranges from a date sequence with gaps.

select l.creation_date  as start_d, -- Get first date in contiguous range
    (
        select min(a.creation_date ) as creation_date 
        from "DBA"."visits" a 
            left outer join "DBA"."visits" b on 
                   a.creation_date = dateadd(day, -1, b.creation_date ) and 
                   a.user_id  = b.user_id 
            where b.creation_date  is null and
                  a.creation_date  >= l.creation_date  and
                  a.user_id  = l.user_id 
    ) as end_d -- Get last date in contiguous range
from  "DBA"."visits" l
    left outer join "DBA"."visits" r on 
        r.creation_date  = dateadd(day, -1, l.creation_date ) and 
        r.user_id  = l.user_id 
    where r.creation_date  is null
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文