Daily counts using T-SQL?
I have a site where I record client metrics in a SQL Server 2008 database on every link clicked. I have already written the query to get the daily total clicks, but I also want to find out how many times a user clicked within a given timespan (e.g., within 5 seconds).
The idea is to lock out incoming IP addresses that are trying to scrape content. If more than 5 "clicks" are detected within 5 seconds, or the number of daily clicks from a given IP address exceeds some value, it is assumed to be a scraping attempt.
I have tried a few variations of the following:
-- when a user clicked more than 5 times in 5 seconds
SELECT DATEADD(SECOND, DATEDIFF(SECOND, 0, ClickTimeStamp), 0) as ClickTimeStamp, COUNT(UserClickID) as [Count]
FROM UserClicks
WHERE DATEDIFF(SECOND, 0, ClickTimeStamp) = 5
GROUP BY IPAddress, ClickTimeStamp
This one in particular returns the following error:
Msg 535, Level 16, State 0, Line 3 The datediff function resulted in
an overflow. The number of dateparts separating two date/time
instances is too large. Try to use datediff with a less precise
datepart.
So once again, I want to use the seconds datepart. I believe I'm on the right track, but I'm not quite getting it.
Help appreciated. Thanks.
-- UPDATE --
Great suggestions, and they helped me realize that my approach is wrong. The check is going to be made on every click. What I should do is, for a given timestamp, check whether 5 clicks have been recorded from the same IP address in the last 5 seconds. So it would be something like: count the number of clicks where ClickTimeStamp > GetDate() - 5 seconds.
Trying the following still isn't giving me an accurate figure.
SELECT COUNT(*)
FROM UserClicks
WHERE ClickTimeStamp >= GetDate() - DATEADD(SECOND, -5, GetDate())
Comments (3)
Hoping my syntax is good; I only have Oracle to test this on. I'm going to assume you have an ID column called user_id that is unique to that user (is it user_click_id? It's helpful to include table create statements in these questions when you can).
You'll have to perform a self join on this one. The logic is to take the userclick table and join it onto itself where userId = userId and the difference in clicktimestamp is between 0 and 5 seconds. Then it's a matter of counting from the subselect.
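The query that originally accompanied this answer is not preserved here. A minimal sketch of the self join it describes might look like the following. It assumes the UserClicks table from the question has the user_id column the answer hypothesizes (IPAddress could be substituted to track by IP), and that UserClickID from the question's query identifies each row:

```sql
-- Sketch only, not the answerer's original query.
-- user_id is the column the answer assumes; swap in IPAddress to track by IP.
SELECT u1.user_id,
       u1.UserClickID,
       u1.ClickTimeStamp,
       u2.ClickTimeStamp AS NearbyClickTimeStamp
FROM UserClicks AS u1
INNER JOIN UserClicks AS u2
    ON  u2.user_id = u1.user_id
    -- u2 falls 0 to 5 seconds after u1 (every click also matches itself)
    AND u2.ClickTimeStamp >= u1.ClickTimeStamp
    AND u2.ClickTimeStamp <= DATEADD(SECOND, 5, u1.ClickTimeStamp);
```

Comparing against DATEADD rather than calling DATEDIFF on the two columns keeps the range test exact to the millisecond and sidesteps the overflow reported in the question.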
This select statement should give you the user_id/clicktimestamp and 1 row for every record from the same user that is between 0 and 5 seconds apart from that clicktimestamp. Now it's just a matter of counting all user_id, u1.clicktimestamp combinations and highlighting the ones with 5 or more. Take the above query, turn it into a subselect, and pull counts from it:
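Again, the original query is missing from this page, so what follows is only a sketch of the counting step under the same assumptions (a user_id column, and UserClickID unique per row). UserClickID is added to the grouping, which the answer does not mention, so that two clicks sharing an identical timestamp are not merged into one group:

```sql
-- Sketch only: count, for each click, how many clicks by the same user
-- fall within the following 5 seconds (the click itself included).
SELECT pairs.user_id,
       pairs.ClickTimeStamp,
       COUNT(*) AS ClicksWithin5Seconds
FROM (
    SELECT u1.user_id,
           u1.UserClickID,
           u1.ClickTimeStamp
    FROM UserClicks AS u1
    INNER JOIN UserClicks AS u2
        ON  u2.user_id = u1.user_id
        AND u2.ClickTimeStamp >= u1.ClickTimeStamp
        AND u2.ClickTimeStamp <= DATEADD(SECOND, 5, u1.ClickTimeStamp)
) AS pairs
GROUP BY pairs.user_id, pairs.UserClickID, pairs.ClickTimeStamp
HAVING COUNT(*) >= 5;  -- "5 or more" as in the answer; use > 5 for "more than 5"
```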
Wish I could verify my syntax on an MS machine... there might be some typos in there, but the logic should be good.
An answer for your UPDATE: the problem is in the third line.
GetDate() - DATEADD(SECOND, -5, GetDate())
This is saying "take the current date time and subtract (the current date time minus five seconds)". I'm not entirely sure what kind of value this produces, but it won't be the one you want. You still want some kind of time period, perhaps like so:
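The query that followed is not preserved on this page. A sketch of what such a time-period filter could look like is below; the @IPAddress variable, its varchar(45) type, and the sample address are hypothetical. As far as I know, subtracting one datetime from another in SQL Server yields a datetime offset from 1900-01-01, so the original expression comes out at about 1900-01-01 00:00:05 and the filter matches practically every row, which would explain the inflated count.

```sql
-- Sketch only: count clicks from one IP address in the last 5 seconds.
-- @IPAddress, its type, and the sample value are illustrative assumptions.
DECLARE @IPAddress varchar(45) = '203.0.113.7';

SELECT COUNT(*) AS ClicksInLast5Seconds
FROM UserClicks
WHERE IPAddress = @IPAddress
  AND ClickTimeStamp >= DATEADD(SECOND, -5, GETDATE());
```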
I'm a bit uncomfortable using getdate() there--if you have a specific datetime value (accurate to the second), you should probably use it.
Assuming log entries are only entered for current activity -- that is, whenever a new row is inserted, the logged time is for that point in time and never for any prior point in time -- then you should only need to review data for a set period of time, and not have to review "all data" as you are doing now.
Next question is: how frequently do you make this check? If you are concerned with clicks per second, then something between "once per hour" and "once every 24 hours" seems reasonable.
Next up: define your interval. "All clicks per IPAddress within 5 seconds" could go two ways: set windows (00-04, 05-09, 10-14, etc.) or sliding windows (00-04, 01-05, 02-06, etc.). The difference is probably irrelevant with a 5 second window, but perhaps more relevant for longer periods (clicks per "day").
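To illustrate the set-window option, here is a minimal sketch that buckets the last hour of clicks into consecutive 5-second windows by integer division. The one-hour review period, the @PeriodStart name, and the threshold of 5 are assumptions chosen for the example:

```sql
-- Sketch only: fixed (non-overlapping) 5-second windows over the last hour.
DECLARE @PeriodStart datetime = DATEADD(HOUR, -1, GETDATE());

SELECT IPAddress,
       -- DATEDIFF returns an int, so / 5 is integer division:
       -- offsets 0-4 fall in window 0, 5-9 in window 1, and so on
       DATEDIFF(SECOND, @PeriodStart, ClickTimeStamp) / 5 AS WindowNumber,
       COUNT(*) AS Clicks
FROM UserClicks
WHERE ClickTimeStamp >= @PeriodStart
GROUP BY IPAddress,
         DATEDIFF(SECOND, @PeriodStart, ClickTimeStamp) / 5
HAVING COUNT(*) > 5;
```

Measuring from @PeriodStart rather than from date 0 keeps the DATEDIFF result small, which avoids the overflow from the question. A burst that straddles two windows can slip under the threshold, and that is the gap sliding windows close.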
With that, the general approach I'd take is:
inner join on the temp table on seconds between lowValue and highValue
)having count(*) > X
), and defenestrate them.
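Parts of the list above were lost, so the following is only a guess at how the surviving pieces (a temp table with lowValue and highValue, the inner join, and having count(*) > X) might fit together. It uses sliding windows; the #Windows table name, the one-hour period, and the @X threshold are all assumptions:

```sql
-- Sketch only: sliding 5-second windows over the last hour.
DECLARE @PeriodStart datetime = DATEADD(HOUR, -1, GETDATE());
DECLARE @X int = 5;  -- click threshold per window

-- One row per window, expressed as second offsets from @PeriodStart:
-- 0-4, 1-5, 2-6, ...
CREATE TABLE #Windows (lowValue int NOT NULL PRIMARY KEY, highValue int NOT NULL);

WITH Offsets AS (
    SELECT 0 AS lowValue
    UNION ALL
    SELECT lowValue + 1 FROM Offsets WHERE lowValue + 1 < 3600
)
INSERT INTO #Windows (lowValue, highValue)
SELECT lowValue, lowValue + 4
FROM Offsets
OPTION (MAXRECURSION 4000);  -- the default limit of 100 is too low here

-- IP addresses that exceed the threshold in at least one window
SELECT DISTINCT hits.IPAddress
FROM (
    SELECT c.IPAddress, w.lowValue
    FROM UserClicks AS c
    INNER JOIN #Windows AS w
        -- DATEDIFF counts second boundaries crossed, so windows align to whole seconds
        ON DATEDIFF(SECOND, @PeriodStart, c.ClickTimeStamp)
           BETWEEN w.lowValue AND w.highValue
    WHERE c.ClickTimeStamp >= @PeriodStart
    GROUP BY c.IPAddress, w.lowValue
    HAVING COUNT(*) > @X
) AS hits;

DROP TABLE #Windows;
```

Stepping lowValue by 5 instead of 1 would turn this into the set-window variant with the same join.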