Daily counts using TSQL?

Posted on 2024-12-19 21:05:43

I have a site where I record client metrics in a SQL Server 2008 database on every link clicked. I have already written the query to get the daily total clicks; however, I want to find out how many times a user clicked within a given timespan (i.e. within 5 seconds).

The idea here is to lock out incoming IP addresses that are trying to scrape content. The assumption is that if more than 5 "clicks" are detected within 5 seconds, or the number of daily clicks from a given IP address exceeds some value, this is a scraping attempt.

I have tried a few variations of the following:

-- when a user clicked more than 5 times in 5 seconds
SELECT DATEADD(SECOND, DATEDIFF(SECOND, 0, ClickTimeStamp), 0) as ClickTimeStamp, COUNT(UserClickID) as [Count]
FROM UserClicks
WHERE DATEDIFF(SECOND, 0, ClickTimeStamp) = 5
GROUP BY IPAddress, ClickTimeStamp

This one in particular returns the following error:

Msg 535, Level 16, State 0, Line 3 The datediff function resulted in
an overflow. The number of dateparts separating two date/time
instances is too large. Try to use datediff with a less precise
datepart.

So once again, I want to use the seconds datepart. I believe I'm on the right track, but I'm not quite getting it.

Help appreciated. Thanks.

-- UPDATE --

Great suggestions; they helped me realize that my approach was wrong. The check is going to be made on every click. What I should do, for a given timestamp, is check whether 5 clicks have been recorded from the same IP address in the last 5 seconds. So it would be something like: count the number of clicks where ClickTimeStamp > GetDate() - 5 seconds.

Trying the following still isn't giving me an accurate figure.

SELECT COUNT(*)
FROM UserClicks
WHERE ClickTimeStamp >= GetDate() - DATEADD(SECOND, -5, GetDate())

昨迟人 2024-12-26 21:05:43

Hoping my syntax is good; I only have Oracle to test this on. I'm going to assume you have an ID column called user_id that is unique to that user (is it user_click_id? It's helpful to include table create statements in these questions when you can).

You'll have to perform a self join on this one. The logic is to take userclicks and join it onto itself where user_id = user_id and the difference in clicktimestamp is between 0 and 5 seconds. Then it's a matter of counting from the subselect.

-- pair each click (u1) with the same user's clicks (u2) that follow within 5 seconds
select u1.user_id, u1.clicktimestamp, u2.clicktimestamp
from userclicks u1
left join userclicks u2
    on u2.user_id = u1.user_id
    and datediff(second,u1.ClickTimeStamp,u2.ClickTimeStamp) <= 5
    and datediff(second,u1.ClickTimeStamp,u2.ClickTimeStamp) > 0

This select statement should give you the user_id/clicktimestamp and 1 row for every record from the same user that is between 0 and 5 seconds after that clicktimestamp. Now it's just a matter of counting all user_id, clicktimestamp combinations and highlighting the ones with 5 or more. Take the above query, turn it into a subselect and pull counts from it:

-- the outer query can only see the derived table's alias (a), not u1/u2
select a.user_id, a.clicktimestamp, count(1)
from
(select u1.user_id, u1.clicktimestamp
from userclicks u1
left join userclicks u2
    on u2.user_id = u1.user_id
    and datediff(second,u1.ClickTimeStamp,u2.ClickTimeStamp) <= 5
    and datediff(second,u1.ClickTimeStamp,u2.ClickTimeStamp) > 0) a
group by a.user_id, a.clicktimestamp
having count(1) >= 5

Wish I could verify my syntax on a MS machine... there might be some typos in there, but the logic should be good.
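For comparison, here is a minimal sketch of the same self-join idea keyed on IP address instead of a user ID. It assumes the table from the question, UserClicks(UserClickID, IPAddress, ClickTimeStamp), and compares the timestamp column to a DATEADD range rather than using DATEDIFF, because DATEDIFF(second, ...) counts second boundaries crossed rather than elapsed time. It has not been run against the asker's data, so treat it as an illustration only.

```sql
-- Assumed table (from the question): UserClicks(UserClickID, IPAddress, ClickTimeStamp)
-- For every click, count the clicks from the same IP address that fall in the
-- 5 seconds up to and including that click (the click itself is counted).
SELECT c1.UserClickID,
       c1.IPAddress,
       c1.ClickTimeStamp,
       COUNT(*) AS ClicksInWindow
FROM UserClicks AS c1
INNER JOIN UserClicks AS c2
    ON  c2.IPAddress = c1.IPAddress
    AND c2.ClickTimeStamp >  DATEADD(SECOND, -5, c1.ClickTimeStamp)
    AND c2.ClickTimeStamp <= c1.ClickTimeStamp
GROUP BY c1.UserClickID, c1.IPAddress, c1.ClickTimeStamp
HAVING COUNT(*) > 5;   -- "more than 5 clicks in 5 seconds"
```

Grouping on UserClickID keeps two clicks with an identical timestamp from collapsing into one group. On a large table this join is likely to need an index on (IPAddress, ClickTimeStamp) and a filter on a recent time range to stay affordable.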

愁以何悠 2024-12-26 21:05:43

An answer for your UPDATE: the problem is in the third line of

SELECT COUNT(*)
 FROM UserClicks
 WHERE ClickTimeStamp >= GetDate() - DATEADD(SECOND, -5, GetDate())

GetDate() - DATEADD(SECOND, -5, GetDate()) is saying "take the current date time and subtract (the current date time minus five seconds)". I'm not entirely sure what kind of value this produces, but it won't be the one you want.

You still want some kind of time period, perhaps like so (note that BETWEEN needs the earlier value first):

SELECT count(*)
 from UserClicks
 where IPAddress = @IPAddress
  and ClickTimeStamp between dateadd(second, -5, getdate()) and getdate()

I'm a bit uncomfortable using getdate() there--if you have a specific datetime value (accurate to the second), you should probably use it.
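A short sketch of that last point, assuming the table and column names from the question. The variables @IPAddress and @Now, the varchar(45) type and the sample address are placeholders introduced here for illustration. Capturing the time once in a variable means both ends of the window are computed from the same instant.

```sql
DECLARE @IPAddress varchar(45);   -- hypothetical type; match the real column
DECLARE @Now datetime;
SET @IPAddress = '203.0.113.7';   -- example address only
SET @Now = GETDATE();             -- or the timestamp of the click being checked

-- What the expression in the question's UPDATE evaluates to:
-- datetime minus datetime yields a datetime offset from 1900-01-01,
-- so the original WHERE clause matched practically every row.
SELECT @Now - DATEADD(SECOND, -5, @Now) AS Difference;   -- 1900-01-01 00:00:05.000

-- Clicks from one IP address in the 5 seconds leading up to @Now
SELECT COUNT(*) AS RecentClicks
FROM UserClicks
WHERE IPAddress = @IPAddress
  AND ClickTimeStamp >  DATEADD(SECOND, -5, @Now)
  AND ClickTimeStamp <= @Now;
```

If RecentClicks exceeds 5, the caller could treat the address as a suspected scraper, per the rule stated in the question.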

执手闯天涯 2024-12-26 21:05:43

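Assuming that log entries are only entered for current activity (that is, whenever a new row is inserted, the logged time is the time at that moment and never an earlier point in time), you should only need to review the data for a set period of time, rather than reviewing "all data" as you do now.

Next question: how often do you run the check? If you care about clicks per second, then something between "once an hour" and "once every 24 hours" seems reasonable.

Next: define your interval. "All clicks per IP address within 5 seconds" can go two ways: fixed windows (00-04, 05-09, 10-14, etc.) or sliding windows (00-04, 01-05, 02-06, etc.). The distinction is probably irrelevant for a 5-second window, but it may matter more for longer periods (clicks per "day").

So, the general approach I'd take is:

Start with the earliest point in time you care about (1 hour ago, 24 hours ago).
Set up "buckets", meaning identifiable time windows (00:00:00 - 00:00:04, 00:00:05 - 00:00:09, etc.). This could be done as a temp table.
For all events, calculate the number of seconds elapsed since the earliest point.
For each bucket, count the number of events that hit that bucket, grouped by IP address (inner join on the temp table where seconds is between lowValue and highValue).
Identify those that exceed your threshold (count(*) > X), and lock them out.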
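The steps above can be sketched roughly as follows, assuming the table and columns from the question. Instead of a temp table of buckets, this version derives a bucket number by integer division, which is a substitution made here for brevity; @Start and BucketNo are names introduced for illustration. Measuring seconds from a recent starting point rather than from date 0 also avoids the DATEDIFF overflow reported in the question, since DATEDIFF returns an int and the number of seconds since 1900 does not fit in one.

```sql
DECLARE @Start datetime;
SET @Start = DATEADD(HOUR, -1, GETDATE());   -- earliest point of interest (1 hour ago)

-- Fixed 5-second windows: bucket 0 starts at @Start, bucket 1 five seconds later, etc.
SELECT b.IPAddress,
       DATEADD(SECOND, b.BucketNo * 5, @Start) AS BucketStart,
       COUNT(*) AS Clicks
FROM (
    SELECT IPAddress,
           -- seconds since @Start, integer-divided into 5-second buckets
           DATEDIFF(SECOND, @Start, ClickTimeStamp) / 5 AS BucketNo
    FROM UserClicks
    WHERE ClickTimeStamp >= @Start
) AS b
GROUP BY b.IPAddress, b.BucketNo
HAVING COUNT(*) > 5;   -- threshold X from the steps above
```

Because these are fixed windows, a burst that straddles two buckets can slip under the threshold; a sliding-window check would catch it at a higher query cost.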
