How should I store and collect data to find the most-viewed items for the last 24 hours, 7 days, 30 days, and 365 days?

Posted 2024-09-04 18:38:07, 638 characters, 5 views, 0 comments

Let's imagine we have a high-traffic project (a tube site) that should offer sorting by the options below (not in real time). There are about 200K videos, and all information about them is stored in MySQL. The site gets about 1.5 million video views per day. The tools available are the hard disk (text files), MySQL, and Redis.

Views
 top viewed
 top viewed last 24 hours
 top viewed last 7 days
 top viewed last 30 days
 top rated last 365 days

How should I store such information?

The first idea is to log all visits to text files (one file per hour, for example visits_20080101_00.log). At the beginning of each hour, calculate the views per video for the previous hour and insert them into MySQL, then recalculate the totals for the last 24 hours and update the statistics tables. At the beginning of every day, do the same but recalculate the totals for the last 7 days, last 30 days, and last 365 days. This method seems very poor to me, because we have to store the last 365 days of information for each video to make the calculations correct.

Are there any better methods? Perhaps we need to choose other tools for this?

Thank you.

Comments (1)

嘴硬脾气大 2024-09-11 18:38:07

If absolute precision is not important, you could summarize any information that is more than 2 units old.

You would store the individual views for the last 1-2 hours, the hourly views (one value per hour) for the last 1-2 days, and the daily views (one value per day) beyond that.

"1-2" means that you keep storing until you have two full units, then summarize the earlier of them.
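One way to read this scheme is sketched below for a single video. It assumes timestamps are integer epoch seconds; the class `ViewRollup` and its methods are hypothetical names, not part of any library. Raw views are kept for the current hour and the previous one, hourly values for the current day and the previous one, and everything older is folded into daily values.

```python
from collections import defaultdict

HOUR = 3600
DAY = 86400


class ViewRollup:
    """Tiered view counts for one video (illustrative sketch, not a library API)."""

    def __init__(self):
        self.raw = []                   # individual view timestamps
        self.hourly = defaultdict(int)  # hour index -> views
        self.daily = defaultdict(int)   # day index -> views

    def record(self, ts):
        self.raw.append(ts)

    def compact(self, now):
        """Fold raw views older than the previous hour into hourly values,
        and hourly values older than the previous day into daily values."""
        cur_hour, cur_day = now // HOUR, now // DAY
        keep = []
        for ts in self.raw:
            if ts // HOUR < cur_hour - 1:
                self.hourly[ts // HOUR] += 1
            else:
                keep.append(ts)
        self.raw = keep
        for h in [h for h in self.hourly if h // 24 < cur_day - 1]:
            self.daily[h // 24] += self.hourly.pop(h)

    def views_since(self, now, seconds):
        """Approximate views in the last `seconds`; a summarized bucket that
        overlaps the window is counted whole, so the result may overcount."""
        cutoff = now - seconds
        total = sum(1 for ts in self.raw if ts >= cutoff)
        total += sum(n for h, n in self.hourly.items() if (h + 1) * HOUR > cutoff)
        total += sum(n for d, n in self.daily.items() if (d + 1) * DAY > cutoff)
        return total
```

Calling `compact` once per hour keeps the raw list short, and `views_since(now, 7 * DAY)` then answers the 7-day question from a handful of values per video. The cost is the imprecision the answer mentions: windows whose edges fall inside a summarized bucket are only approximate.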
