How do I process large amounts of logfile data for display in dynamic graphs?

Posted 2024-09-15 03:36:24


I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:

  • the time resolution should be variable from one second to a year
  • there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished

Are there best practices, or tools/plugins for Rails, that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or that have helpful features (e.g. CouchDB indexes)?

EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: Run a query for each bucket, merge in app - probably way too slow. GROUP BY timestamp/granularity - does not count connections correctly. Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
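For illustration, here is a rough sketch of what the "preprocess into rows at the smallest granularity" idea might look like in MySQL-flavoured SQL. The table and column names (connections, connection_minutes, numbers) are made up, and it assumes a pre-filled numbers table (0, 1, 2, ...) for stepping through a connection's open span, one row per minute:

    -- Hypothetical raw table: one row per connection.
    --   connections(id INT, user_id INT, opened_at DATETIME, closed_at DATETIME)

    -- Pre-aggregated table: one row per user and per minute the user was connected.
    CREATE TABLE connection_minutes (
      user_id INT      NOT NULL,
      minute  DATETIME NOT NULL,              -- truncated to the minute
      PRIMARY KEY (user_id, minute)           -- dedupes overlapping connections
    );

    -- Expand each connection into one row per minute it overlaps.
    INSERT IGNORE INTO connection_minutes (user_id, minute)
    SELECT c.user_id,
           DATE_ADD(DATE_FORMAT(c.opened_at, '%Y-%m-%d %H:%i:00'),
                    INTERVAL n.i MINUTE)
    FROM connections c
    JOIN numbers n
      ON n.i <= TIMESTAMPDIFF(MINUTE, c.opened_at, c.closed_at);

With one row per (user, minute), a connection that stays open for days gets counted in every hourly bucket it overlaps, not just the bucket in which it was opened or closed.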


Comments (2)

金橙橙 2024-09-22 03:36:24


I think you can use MySQL timestamps for this.

护你周全 2024-09-22 03:36:24


The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select and yields correct results. To get different granularity, you can do integer arithmetic on the timestamp columns - select abs(timestamp/factor)*factor and group by abs(timestamp/factor)*factor.
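As a rough sketch of what that downsampling query could look like, assuming the pre-processed per-minute rows live in a table named events with an integer Unix-timestamp column ts (both names are made up here), using FLOOR for the integer truncation and hourly buckets (factor = 3600 seconds):

    -- Downsample per-minute rows to hourly buckets at query time.
    SELECT FLOOR(ts / 3600) * 3600   AS bucket,           -- start of the hour
           COUNT(DISTINCT user_id)   AS connected_users
    FROM events
    WHERE ts >= ? AND ts < ?                              -- time range bound from the app
    GROUP BY FLOOR(ts / 3600) * 3600
    ORDER BY bucket;

Changing the factor (60, 3600, 86400, ...) yields different granularities from the same per-minute data.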
