Best way to analyze data that keeps increasing?

Posted 2024-12-29 10:08:25

My code captures numeric values from web services. The captured values generally increase, but sometimes they decrease. For now, I simply record whatever I observe at each collection time. This is what my table structure looks like now:

id   date_collected      value1_observed  value2_observed
1    2012-01-26 1:00am   500              12
2    2012-01-26 1:01am   90000            NULL
3    2012-01-26 1:02am   100              50
1    2012-01-26 7:23am   502              12
2    2012-01-26 7:24am   90105            NULL
3    2012-01-26 8:09am   97               55

You can see that...

ID=1 value1 increased by 2 in 6 hours 23 minutes
ID=2 value1 increased by 105 in 6 hours 23 minutes
ID=3 value1 decreased by 3 in 7 hours 7 minutes
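
For illustration, here is a minimal Python sketch of how those per-ID changes could be derived from the sample rows above. It is only a sketch under assumptions: the rows are held in memory as tuples, and the names `ROWS` and `value1_changes` are made up for this example rather than taken from my actual code.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the table above: (id, date_collected, value1_observed).
ROWS = [
    (1, datetime(2012, 1, 26, 1, 0), 500),
    (2, datetime(2012, 1, 26, 1, 1), 90000),
    (3, datetime(2012, 1, 26, 1, 2), 100),
    (1, datetime(2012, 1, 26, 7, 23), 502),
    (2, datetime(2012, 1, 26, 7, 24), 90105),
    (3, datetime(2012, 1, 26, 8, 9), 97),
]


def value1_changes(rows):
    """Yield (id, change, elapsed) for consecutive observations of each id."""
    # Sort by id, then by collection time, so groupby sees each id's
    # observations together and in chronological order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for item_id, group in groupby(ordered, key=lambda r: r[0]):
        obs = list(group)
        for (_, t0, v0), (_, t1, v1) in zip(obs, obs[1:]):
            if v0 is None or v1 is None:
                continue  # a NULL observation gives no usable change
            yield item_id, v1 - v0, t1 - t0


for item_id, change, elapsed in value1_changes(ROWS):
    print(item_id, change, elapsed)
# 1 2 6:23:00
# 2 105 6:23:00
# 3 -3 7:07:00
```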

Difficulties:

The collection time is inconsistent (I collect multiple times per day with a varying length of time between collections)
The values can increase, decrease, or be null

I have captured tens of millions of rows of these records and I'd like to be able to query them more easily, so I'm thinking I need to transform this data into a schema that is better suited for the questions I want to answer. I'd like to be able to find patterns like:

"Between the hours of 1am and 5am, these IDs had the largest changes in value1".
"Over the last 30 days, which IDs have had value2 grow the most (in absolute and relative terms)?"
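
To make the second question concrete, here is a hedged Python sketch of one possible definition of "growth" over a window: the last non-NULL observation minus the first, per ID. The function name `growth_by_id`, the in-memory row layout, and the choice of window end are assumptions for illustration only, not a settled design.

```python
from datetime import datetime, timedelta

# Sample rows from the table above: (id, date_collected, value2_observed).
ROWS = [
    (1, datetime(2012, 1, 26, 1, 0), 12),
    (2, datetime(2012, 1, 26, 1, 1), None),
    (3, datetime(2012, 1, 26, 1, 2), 50),
    (1, datetime(2012, 1, 26, 7, 23), 12),
    (2, datetime(2012, 1, 26, 7, 24), None),
    (3, datetime(2012, 1, 26, 8, 9), 55),
]


def growth_by_id(rows, start, end):
    """Return {id: (absolute, relative)} growth between the first and last
    non-NULL observations with start <= date_collected < end."""
    first, last = {}, {}
    for item_id, collected, value in sorted(rows, key=lambda r: r[1]):
        if value is None or not (start <= collected < end):
            continue  # skip NULLs and rows outside the window
        first.setdefault(item_id, value)  # keeps the earliest value only
        last[item_id] = value             # ends up as the latest value
    result = {}
    for item_id, v0 in first.items():
        absolute = last[item_id] - v0
        relative = absolute / v0 if v0 else None  # undefined when starting at 0
        result[item_id] = (absolute, relative)
    return result


# Hypothetical "last 30 days" window ending at midnight after the sample day.
WINDOW_END = datetime(2012, 1, 27)
WINDOW_START = WINDOW_END - timedelta(days=30)

growth = growth_by_id(ROWS, WINDOW_START, WINDOW_END)
ranked = sorted(growth.items(), key=lambda kv: kv[1][0], reverse=True)
print(ranked)  # [(3, (5, 0.1)), (1, (0, 0.0))]
```

ID 2 does not appear in the result because all of its value2 observations are NULL. The same function could be pointed at a 1am to 5am window by passing those times as `start` and `end`, although with irregular collection times some IDs may have fewer than two observations inside such a short window.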

Are there any suggestions on how I should structure the schema? Would it be possible to transform this data to a traditional star schema?

UPDATE to answer questions:

I am currently using Microsoft SQL Server, but am open to using MySQL or other open source options.
If id 1 changes from 50 to 100 to 52, I would like to know that it went +50 and then -48. That way I could see fluctuation and net gain/loss over time. For example, if I were tracking registered users on a site... generally that number always goes up, but if people delete their accounts, I would want to identify that trend. However, if my app didn't observe the change from 50 to 100 and only recorded 50 and 52, it would not know that the number ever hit 100.
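
The 50 to 100 to 52 example can be sketched in a few lines of Python. The helper names `step_changes` and `summarize` are hypothetical, and "fluctuation" is taken here to mean the sum of absolute step sizes, which is just one possible definition.

```python
def step_changes(values):
    """Differences between consecutive observed values."""
    return [b - a for a, b in zip(values, values[1:])]


def summarize(values):
    """Step changes, net gain/loss, and total absolute movement."""
    steps = step_changes(values)
    return {
        "steps": steps,
        "net": sum(steps),
        "fluctuation": sum(abs(s) for s in steps),
    }


# All three observations were captured.
print(summarize([50, 100, 52]))
# {'steps': [50, -48], 'net': 2, 'fluctuation': 98}

# The observation of 100 was missed, so only 50 and 52 were recorded.
print(summarize([50, 52]))
# {'steps': [2], 'net': 2, 'fluctuation': 2}
```

The net change is the same in both cases, but the fluctuation is only visible when the intermediate value was actually observed.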

Thanks for any insight/direction,

-Paul
