Gap detection in Django / MySQL

Posted on 2024-10-17 03:54:17

I have time series data stored in MySQL (InnoDB), and I access it through Django's object-relational mapper.

My question is: how can I best identify and locate gaps in this time series data?

Edit for clarification: although it's fairly easy to get a list of all the missing data points, that doesn't completely solve my problem. I want just the start and end of gaps. The start and end of continuous periods would work just as well.

Edit for further clarification: the MySQL columns for the table are below. time is a standard Django DateTimeField. The data in question is sampled once every 15 minutes.

mysql> show columns from datalogging_datapoint;
+----------------------+------------+------+-----+---------+----------------+
| Field                | Type       | Null | Key | Default | Extra          |
+----------------------+------------+------+-----+---------+----------------+
| id                   | int(11)    | NO   | PRI | NULL    | auto_increment |
| new_since_parsing    | tinyint(1) | NO   |     | NULL    |                |
| non_public           | tinyint(1) | NO   |     | NULL    |                |
| time                 | datetime   | NO   |     | NULL    |                |
| value                | double     | NO   |     | NULL    |                |
| parent_timeseries_id | int(11)    | NO   | MUL | NULL    |                |
+----------------------+------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

Comments (3)

荒路情人 2024-10-24 03:54:17

You need to give some sample data and say how you would like it processed. Telling us that you store it in MySQL or use InnoDB is not central to the problem (the ORM handles that). I assume you are able to extract your time series data as a list of integers, and that you are trying to find where the gaps in that list start and end.

def gaps(seq):
    """Return (first, last) pairs for each run of integers missing from seq."""
    if not seq:
        return []
    seq_set = set(seq)  # e.g. {0, 1, 2, 3, 7, 8, 9, 10, 16, 17, 18}
    # Assumes the series starts at 0; max() avoids relying on seq being sorted.
    full_set = set(range(max(seq) + 1))  # {0, 1, 2, 3, ..., 17, 18}
    missing_pts = sorted(full_set - seq_set)  # [4, 5, 6, 11, 12, 13, 14, 15]
    if not missing_pts:
        return []  # nothing is missing, so there are no gaps
    missing_pt_pairs = []
    first_pt = prev_pt = missing_pts[0]
    for pt in missing_pts[1:]:
        if pt - prev_pt > 1:  # a jump closes the current gap
            missing_pt_pairs.append((first_pt, prev_pt))
            first_pt = pt
        prev_pt = pt
    missing_pt_pairs.append((first_pt, prev_pt))  # close the final gap
    return missing_pt_pairs

time_pts = [0, 1, 2, 3, 7, 8, 9, 10, 16, 17, 18]
gaps(time_pts)  # returns [(4, 6), (11, 15)],
# i.e. two gaps, one spanning 4 to 6 and one spanning 11 to 15
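
The integer approach above can probably be adapted to the 15-minute datetime values from the question without building the full set of expected points. What follows is only a sketch under assumptions: find_gaps and sample_times are names made up for illustration, samples are assumed to fall exactly on the 15-minute grid, and time zones are ignored. With Django, the input list could presumably come from something like values_list("time", flat=True) on the queryset.

```python
from datetime import datetime, timedelta


def find_gaps(times, step=timedelta(minutes=15)):
    """Return (first_missing, last_missing) timestamp pairs.

    Assumes one sample is expected exactly every `step`.
    """
    ordered = sorted(times)
    result = []
    # Compare each sample with the next one; a jump larger than
    # `step` means at least one sample in between is missing.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > step:
            result.append((prev + step, cur - step))
    return result


start = datetime(2024, 1, 1, 0, 0)
# Samples at 00:00, 00:15, 00:30, then nothing until 01:30 and 01:45.
sample_times = [start + timedelta(minutes=m) for m in (0, 15, 30, 90, 105)]
gaps_found = find_gaps(sample_times)
for gap_start, gap_end in gaps_found:
    print(gap_start, "->", gap_end)
# prints: 2024-01-01 00:45:00 -> 2024-01-01 01:15:00
```

Because only neighbouring samples are compared, this reports just the start and end of each gap, which is what the question asks for.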
以酷 2024-10-24 03:54:17

Look into the numpy and scipy packages for Python; you might find some time series analysis functions there. Then it's just a matter of getting the values out of the database, but that's standard Django/Python.

Do you want something like this:

def gaps(seq):
  ...
  return gaps_found

which, when fed [1,2,4,5,6,8,12] returns [3,7,9,10,11]? That might be doable with sets.
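
A set-based version of that idea might look like the following. It is a minimal sketch, not a finished answer: missing_points is a hypothetical name, and the input is assumed to be a list of integers.

```python
def missing_points(seq):
    """Return the sorted integers absent from seq between its min and max."""
    if not seq:
        return []
    present = set(seq)
    # Every integer that would be there if the sequence had no holes.
    expected = set(range(min(seq), max(seq) + 1))
    return sorted(expected - present)


result = missing_points([1, 2, 4, 5, 6, 8, 12])
print(result)  # [3, 7, 9, 10, 11]
```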

穿越时光隧道 2024-10-24 03:54:17

Thanks for the suggestions guys! I learned something from both of them.

However, I think I just solved my problem in an ideal way by rephrasing the question in my mind. Here's the basic idea:

Count the values in a given year with Django's .count().
If not complete:
    Count the values for each month in that year
    If not complete:
        Count the values for each day in that month
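
The drill-down outlined above could be sketched roughly as follows. Everything here is an assumption for illustration: the names incomplete_days, count_between and expected_count are hypothetical, samples are assumed to sit on a 15-minute grid starting at midnight, and time zones are ignored. The counting is passed in as a function so the sketch runs without a database; in Django it would presumably be a filter on time__gte / time__lt followed by .count().

```python
from bisect import bisect_left
from datetime import date, datetime, timedelta

STEP = timedelta(minutes=15)  # sampling interval stated in the question


def expected_count(start, end):
    """Number of samples expected in the half-open range [start, end)."""
    return int((end - start) / STEP)


def incomplete_days(count_between, year):
    """Drill down year -> month -> day; return the days missing samples.

    count_between(start, end) must return how many points satisfy
    start <= time < end.
    """
    def is_complete(start, end):
        return count_between(start, end) >= expected_count(start, end)

    if is_complete(datetime(year, 1, 1), datetime(year + 1, 1, 1)):
        return []
    days = []
    for month in range(1, 13):
        month_start = datetime(year, month, 1)
        if month == 12:
            month_end = datetime(year + 1, 1, 1)
        else:
            month_end = datetime(year, month + 1, 1)
        if is_complete(month_start, month_end):
            continue  # skip the per-day counts for complete months
        day = month_start
        while day < month_end:
            if not is_complete(day, day + timedelta(days=1)):
                days.append(day.date())
            day += timedelta(days=1)
    return days


# In-memory stand-in for the table: all of 2023 except 02:00-06:00 on March 5.
sample_times = []
t = datetime(2023, 1, 1)
while t < datetime(2024, 1, 1):
    if not (datetime(2023, 3, 5, 2) <= t < datetime(2023, 3, 5, 6)):
        sample_times.append(t)
    t += STEP


def count_between(start, end):
    # With Django this would be a filtered queryset's .count() instead.
    return bisect_left(sample_times, end) - bisect_left(sample_times, start)


bad_days = incomplete_days(count_between, 2023)
print(bad_days)  # [datetime.date(2023, 3, 5)]
```

This only narrows a gap down to the day. Finding its exact start and end would still need a pass over that day's timestamps.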