Algorithm for summing/stacking values in a time-series graph where the data points don't match on time
I have a graphing/analysis problem I can't quite get my head around. I can brute-force it, but that's too slow; maybe someone has a better idea, or knows of a speedy library for Python?
I have 2+ time series data sets (x, y) that I want to aggregate (and subsequently plot). The issue is that the x values across the series don't match up, and I really don't want to resort to duplicating values into time bins.
So, given these 2 series:
S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)
When added together, should result in:
ST: (1;100) (4;250) (5;200) (10;200) (18;250)
Logic:
x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250
My current thinking is to iterate over a sorted list of keys (x), keep the previous value for each series, and check whether each series has a new y for that x.
Any ideas would be appreciated!
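For reference, the idea described above (walk the sorted x values and carry each series' previous value forward) could be sketched roughly like this. The function name `sum_series` is made up for illustration, and treating a series as contributing 0 before its first point is an assumption taken from the `s2=None, sum=100` row in the example.

```python
def sum_series(*series):
    """Sum step-wise series given as lists of (x, y) pairs."""
    lookups = [dict(s) for s in series]      # x -> y for each series
    xs = sorted(set().union(*lookups))       # every x seen in any series
    last = [0] * len(series)                 # previous y per series (assumed 0 before first point)
    total = []
    for x in xs:
        for i, lookup in enumerate(lookups):
            if x in lookup:                  # series i has a new y at this x
                last[i] = lookup[x]
        total.append((x, sum(last)))
    return total


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(sum_series(s1, s2))
```

Each x costs one dictionary lookup per series, so the work is roughly (number of distinct x values) × (number of series) plus the sort.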
Comments (3)
Here's another way to do it, putting more of the behaviour on the individual data streams:
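One way this idea could look, as a minimal sketch rather than the answerer's original code: each stream knows how to report its own value at any x, and the summing code only asks. The `Stream` class, `value_at` method, and `stack` function are hypothetical names, and a stream is assumed to contribute 0 before its first point.

```python
from bisect import bisect_right


class Stream:
    """Hypothetical wrapper for one step-wise series of (x, y) points."""

    def __init__(self, points):
        points = sorted(points)
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value_at(self, x):
        """Return the latest y at or before x (assumed 0 before the first point)."""
        i = bisect_right(self.xs, x)
        return self.ys[i - 1] if i else 0


def stack(streams):
    """Sum all streams at every x that appears in any of them."""
    xs = sorted({x for s in streams for x in s.xs})
    return [(x, sum(s.value_at(x) for s in streams)) for x in xs]


s1 = Stream([(1, 100), (5, 100), (10, 100)])
s2 = Stream([(4, 150), (5, 100), (18, 150)])
print(stack([s1, s2]))
```

The binary search in `value_at` keeps each lookup logarithmic in the stream length, at the cost of holding every stream in memory.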
Something like this:
It basically keeps the current value of S1 and S2, together with the next point of S1 and S2, and steps through them based on which has the lowest "upcoming time". It should handle lists of different lengths too, and it uses iterators all the way, so it should be able to handle massive data series.
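A sketch of that stepping scheme, generalised to any number of series, might look like the following. This is not the answerer's original code: `merge_sum` is a made-up name, each input is assumed to be already sorted by x, and a series is assumed to contribute 0 before its first point.

```python
def merge_sum(*series):
    """Lazily yield (x, total) for step-wise series, each an iterable of (x, y) sorted by x."""
    iters = [iter(s) for s in series]
    current = [0] * len(iters)                   # current y per series (assumed 0 before first point)
    upcoming = [next(it, None) for it in iters]  # next unread (x, y) per series, None when exhausted
    while any(p is not None for p in upcoming):
        # Pick the lowest "upcoming time" among the series that still have points.
        x = min(p[0] for p in upcoming if p is not None)
        for i, it in enumerate(iters):
            # Advance every series whose upcoming point falls on this x.
            while upcoming[i] is not None and upcoming[i][0] == x:
                current[i] = upcoming[i][1]
                upcoming[i] = next(it, None)
        yield x, sum(current)


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(list(merge_sum(s1, s2)))
```

Because only one pending point per series is held at a time, the inputs can be generators reading from files or a database cursor.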
One possible approach:
1. Format every element of every series as a tuple (x, y, series id), e.g. (4, 150, 1), add them all to one list of tuples, and sort it by x ascending.
2. Declare a list whose length equals the number of series, to maintain the "last seen" value for each series.
3. Iterate through each tuple of the list from step (1), and:
3.1 Update the "last seen" list according to the series id in the tuple.
3.2 When the x of the previously iterated tuple doesn't match the x of the current tuple, sum all elements of the "last seen" list and add the result to the final list.
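The steps above can be sketched as follows. This is an illustration under assumptions, not the answerer's test code: `sum_by_sorted_tuples` is a made-up name, "last seen" starts at 0 for every series, and the total for an x is recorded only once every tuple sharing that x has been applied (so the sum is taken before the update when x changes, plus one final flush).

```python
def sum_by_sorted_tuples(series_list):
    # Step 1: flatten into (x, y, series id) tuples and sort by x.
    events = sorted(
        ((x, y, sid) for sid, series in enumerate(series_list) for x, y in series),
        key=lambda e: e[0],
    )
    # Step 2: one "last seen" slot per series (assumed 0 until its first point).
    last_seen = [0] * len(series_list)
    result = []
    prev_x = None
    # Step 3: walk the tuples in x order.
    for x, y, sid in events:
        if prev_x is not None and x != prev_x:
            # 3.2: x changed, so the previous x is complete; record its total.
            result.append((prev_x, sum(last_seen)))
        last_seen[sid] = y  # 3.1: update this series' last seen value
        prev_x = x
    if prev_x is not None:
        result.append((prev_x, sum(last_seen)))  # flush the final x
    return result


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(sum_by_sorted_tuples([s1, s2]))
```

The sort dominates the cost, so this runs in O(n log n) for n points in total across all series.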
Now with my dirty test: