对数据点在时间上不匹配的时间序列图中求和/堆栈值的算法

发布于 2024-10-08 10:40:10 字数 634 浏览 10 评论 0原文

我有一个绘图/分析问题，我无法完全理解。我可以用蛮力，但它太慢了，也许有人有更好的主意，或者知道Python的快速库？

我有 2 个以上的时间序列数据集 (x,y)，我想要聚合（并随后绘制）。问题是该系列中的 x 值不匹配，我真的不想将值复制到时间段中。

因此，给定这两个系列：

S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)

当加在一起时，应该得到：

ST: (1;100) (4;250) (5;200) (10;200) (18;250)

逻辑：

x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250

我当前的想法是迭代键（x）的排序列表，保留每个系列的前一个值，并查询每个集合是否有新的 y对于 x。

任何想法将不胜感激！

原文

I have a graphing/analysis problem i can't quite get my head around. I can do a brute force, but its too slow, maybe someone has a better idea, or knows or a speedy library for python?

I have 2+ time series data sets (x,y) that i want to aggregate (and subsequently plot). The issue is that the x values across the series don't match up, and i really don't want to resort to duplicating values into time bins.

So, given these 2 series:

S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)

When added together, should result in:

ST: (1;100) (4;250) (5;200) (10;200) (18;250)

Logic:

x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250

My current thinking is to iterate a sorted list of keys(x), keep the previous value for each series, and query each set if it has a new y for the x.

Any ideas would be appreciated!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

醉生梦死 2024-10-15 10:40:10

这是另一种方法，将更多的行为放在各个数据流上：

class DataStream(object):
    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.next_item = (None, 0)
        self.next_x = None
        self.current_y = 0
        self.next()

    def next(self):
        if self.next_item is None:
            raise StopIteration()
        self.current_y = self.next_item[1]
        try:
            self.next_item = self.iterable.next()
            self.next_x = self.next_item[0]
        except StopIteration:
            self.next_item = None
            self.next_x = None
        return self.next_item

    def __iter__(self):
        return self


class MergedDataStream(object):
    def __init__(self, *iterables):
        self.streams = [DataStream(i) for i in iterables]
        self.outseq = []

    def next(self):
        xs = [stream.next_x for stream in self.streams if stream.next_x is not None]
        if not xs:
            raise StopIteration()
        next_x = min(xs)
        current_y = 0
        for stream in self.streams:
            if stream.next_x == next_x:
                stream.next()
            current_y += stream.current_y
        self.outseq.append((next_x, current_y))
        return self.outseq[-1]

    def __iter__(self):
        return self


if __name__ == '__main__':
    seqs = [
        [(1, 100), (5, 100), (10, 100)],
        [(4, 150), (5, 100), (18, 150)],
        ]

    sm = MergedDataStream(*seqs)
    for x, y in sm:
        print "%02s: %s" % (x, y)

    print sm.outseq

Here's another way to do it, putting more of the behaviour on the individual data streams:

class DataStream(object):
    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.next_item = (None, 0)
        self.next_x = None
        self.current_y = 0
        self.next()

    def next(self):
        if self.next_item is None:
            raise StopIteration()
        self.current_y = self.next_item[1]
        try:
            self.next_item = self.iterable.next()
            self.next_x = self.next_item[0]
        except StopIteration:
            self.next_item = None
            self.next_x = None
        return self.next_item

    def __iter__(self):
        return self


class MergedDataStream(object):
    def __init__(self, *iterables):
        self.streams = [DataStream(i) for i in iterables]
        self.outseq = []

    def next(self):
        xs = [stream.next_x for stream in self.streams if stream.next_x is not None]
        if not xs:
            raise StopIteration()
        next_x = min(xs)
        current_y = 0
        for stream in self.streams:
            if stream.next_x == next_x:
                stream.next()
            current_y += stream.current_y
        self.outseq.append((next_x, current_y))
        return self.outseq[-1]

    def __iter__(self):
        return self


if __name__ == '__main__':
    seqs = [
        [(1, 100), (5, 100), (10, 100)],
        [(4, 150), (5, 100), (18, 150)],
        ]

    sm = MergedDataStream(*seqs)
    for x, y in sm:
        print "%02s: %s" % (x, y)

    print sm.outseq

回复收藏 0 原文

○闲身 2024-10-15 10:40:10

类似这样的：

def join_series(s1, s2):
    S1 = iter(s1)
    S2 = iter(s2)
    value1 = 0
    value2 = 0
    time1, next1 = next(S1)
    time2, next2 = next(S2)
    end1 = False
    end2 = False

    while True:    
        time = min(time1, time2)
        if time == time1:
            value1 = next1
            try:
                time1, next1 = next(S1)
            except StopIteration:
                end1 = True
                time1 = time2

        if time == time2:
            value2 = next2
            try:
                time2, next2 = next(S2)
            except StopIteration:
                end2 = True
                time2 = time1

        yield time, value1 + value2

        if end1 and end2:
            raise StopIteration

S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))

for result in join_series(S1, S2):
    print(result)

它基本上保留 S1 和 S2 的当前值以及 S1 和 S2 的下一个值，并根据“即将到来的时间”最低的一个逐步遍历它们。应该处理不同长度的列表，并一直使用迭代器，因此它应该能够处理大量数据系列等。

Something like this:

def join_series(s1, s2):
    S1 = iter(s1)
    S2 = iter(s2)
    value1 = 0
    value2 = 0
    time1, next1 = next(S1)
    time2, next2 = next(S2)
    end1 = False
    end2 = False

    while True:    
        time = min(time1, time2)
        if time == time1:
            value1 = next1
            try:
                time1, next1 = next(S1)
            except StopIteration:
                end1 = True
                time1 = time2

        if time == time2:
            value2 = next2
            try:
                time2, next2 = next(S2)
            except StopIteration:
                end2 = True
                time2 = time1

        yield time, value1 + value2

        if end1 and end2:
            raise StopIteration

S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))

for result in join_series(S1, S2):
    print(result)

It basically keeps the current value of S1 and S2, together with the next of S1 and S2, and steps through them based on which has the lowest "upcoming time". Should handle lists of different lengths to, and uses iterators all the way so it should be able to handle massive dataseries, etc, etc.

回复收藏 0 原文

软的没边 2024-10-15 10:40:10

一种可能的方法：

将所有系列的元素格式化为
元组 (x, y, 系列 id)，例如 (4,
150, 1) 并将它们添加到元组列表中，并按 x 升序排序。
声明一个长度等于系列数的列表，以维护每个系列的“最后查看”值。
迭代步骤(1)中列表的每个元素元组，并且：
3.1 根据元组中的系列id更新“最后看到”列表
3.2 当先前迭代的元组的 x 与当前元组的 x 不匹配时，将“最后一次看到”列表的所有元素相加，并将结果添加到最终列表。

现在进行我的肮脏测试：

>>> 
S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))
>>> all = []
>>> for s in S1: all.append((s[0], s[1], 0))
...
>>> for s in S2: all.appned((s[0], s[1], 1))
...
>>> all
[(1, 100, 0), (5, 100, 0), (10, 100, 0), (4, 150, 1), (5, 100, 1), (18, 150, 1)]
>>> all.sort()
>>> all
[(1, 100, 0), (4, 150, 1), (5, 100, 0), (5, 100, 1), (10, 100, 0), (18, 150, 1)]
>>> last_val = [0]*2
>>> last_x = all[0][0]
>>> final = []
>>> for e in all:
...     if e[0] != last_x:
...             final.append((last_x, sum(last_val)))
...     last_val[e[2]] = e[1]
...     last_x = e[0]
...
>>> final.append((last_x, sum(last_val)))
>>> final
[(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
>>>

One possible approach:

Format all series' element into
tuples (x, y, series id), e.g. (4,
150, 1) and add them to a tuple list, and sort it by x ascending.
Declare a list with length equals to number of series to maintain "last seen" value for each series.
Iterate through each element tuple of list in step (1), and:
3.1 Update the "last seen" list according to series id in tuple
3.2 When x of previously iterated tuple doesn't match with x of current tuple, sum all element of "last seen" list and add the result to final list.

Now with my dirty test:

>>> 
S1 = ((1, 100), (5, 100), (10, 100))
S2 = ((4, 150), (5, 100), (18, 150))
>>> all = []
>>> for s in S1: all.append((s[0], s[1], 0))
...
>>> for s in S2: all.appned((s[0], s[1], 1))
...
>>> all
[(1, 100, 0), (5, 100, 0), (10, 100, 0), (4, 150, 1), (5, 100, 1), (18, 150, 1)]
>>> all.sort()
>>> all
[(1, 100, 0), (4, 150, 1), (5, 100, 0), (5, 100, 1), (10, 100, 0), (18, 150, 1)]
>>> last_val = [0]*2
>>> last_x = all[0][0]
>>> final = []
>>> for e in all:
...     if e[0] != last_x:
...             final.append((last_x, sum(last_val)))
...     last_val[e[2]] = e[1]
...     last_x = e[0]
...
>>> final.append((last_x, sum(last_val)))
>>> final
[(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
>>>

回复收藏 0 原文

~没有更多了~