Downsample a time series whenever the absolute difference from the previous sample exceeds a threshold
I have a time series of intraday tick-by-tick stock prices that change gradually over time. Whenever there is a small change (e.g. the price increases by $0.01), a new row of data is created. This leads to a very large data series which is slow to plot. I want to downsample so that small changes are ignored (e.g. the price goes up/down/up/down/up/down and ends up unchanged after 50 rows of data), which would improve plotting speed without sacrificing the qualitative accuracy of the graph. I only want to sample when the price keeps moving in one direction (e.g. up/up/up/up), so that only obvious changes are displayed.
import pandas as pd
import numpy as np
prices = pd.DataFrame(np.random.randint(0,1000, size=(100, 1))/100+1000, columns=list('A'))
I wish to sample whenever the absolute difference from the previous sample exceeds some threshold. So, I will sample row 0 by default. If rows 1, 2, 3 and 4 are too close to row 0, I want to throw them away. Then, if row 5 is sufficiently far away from row 0, I will sample it. Row 5 then becomes my new anchor point, and I repeat the same process from there.
Is there a way to do this, ideally without a loop?
Comments (2)
You could apply a down-sampling masking function that checks whether the distance from the last kept sample has been exceeded, and then use the resulting mask to select the applicable rows.
Here is the down-sampling masking function:
Then apply it and use it as a mask to get the entries that you want:
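A minimal sketch of what such a masking function and its application could look like, not necessarily the answerer's original code. The name downsample_mask, the threshold value, and the small sample data are assumptions chosen for illustration. Because each decision depends on the most recently kept row, this sketch uses a plain loop rather than a vectorized expression.

```python
import numpy as np
import pandas as pd

def downsample_mask(values, threshold):
    """Hypothetical helper: True for rows that differ from the last kept
    row (the anchor) by more than `threshold`."""
    values = np.asarray(values, dtype=float)
    mask = np.zeros(len(values), dtype=bool)
    if len(values) == 0:
        return mask
    mask[0] = True               # row 0 is always sampled
    anchor = values[0]
    for i in range(1, len(values)):
        if abs(values[i] - anchor) > threshold:
            mask[i] = True
            anchor = values[i]   # the kept row becomes the new anchor
    return mask

# Illustrative data (assumed): small wiggles, one jump, then a drop back.
prices = pd.DataFrame(
    {"A": [1000.00, 1000.01, 1000.02, 1000.01, 1000.03, 1000.60, 1000.61, 1000.05]}
)

# Apply the function and use the result as a boolean mask.
mask = downsample_mask(prices["A"], threshold=0.5)
sampled = prices[mask]
print(sampled)
```

If the loop turns out to be too slow on a large tick series, compiling the function with a JIT such as Numba is one option to consider, since the anchor update is inherently sequential.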
Not exactly what was asked for. I offer two options: one with a threshold only, and one with a threshold and a sliding period.
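A rough sketch of how the two options might look in pandas. This is one interpretation and rests on assumptions: "threshold" is read as a limit on the row-to-row change, "sliding period" is read as comparing each row with the row a fixed number of steps earlier, and the names threshold and period and the sample data are illustrative. Neither option tracks an anchor row, so both differ from the behaviour described in the question.

```python
import pandas as pd

# Illustrative data (assumed).
prices = pd.DataFrame(
    {"A": [1000.00, 1000.01, 1000.02, 1000.60, 1000.61, 1000.62, 1000.05]}
)
threshold = 0.5   # assumed minimum change worth plotting
period = 3        # assumed look-back, in rows

# Option 1: keep rows whose change from the previous row exceeds the threshold.
keep_step = prices["A"].diff().abs().gt(threshold)
keep_step.iloc[0] = True          # always keep the first row
option_1 = prices[keep_step]

# Option 2: keep rows whose change from the row `period` steps earlier
# exceeds the threshold.
keep_period = prices["A"].diff(periods=period).abs().gt(threshold)
keep_period.iloc[0] = True        # always keep the first row
option_2 = prices[keep_period]

print(option_1)
print(option_2)
```

Option 1 misses slow drifts made of many small steps, while option 2 catches them at the cost of keeping several consecutive rows after a jump.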