Comparing two lines programmatically (stock pattern matching)
What I want to do is take a certain stock pattern (defined as a series of x and y coordinates) and compare it against historical stock prices. If I find anything in the historical prices similar to that pattern I defined, I would like to return it as a match.
I'm not sure how to determine how similar two curved lines are. I did some research, and you can find the similarity of two straight lines (with linear regression), but I haven't yet come across a good way to compare two curved lines.
My best approach right now is to get several high and low points from the historical data range I'm looking at, find the slopes of the lines between them, and compare those to the slopes of the pattern I'm trying to match to see if they're roughly the same.
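Here is a minimal Python sketch of that slope-comparison idea. It rests on a few assumptions. The pattern is already expressed as a list of segment slopes in the same price units as the data. The local high/low detector is deliberately naive. `rel_tol` is a hypothetical tolerance you would have to tune.

```python
def pivots(prices):
    """Naive pivot finder: indices of local highs and lows, plus both endpoints."""
    idx = [0]
    for i in range(1, len(prices) - 1):
        is_high = prices[i] > prices[i - 1] and prices[i] >= prices[i + 1]
        is_low = prices[i] < prices[i - 1] and prices[i] <= prices[i + 1]
        if is_high or is_low:
            idx.append(i)
    idx.append(len(prices) - 1)
    return idx


def slopes(points):
    """Slopes of the segments joining consecutive (x, y) points."""
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


def slopes_match(pattern_slopes, window, rel_tol=0.25):
    """True if each pivot-to-pivot slope in the window is within rel_tol of the pattern's."""
    got = slopes([(i, window[i]) for i in pivots(window)])
    if len(got) != len(pattern_slopes):
        return False  # a different number of swings means a different shape
    # Relative tolerance: a flat (zero-slope) pattern segment only matches an exactly flat one.
    return all(abs(g - w) <= rel_tol * abs(w) for g, w in zip(got, pattern_slopes))
```

Slopes depend on the price level and time scale, so this only works if the pattern and the data are expressed in the same units.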
Any better ideas? I'd love to hear them!
Edit: Thanks for the input! I had considered the least squares approach before, but I wasn't sure where to go with it. After the input I received, though, I think fitting each line with least squares first to smooth out the data a little, then scaling and stretching the pattern like James suggested, should get me what I'm looking for.
I plan on using this to identify certain technical flags in the stock to determine buy and sell signals. There are already sites out there that do this to some degree (such as stockfetcher), but of course I'd like to try it myself and see if I can do any better.
Compute the sum of the squared residuals (the y differences) at each point. This gives you a measure of the geometric fit (how similar the two curves look). You should then be able to set some tolerance for "similar enough".
See http://en.wikipedia.org/wiki/Curve_fitting
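A minimal Python sketch of that measure follows. It assumes the pattern and the historical window have already been sampled at the same x positions and brought to the same price scale. `tolerance` is a hypothetical parameter you would have to tune.

```python
def sum_squared_residuals(pattern, window):
    """Sum of squared y differences between two equally sampled series."""
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have the same number of points")
    return sum((p - w) ** 2 for p, w in zip(pattern, window))


def is_similar(pattern, window, tolerance):
    """Treat the window as a match when the total squared error is within tolerance."""
    return sum_squared_residuals(pattern, window) <= tolerance
```

The error is in squared price units. A tolerance tuned on one stock will therefore not transfer directly to a stock trading at a very different price level.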
Math is not my strong point, but you might be able to use correlation.
Calculate the correlation value between the two data sets, and if the correlation is greater than some value (0.8?), consider the sets similar enough.
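Below is a sketch of the correlation check using the Pearson coefficient. It is written out by hand so it runs on any Python 3; Python 3.10+ also ships `statistics.correlation`. The 0.8 threshold is just the guess from the answer above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series has no defined correlation; treat it as no match
    return cov / math.sqrt(vx * vy)


def correlated_enough(pattern, window, threshold=0.8):
    return pearson(pattern, window) >= threshold
```

Pearson correlation ignores a constant offset and a positive scale factor. A pattern therefore matches a window with the same shape at a different price level. A strongly negative value means the window is roughly the pattern upside down.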
One of the problems is that curve fitting with non-linear functions is not always going to work for some of your patterns, depending on how complex they are. You could use quadratic, cubic, or higher-order polynomials to get a more accurate result, but that won't work in all situations either, particularly when the data changes sharply over time.
Honestly I think a reasonable and relatively simple solution is to 'scale' and 'stretch' your pattern so that it occurs over the same range as the historical data. You can use interpolation for the x axis and multiplication plus an offset for the y-axis. After that just look at the mean of the squared differences at each point and if that is lower than a threshold value then you can consider it a match. It will require a bit of tweaking to achieve predictable results but I think it's a nice approach that should allow you to define any sort of pattern without relying on regression producing a nicely fitted curve. Essentially it's just an application of statistics. You could also look at standard deviations or variance for a more comprehensive approach.
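This is one way the scale-and-stretch approach could look in Python. It makes three assumptions:
- The pattern is a list of `(x, y)` points with strictly increasing x.
- The x axis is stretched by linear interpolation onto as many evenly spaced points as the historical window has.
- The y axis is scaled and offset so that the pattern's min and max line up with the window's min and max.

The helper names are hypothetical.

```python
def resample(points, n):
    """Linearly interpolate (x, y) points onto n evenly spaced x positions."""
    if n < 2 or len(points) < 2:
        raise ValueError("need n >= 2 and at least two pattern points")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    out, j = [], 0
    for i in range(n):
        x = xs[0] + (xs[-1] - xs[0]) * i / (n - 1)
        while j < len(xs) - 2 and xs[j + 1] < x:
            j += 1  # advance to the segment containing x
        frac = (x - xs[j]) / (xs[j + 1] - xs[j])
        out.append(ys[j] + frac * (ys[j + 1] - ys[j]))
    return out


def fit_range(ys, lo, hi):
    """Multiply and offset ys so that min(ys) -> lo and max(ys) -> hi."""
    ymin, ymax = min(ys), max(ys)
    if ymax == ymin:
        return [(lo + hi) / 2] * len(ys)  # flat pattern: place it mid-range
    scale = (hi - lo) / (ymax - ymin)
    return [lo + (y - ymin) * scale for y in ys]


def pattern_mse(pattern_points, window):
    """Mean squared difference between the stretched/scaled pattern and the window."""
    shaped = fit_range(resample(pattern_points, len(window)), min(window), max(window))
    return sum((a - b) ** 2 for a, b in zip(shaped, window)) / len(window)


def matches(pattern_points, window, threshold):
    return pattern_mse(pattern_points, window) <= threshold
```

The returned mean squared difference is in squared price units. Dividing it by the square of the window's price range gives a scale-free score that is easier to threshold across different stocks.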
Or perhaps look at the derivatives?
In theory, stock price movement is usually modeled as Brownian motion with a drift factor. (I know very little about it, but take a look here.)
If you don't mind me asking, to what end might that be?
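Here is a small sketch of the derivatives idea. It assumes evenly spaced samples, so that successive differences stand in for the first derivative. It only asks whether the pattern and the window go up and down at the same steps. `direction_agreement` is a hypothetical name, and you would pick your own cut-off for the returned fraction.

```python
def diff(ys):
    """Discrete first derivative: successive differences."""
    return [b - a for a, b in zip(ys, ys[1:])]


def sign(v):
    return (v > 0) - (v < 0)


def direction_agreement(pattern, window):
    """Fraction of steps where the two discrete derivatives have the same sign."""
    dp, dw = diff(pattern), diff(window)
    if len(dp) != len(dw) or not dp:
        raise ValueError("need two series of equal length >= 2")
    same = sum(sign(a) == sign(b) for a, b in zip(dp, dw))
    return same / len(dp)
```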
One thought might be to take moving averages over varying time ranges (daily for weeks, months, years; weekly for months, years; etc.) and compare them to the moving averages now.
The individual averages would also make the comparison easier. If consecutive items in the averages are in some normalized form (say, scaled to 0..1 to account for splits, etc.), you can compare consecutive elements in the vectors to each other within some range epsilon and get a potential match. Just a thought.
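Here is a rough Python sketch of the moving-average idea. It assumes both series are equally long and sampled at the same interval. It uses a simple moving average over `k` samples and normalizes each result to 0..1, so an overall change in price scale does not affect the comparison. It then checks every element against a hypothetical tolerance `eps`.

```python
def moving_average(xs, k):
    """Simple moving average over a sliding window of k samples."""
    if k < 1 or k > len(xs):
        raise ValueError("k must be between 1 and len(xs)")
    s = sum(xs[:k])
    out = [s / k]
    for i in range(k, len(xs)):
        s += xs[i] - xs[i - k]  # slide the window by one sample
        out.append(s / k)
    return out


def normalize(xs):
    """Min-max scale to 0..1; a flat series maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def within_epsilon(a, b, eps=0.05):
    """True if every normalized element of a is within eps of the matching one in b."""
    na, nb = normalize(a), normalize(b)
    return len(na) == len(nb) and all(abs(x - y) <= eps for x, y in zip(na, nb))
```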
Mathworld (http://mathworld.wolfram.com/) should also have some take on curve comparisons.
Least squares isn't the best you can do here; use the RANSAC algorithm instead. This kind of data is very unpredictable and often noisy, and RANSAC is designed to fit a model while rejecting outliers, so it handles such data better.
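RANSAC is a general model-fitting scheme. This minimal sketch uses a straight line as a stand-in model:
1. Repeatedly fit a line through two random points.
2. Count how many points lie within `threshold` of that line.
3. Keep the largest such consensus set.
4. Refit with ordinary least squares on those inliers only.

Matching a full chart pattern would need a richer model in place of the line. The function names and parameter values are illustrative assumptions.

```python
import random


def fit_line(points):
    """Ordinary least-squares line y = m*x + c through the given points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    m = sxy / sxx
    return m, my - m * mx


def ransac_line(points, threshold, iterations=200, seed=0):
    """Return ((m, c), inliers) for the line with the largest consensus set."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)  # seeded so results are reproducible
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate: cannot be written as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    return fit_line(best_inliers), best_inliers
```

Unlike a plain least-squares fit over all points, the final refit only sees the consensus set. A few wild outliers therefore do not drag the line away from the bulk of the data.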