Best Fit Curve for Trend Line
Problem Constraints
- Size of the data set, but not the data itself, is known.
- Data set grows by one data point at a time.
- Trend line is graphed one data point at a time (using a spline/Bezier curve).
Graphs
The collage below shows data sets with reasonably accurate trend lines:
The graphs are:
- Upper-left. By hour, with ~24 data points.
- Upper-right. By day for one year, with ~365 data points.
- Lower-left. By week for one year, with ~52 data points.
- Lower-right. By month for one year, with ~12 data points.
User Inputs
The user can select:
- the type of time series (hourly, daily, monthly, quarterly, annual); and
- the start and end dates for the time series.
For example, the user could select a daily report for 30 days in June.
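As a rough illustration of how the number of data points could follow from these inputs, here is a minimal sketch. The function name `count_data_points`, the inclusive treatment of the end date, and the year 2023 in the usage note are assumptions made for the example; the question does not say how the count is actually derived.

```python
from datetime import date


def count_data_points(series_type: str, start: date, end: date) -> int:
    """Count the points in a series between start and end, both inclusive.

    Hypothetical helper: the inclusive end date and the counting rules
    below are assumptions, not taken from the question.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    days = (end - start).days + 1
    if series_type == "hourly":
        return days * 24
    if series_type == "daily":
        return days
    if series_type == "monthly":
        return (end.year - start.year) * 12 + (end.month - start.month) + 1
    if series_type == "quarterly":
        # Number each quarter consecutively, then count the span.
        first = start.year * 4 + (start.month - 1) // 3
        last = end.year * 4 + (end.month - 1) // 3
        return last - first + 1
    if series_type == "annual":
        return end.year - start.year + 1
    raise ValueError(f"unknown series type: {series_type}")
```

For the daily report for June mentioned above, `count_data_points("daily", date(2023, 6, 1), date(2023, 6, 30))` gives 30.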
Trend Weight
To calculate the window size (i.e., the number of data points to average when calculating the trend line), the following expression is used:
data points / trend weight
Where data points is derived from the user inputs and trend weight is 6.4. Even though a trend weight of 6.4 produces good fits, it is rather arbitrary and might not be appropriate for different user inputs.
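To make the expression concrete, the sketch below computes the window size for the four graphs and averages a series one point at a time, as the constraints require. Rounding down, clamping the window to at least 1, and using a trailing (rather than centered) average are assumptions; `window_size` and `TrailingMean` are illustrative names, not part of the original code.

```python
import math
from collections import deque

TREND_WEIGHT = 6.4  # value quoted in the question


def window_size(data_points: int, trend_weight: float = TREND_WEIGHT) -> int:
    """data points / trend weight, rounded down and kept at 1 or more.

    The rounding and the lower bound are assumptions.
    """
    return max(1, math.floor(data_points / trend_weight))


class TrailingMean:
    """Average of the most recent `window` values, fed one value at a time."""

    def __init__(self, window: int) -> None:
        # A bounded deque drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def add(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)


if __name__ == "__main__":
    for n in (24, 365, 52, 12):
        print(n, window_size(n))
```

With a weight of 6.4 this gives windows of 3, 57, 8, and 1 points for the 24-, 365-, 52-, and 12-point graphs, so under these assumptions the 12-point trend line simply follows the data.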
Question
How should trend weight be calculated given the constraints of this problem?
Comments (3)
Based on the look of the graphs, I would say you have too many points for your 12-point graph (it is just a spline of the points given, which is visually pleasing but does more harm than good when trying to understand the trend) and too few points for your 365-point graph. Perhaps try doing something a little exponential like:
I do realize this is even more arbitrary than what you already have, but arbitrary isn't the worst thing in the world.
(I got 14.1 by trying to keep the 52-point graph fixed, since that one looks nice, by taking (52^(1.2)/52)*6.4=14.1. Using this technique, you could try other powers besides 1.2 to see what you get visually.)
Dan
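The expression this reply proposed is not present in the text above. One reading that agrees with the arithmetic shown is to raise the point count to a power before dividing, with the divisor rescaled so that the 52-point graph keeps its current window. The sketch below follows that reading, which is an assumption; `rescaled_weight` and `power_window` are hypothetical names.

```python
def rescaled_weight(exponent: float, anchor_points: int = 52,
                    base_weight: float = 6.4) -> float:
    """Divisor that leaves the anchor graph's window unchanged.

    Mirrors the reply's arithmetic: (52^(1.2) / 52) * 6.4.
    """
    return anchor_points ** exponent / anchor_points * base_weight


def power_window(data_points: int, exponent: float = 1.2,
                 anchor_points: int = 52, base_weight: float = 6.4) -> float:
    """Unrounded window size under the assumed power-law form."""
    weight = rescaled_weight(exponent, anchor_points, base_weight)
    return data_points ** exponent / weight


if __name__ == "__main__":
    print(round(rescaled_weight(1.2), 1))
    for n in (12, 24, 52, 365):
        print(n, power_window(n))
```

An exponent of 1.0 reduces this to the original data points / 6.4, so the exponent is the only new knob to tune by eye.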
I voted this up for the quality of your results and the clarity of your write-up. I wish I could offer an answer that could improve on your already excellent work.
I fear that it might be a matter of trial and error with the trend weight until you see an improved fit.
You could also make this an input from users: allow them to fiddle with the value, within realistic constraints, until they get a satisfactory fit.
I also wondered if the weight would be different for each graph, since the number of points in each is different. Are you trying to get a single weighting that works for all graphs?
Excellent work; a nice question. Well done. I wish I could be more helpful. Perhaps someone else will have more wisdom to impart than I do.
It might look like the trend lines are accurate in those four graphs, but they are really quite off. (This is best seen at the beginning of the lower-left one and the beginning of the upper-right one.) I would think that you would want to use no less than half of your points when finding the trend line (though really you should use much more than half). I would suggest a trend weight of 2 at a maximum, though really you ought to stick closer to the 1-1.5 range. Since it is arbitrary, I would suggest you give your users an "accuracy of trend line" slider, where the most accurate setting uses a trend weight of 1 and the least accurate uses a weight of (number of data points + 1). This would use 0 points (assuming you always round down) and, I would assume, though your statistics software might be different, generate a straight horizontal line.
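The slider idea can be sketched as a mapping from a slider position to a trend weight between 1 and the number of data points plus 1. The linear interpolation, the 0-to-1 slider range, and the function names are assumptions; the reply only fixes the two endpoints and the rounding down.

```python
import math


def slider_to_weight(accuracy: float, data_points: int) -> float:
    """Map a slider position in [0, 1] to a trend weight.

    1.0 (most accurate) gives a weight of 1; 0.0 (least accurate) gives
    data_points + 1. Linear interpolation between them is an assumption.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    least_accurate = data_points + 1
    return least_accurate - accuracy * data_points


def points_used(accuracy: float, data_points: int) -> int:
    """Window size after rounding down, as the reply assumes."""
    return math.floor(data_points / slider_to_weight(accuracy, data_points))
```

At the most accurate setting every point is used, and at the least accurate setting the window rounds down to 0, matching the two ends described in the reply.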