Curve fitting in PHP
I have a MySQL table called today_stats with the columns id, date and clicks. I'm trying to write a script that reads these values and predicts the clicks for the next 7 days. How can I make that prediction in PHP?
Comments (5)
Different types of curve fitting are described here:
http://en.wikipedia.org/wiki/Curve_fitting
Also: http://www.qub.buffalo.edu/wiki/index.php/Curve_Fitting
This has less to do with PHP and more to do with math. The simplest way to calculate something like this is to take the average traffic for a given day of the week over the past X weeks. You don't want to use all of your historical data, because fads come and go and page content changes.
So, for example, take the average traffic for each day of the week over the last month. You can tell how accurate your estimates are by comparing them to the actual traffic once it comes in. If they aren't accurate at all, try adjusting the calculation (e.g., change the time period you're sampling from). Or maybe it's a good thing that your estimate is off: your site was just featured on the front page of the New York Times!
Cheers.
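A minimal sketch of the weekday-average idea above, in case it helps. The function name, the `'Y-m-d' => clicks` array shape and the four-week default are my own assumptions, not something from the question; you would fill the array from today_stats yourself.

```php
<?php
/**
 * Forecast the next $days days as the average clicks seen on the same
 * weekday during the most recent $weeks weeks of history.
 *
 * @param array $history 'Y-m-d' date string => clicks (hypothetical shape)
 * @return array 'Y-m-d' date string => predicted clicks
 */
function forecastByWeekdayAverage(array $history, int $weeks = 4, int $days = 7): array
{
    if ($history === []) {
        return [];
    }

    // ISO dates sort chronologically when compared as strings.
    ksort($history, SORT_STRING);
    // Keep only the most recent $weeks * 7 daily rows.
    $recent = array_slice($history, -7 * $weeks, null, true);

    // Sum and count clicks per ISO weekday: 1 (Monday) .. 7 (Sunday).
    $sums = array_fill(1, 7, 0.0);
    $counts = array_fill(1, 7, 0);
    foreach ($recent as $date => $clicks) {
        $weekday = (int) (new DateTimeImmutable($date))->format('N');
        $sums[$weekday] += $clicks;
        $counts[$weekday]++;
    }

    // Walk forward from the last known date and look up each weekday's average.
    $lastDate = new DateTimeImmutable((string) array_key_last($recent));
    $forecast = [];
    for ($i = 1; $i <= $days; $i++) {
        $day = $lastDate->modify("+{$i} day");
        $weekday = (int) $day->format('N');
        if ($counts[$weekday] > 0) { // skip weekdays with no history at all
            $forecast[$day->format('Y-m-d')] = $sums[$weekday] / $counts[$weekday];
        }
    }

    return $forecast;
}
```

Changing `$weeks` is the "change the time period you're sampling from" knob: call it with a few different values and compare each forecast against what actually happened.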
The algorithm you are looking for is called least squares.
What you need to do is minimize the total distance from each data point to the function you will use to predict future values. To keep each distance positive, you don't use the absolute value of the difference but its square, so it is the sum of the squared differences that has to be minimal. You write that sum as a function of the model's parameters, differentiate it with respect to each parameter, set the derivatives to zero and solve the resulting equations. That gives you the parameters of the function that lies CLOSEST to the statistical values from the past.
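For the simplest case, a straight line, the procedure works out as follows. The symbols are my own choice for illustration: $t_i$ is the day number, $y_i$ the clicks on that day, and $a$, $b$ the intercept and slope.

```latex
\begin{align*}
  S(a, b) &= \sum_{i=1}^{n} \bigl(y_i - a - b\,t_i\bigr)^2 \\
  \frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} \bigl(y_i - a - b\,t_i\bigr) = 0 \\
  \frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} t_i \bigl(y_i - a - b\,t_i\bigr) = 0 \\
  \Longrightarrow\quad
  b &= \frac{\sum_{i=1}^{n} (t_i - \bar{t})(y_i - \bar{y})}{\sum_{i=1}^{n} (t_i - \bar{t})^2},
  \qquad a = \bar{y} - b\,\bar{t}
\end{align*}
```

Here $\bar{t}$ and $\bar{y}$ are the means of the $t_i$ and $y_i$. A polynomial model follows the same steps, just with more parameters and more equations.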
Programs like Excel (and maybe the OpenOffice spreadsheet too) have a built-in function that does this for you, using polynomial functions to model the dependence.
Basically, you should take time as the independent variable and all the other values as dependent variables.
This kind of modelling is called econometrics, because it's widespread in economics. If you have a lot of statistical data from the past, the prediction for the next day will be quite accurate (you will also be able to determine the confidence interval, i.e. the size of the error that may occur). Predictions for the days after that will be less and less accurate.
If you build a separate model for each day of the week and include holidays and special days as variables, you will get much higher precision.
This is the only RIGHT way to forecast future values mathematically. But all of this raises a question: is it really worth it?
Start off by connecting to the database and retrieving the data for the previous x days.
Then you could fit a line of best fit to those days and simply extend it into the future. Depending on the application, though, a line of best fit may not be good enough.
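A rough sketch of the line-of-best-fit approach, using ordinary least squares. The function names `fitLine` and `extendLine` are hypothetical, and I'm assuming the clicks have already been fetched into a plain array with one entry per day, oldest first.

```php
<?php
/**
 * Fit y = intercept + slope * x by ordinary least squares,
 * where x is the day index 0, 1, 2, ... and y is that day's clicks.
 *
 * @param array $clicks daily clicks, oldest first (assumed shape)
 * @return array ['slope' => float, 'intercept' => float]
 */
function fitLine(array $clicks): array
{
    $clicks = array_values($clicks); // make sure indexes are 0..n-1
    $n = count($clicks);
    if ($n < 2) {
        throw new InvalidArgumentException('Need at least two data points.');
    }

    $meanX = ($n - 1) / 2;           // mean of 0..n-1
    $meanY = array_sum($clicks) / $n;

    $sxy = 0.0; // sum of (x - meanX) * (y - meanY)
    $sxx = 0.0; // sum of (x - meanX)^2
    foreach ($clicks as $x => $y) {
        $sxy += ($x - $meanX) * ($y - $meanY);
        $sxx += ($x - $meanX) ** 2;
    }

    $slope = $sxy / $sxx;
    return ['slope' => $slope, 'intercept' => $meanY - $slope * $meanX];
}

/**
 * Extend the fitted line $days days past the end of the data.
 * Negative predictions are clamped to zero, since clicks cannot be negative.
 */
function extendLine(array $clicks, int $days = 7): array
{
    $line = fitLine($clicks);
    $n = count($clicks);
    $forecast = [];
    for ($i = 0; $i < $days; $i++) {
        $forecast[] = max(0.0, $line['intercept'] + $line['slope'] * ($n + $i));
    }
    return $forecast;
}
```

A straight line ignores weekly patterns completely, which is one reason it may not be good enough for traffic data.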
A simple approach would be to group by day and average each value. This can all be done in SQL.
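One way this might look, reading "group by day" as "group by day of the week". The table and column names are the ones from the question; the 28-day window, the constant name and the helper function are my own assumptions, and you need to pass in an already-connected PDO handle.

```php
<?php
// MySQL query: average clicks per weekday over the last 28 days (window is an assumption).
// DAYOFWEEK() returns 1 for Sunday through 7 for Saturday.
const WEEKDAY_AVERAGE_SQL = <<<'SQL'
SELECT DAYOFWEEK(`date`) AS weekday, AVG(clicks) AS avg_clicks
FROM today_stats
WHERE `date` >= CURDATE() - INTERVAL 28 DAY
GROUP BY DAYOFWEEK(`date`)
ORDER BY weekday
SQL;

/**
 * Run the query and return weekday number => average clicks.
 * Hypothetical helper; $pdo must be connected to the MySQL database.
 */
function fetchWeekdayAverages(PDO $pdo): array
{
    // FETCH_KEY_PAIR turns the two selected columns into key => value pairs.
    $rows = $pdo->query(WEEKDAY_AVERAGE_SQL)->fetchAll(PDO::FETCH_KEY_PAIR);
    return array_map('floatval', $rows);
}
```

The prediction for each of the next 7 days is then just the average for that day's weekday.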