Determining whether a set of data comes from a linear or a logarithmic function?
I have a set of data points and would like to know whether the data represents a linear function or a logarithmic function.
The data set is two-dimensional.
Let's say an ideal set of data points followed the function f(x) = x. If I plotted the data points, I would be able to tell that it is linear.
Similarly, if the data points followed the function f(x) = log(x), I would be able to tell visually that it is logarithmic.
On the other hand, having a program determine whether a set of data is linear or logarithmic is nontrivial. How would I approach this?
Comments (1)
One option would be to do a linear regression on the data set to get a best-fit line. If the data is linear, you'll get a very good fit and the mean squared error should be low. Otherwise, the fit will be only mediocre and the error noticeably larger.
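As a rough illustration of that check, here is a minimal Java sketch of an ordinary least-squares line fit that also reports the mean squared error. The names `LineFit` and `fit` are made up for this example, and it assumes one x value per point and at least two distinct x values.

```java
final class LineFit {
    /**
     * Fits y = slope * x + intercept by ordinary least squares.
     * Returns {slope, intercept, meanSquaredError}.
     * Assumes x and y have the same length and x is not constant.
     */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sums of squared deviations and cross-deviations.
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        // Mean of the squared residuals around the fitted line.
        double sse = 0;
        for (int i = 0; i < n; i++) {
            double residual = y[i] - (slope * x[i] + intercept);
            sse += residual * residual;
        }
        return new double[] {slope, intercept, sse / n};
    }
}
```

Keep in mind that the mean squared error is in squared units of y, so "low" only means something relative to the spread of your data.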
Alternatively, you could transform the data set by converting each point (x0, x1, ..., xn, y) to (x0, x1, ..., xn, e^y). If the data was linear, it will now be exponential, and if the data was logarithmic (y = ln(x) plus a constant), it will now be linear. Running a linear regression on the transformed data will give a low mean squared error for the logarithmic data and a staggeringly huge error for the linear data, since the exponential function blows up extremely quickly.
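The transformation idea could be sketched in Java roughly as follows. The names `ExpTransformCheck`, `mseOfLine` and `mseAfterExp` are hypothetical, and the sketch assumes natural-log data of the form y = ln(x) + c. For data like y = a * ln(x) + b with a different from 1, e^y is a power of x rather than a line, so fitting y against ln(x) would be the analogous check.

```java
final class ExpTransformCheck {
    /** Mean squared error of the least-squares line through (x[i], y[i]). */
    static double mseOfLine(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        double sse = 0;
        for (int i = 0; i < n; i++) {
            double residual = y[i] - (slope * x[i] + intercept);
            sse += residual * residual;
        }
        return sse / n;
    }

    /**
     * Replaces each y with e^y, then fits a line to (x, e^y).
     * Note: Math.exp overflows to Infinity for y above roughly 709.
     */
    static double mseAfterExp(double[] x, double[] y) {
        double[] expY = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            expY[i] = Math.exp(y[i]);
        }
        return mseOfLine(x, expY);
    }
}
```

A small value from `mseAfterExp` suggests logarithmic data, while a small value from `mseOfLine` on the untransformed points suggests linear data. Because the two errors live on very different scales, treat the comparison as a heuristic.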
To actually implement the regression, one option would be to use a least-squares regression. This would have the added benefit of giving you a correlation coefficient in addition to the model, which could also be used to distinguish between the two data sets.
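One way the correlation coefficient might be used here, as a sketch only: compute Pearson's r between x and y, and again between x and e^y, then pick whichever relationship is closer to a straight line. The class `CorrelationCheck`, its method names, and the tie-breaking rule are assumptions made for this example, not something the approach above prescribes.

```java
final class CorrelationCheck {
    /** Pearson correlation coefficient between x and y (same length, neither constant). */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /**
     * Returns "linear" if (x, y) is at least as strongly correlated as (x, e^y),
     * otherwise "logarithmic". Ties go to "linear" (an arbitrary choice).
     */
    static String classify(double[] x, double[] y) {
        double[] expY = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            expY[i] = Math.exp(y[i]);
        }
        double rawR = Math.abs(pearson(x, y));
        double expR = Math.abs(pearson(x, expY));
        return rawR >= expR ? "linear" : "logarithmic";
    }
}
```

Unlike the mean squared error, the correlation coefficient is scale-free, which makes the two fits easier to compare directly.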
Because you've asked how to do this in Java, a quick Google search turned up Java code to do a linear regression. However, you might be better served by a language like MATLAB that is specifically designed for these sorts of computations. For example, in MATLAB you can do this regression in one line of code.
Hope this helps!