Simplest way to determine time complexity from run times

Posted 2024-09-27 06:12:20 · 355 words · 6 views · 0 comments

Let's suppose I am trying to analyze an algorithm and all I can do is run it with different inputs. I can construct a set of points (x,y) as (sample size, run time).
I would like to dynamically categorize the algorithm into a complexity class (linear, quadratic, exponential, logarithmic, etc.).
Ideally I could give an equation that more or less approximates the behavior.
I am just not sure what the best way to do this is.

For a polynomial of any degree I can create regression curves and come up with some measure of goodness of fit, but I don't really have a clue how I would do that for a nonpolynomial function. It is harder because I have no prior knowledge of what shape I should try to fit.

This may be more of a math question than a programming question, but it is very interesting to me. I'm not a mathematician, so there may be a simpler established method to get a reasonable function from a set of points that I just don't know about. Does anyone have any ideas for solving a problem like this? Is there a numerical library for C# that could help me crunch the numbers?

Comments (2)

千と千尋 2024-10-04 06:12:20

Well, there are not that many complexity classes you really care about, so let's say: linear, quadratic, polynomial (degree > 2), exponential, and logarithmic.

For each of these you could use the largest (x,y) pair to solve for the unknown variable. Let y = f(x) denote the runtime of your algorithm as a function of the sample size. Let's assume that f(1) = 0; if it isn't, we can always subtract that value y(1) from each of the y's, which just eliminates the constant term in f(x). Let y(end) denote the last (and largest) value of y in your (x,y) data set.

At this point we can solve for the unknown in each canonical form:

f(x) = c*x
f(x) = c*x^2
f(x) = x^c
f(x) = c^x
f(x) = log(x)/log(c)

Since there is only a single unknown in each equation, we can use any point to solve for it. Consider the following data generated from a polynomial of random degree > 2:

x = [ 1 2 3 4 5 6 7 8 9 10 ];
y = [ 0 6 19 44 81 135 206 297 411 550 ];

If we use the last point to solve for c for each possibility (assuming this would be the least noisy estimate), we get:

550 = c*10    -> c = 55
550 = c*10^2  -> c = 5.5
550 = 10^c    -> c = log(550)/log(10) ~= 2.74
550 = c^10    -> c = 550^(1/10) ~= 1.88
550 = log(10)/log(c) -> c = 10^(1/550) ~= 1.0042
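
The same arithmetic can be reproduced in a few lines of Python. This is only a sketch: the function name `solve_constants` and the dictionary keys are made up for illustration, and it assumes x > 1 and y > 1 at the chosen point so that every logarithm and root is defined.

```python
import math

def solve_constants(x, y):
    """Solve each one-parameter form for c at a single point (x, y).

    Assumes x > 1 and y > 1 so the logs and roots below are defined.
    """
    return {
        "linear": y / x,                      # y = c*x
        "quadratic": y / x**2,                # y = c*x^2
        "power": math.log(y) / math.log(x),   # y = x^c
        "exponential": y ** (1.0 / x),        # y = c^x
        "logarithmic": x ** (1.0 / y),        # y = log(x)/log(c)
    }

# Last (largest) point of the example data set.
constants = solve_constants(10, 550)
for name, c in constants.items():
    print(f"{name:12s} c = {c:.4f}")
```

The printed constants should agree with the hand-computed values above up to rounding.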

We can now compare how well each of these functions fits the remaining data. Here is a plot:

I'm new and can't post images, so look at the plot here: https://i.sstatic.net/UH6T8.png

The true data is shown as red asterisks, the linear fit as a green line, the quadratic in blue, the polynomial in black, the exponential in pink, and the logarithmic fit in green with O's. It should be pretty clear from the residuals which function fits your data best.
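
If you would rather compare numbers than eyeball a plot, the residual check can be sketched in Python as follows. The sum of squared errors is my choice of residual measure (the answer does not name one), and the names `models`, `sse`, and `best` are illustrative only.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 6, 19, 44, 81, 135, 206, 297, 411, 550]

# Solve each form for c at the last point, then pair it with its function.
xe, ye = x[-1], y[-1]
models = {
    "linear":      (ye / xe,                    lambda c, t: c * t),
    "quadratic":   (ye / xe**2,                 lambda c, t: c * t**2),
    "power":       (math.log(ye) / math.log(xe), lambda c, t: t**c),
    "exponential": (ye ** (1.0 / xe),           lambda c, t: c**t),
    "logarithmic": (xe ** (1.0 / ye),           lambda c, t: math.log(t) / math.log(c)),
}

def sse(c, f):
    """Sum of squared residuals of f(c, x) against the measured y values."""
    return sum((f(c, xi) - yi) ** 2 for xi, yi in zip(x, y))

errors = {name: sse(c, f) for name, (c, f) in models.items()}
best = min(errors, key=errors.get)

# Print candidates from best to worst fit.
for name in sorted(errors, key=errors.get):
    print(f"{name:12s} SSE = {errors[name]:.1f}")
print("best fit:", best)
```

On this data set the power form x^c has by far the smallest error, which matches how the data was generated. With noisy timings, fitting c from a single point is fragile, so treat the ranking as a rough guide rather than a proof of the complexity class.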

兰花执着 2024-10-04 06:12:20

Curve fitting used to be an art, but is now somehow decadent :) (That's a joke for the physicists around)

A lot of progress has been made that allows mere mortals to guess (some) nontrivial functional dependencies.

I won't go into a description of the methods and their limitations; instead I'll refer you to Eureqa, which is a very nice piece of software developed at Cornell.

Eureqa (pronounced "eureka") is a software tool for detecting equations and hidden mathematical relationships in your data. Its goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data. Eureqa is free to download and use. Look for the program download, video tutorial, user forum, and other reference materials.

I have tried Eureqa several times, with very good results as long as the models are not too complicated. I think it is good enough for distinguishing between polynomials, logs and exponentials.

HTH!

Post Scriptum:

Regrettably the software isn't free anymore :(
