Do you use Big-O complexity evaluation in the 'real world'?
Recently in an interview I was asked several questions related to the Big-O of various algorithms that came up in the course of the technical questions. I don't think I did very well on this... In the ten years since I took programming courses where we were asked to calculate the Big-O of algorithms, I have not had one discussion about the 'Big-O' of anything I have worked on or designed. I have been involved in many discussions with other team members and with the architects I have worked with about the complexity and speed of code, but I have never been part of a team that actually used Big-O calculations on a real project. The discussions are always "Is there a better or more efficient way to do this given our understanding of our data?", never "What is the complexity of this algorithm?"
I was wondering if people actually have discussions about the "Big-O" of their code in the real world?
It's not so much using it, it's more that you understand the implications.
There are programmers who do not realise the consequence of using an O(N^2) sorting algorithm.
I doubt many apart from those working in academia would use Big-O Complexity Analysis in anger day-to-day.
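As a minimal illustration of those consequences (the selection sort and the input sizes are arbitrary choices for the demo), counting comparisons in Python shows that doubling N roughly quadruples the work for an O(N^2) sort:

```python
def selection_sort(items):
    """O(N^2): scans the entire remaining suffix for every position."""
    items = list(items)
    comparisons = 0
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items, comparisons

_, c1000 = selection_sort(range(1000, 0, -1))
_, c2000 = selection_sort(range(2000, 0, -1))
print(c1000, c2000)  # 499500 1999000: doubling N quadruples the comparisons
```

The comparison count is N(N-1)/2 regardless of input order, which is exactly the kind of growth a programmer should recognise before it reaches production data sizes.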
No needless n-squared
In my experience you don't have many discussions about it, because it doesn't need discussing. In practice, in my experience, all that ever happens is you discover something is slow and see that it's O(n^2) when in fact it could be O(n log n) or O(n), and then you go and change it. There's no discussion other than "that's n-squared, go fix it".
So yes, in my experience you do use it pretty commonly, but only in the basest sense of "decrease the order of the polynomial", and not in some highly tuned analysis of "yes, but if we switch to this crazy algorithm, we'll go from log N down to the inverse of Ackermann's function" or some such nonsense. Anything less than a polynomial, and the theory goes out the window and you switch to profiling (e.g. even to decide between O(n) and O(n log n), measure real data).
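That "decrease the order of the polynomial" fix is often just a change of data structure; a minimal Python sketch (deduplication chosen as an arbitrary example):

```python
def dedupe_quadratic(items):
    """O(n^2): `in` on a list scans it linearly for every element."""
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen

def dedupe_linear(items):
    """O(n): `in` on a set is an O(1) hash lookup on average."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out
```

Both return the same result in the same order; only the growth rate differs, which is why there's nothing to discuss beyond "that's n-squared, go fix it".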
Big-O notation is rather theoretical, while in practice, you are more interested in actual profiling results which give you a hard number as to how your performance is.
You might have two sorting algorithms which by the book have O(n^2) and O(n log n) upper bounds, but profiling results might show that the more efficient one has some overhead (which is not reflected in the theoretical bound you found for it), and for the specific problem set you are dealing with, you might choose the theoretically-less-efficient sorting algorithm.
Bottom line: in real life, profiling results usually take precedence over theoretical runtime bounds.
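A hedged sketch of that crossover in Python: by the book, merge sort's O(n log n) beats insertion sort's O(n^2), but on a tiny input the per-call overhead of recursion and slicing can dominate, and only measuring settles it (results vary by machine, so no winner is asserted here):

```python
import timeit

def insertion_sort(a):
    """O(n^2) worst case, but low constant overhead, no allocation."""
    a = list(a)
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def merge_sort(a):
    """O(n log n), but recursion and list slicing add per-call overhead."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

small = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
t_ins = timeit.timeit(lambda: insertion_sort(small), number=10000)
t_mrg = timeit.timeit(lambda: merge_sort(small), number=10000)
print(f"insertion: {t_ins:.4f}s  merge: {t_mrg:.4f}s")
```

This is also why production sorts (e.g. CPython's Timsort) switch to insertion sort for small runs inside an O(n log n) framework.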
I do, all the time. When you have to deal with "large" numbers (typically, in my case: users, rows in a database, promotion codes, etc.), you have to know and take into account the Big-O of your algorithms.
For example, an algorithm that generates random promotion codes for distribution could be used to generate billions of codes... Using an O(N^2) algorithm to generate unique codes means weeks of CPU time, whereas an O(N) algorithm means hours.
Another typical example is queries in code (bad!). People look up a table, then perform a query for each row... this brings the order up to N^2. You can usually change the code to use SQL properly and get an order of N or N log N.
So, in my experience, profiling is useful ONLY AFTER the correct class of algorithms is used. I use profiling to catch bad behaviour, e.g. to understand why an application bounded by "small" numbers under-performs.
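A hedged sketch of the promotion-code case in Python (the code length and alphabet are arbitrary choices for the demo): keeping already-issued codes in a set makes each uniqueness check O(1) on average, so generating N codes stays O(N), whereas a list scan per candidate would make it O(N^2):

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def generate_codes(n, length=10):
    """Generate n unique random codes in O(n) expected time.

    Set membership is an average O(1) hash lookup, so each candidate
    costs constant time to check; testing `code in some_list` instead
    would scan all previous codes and degrade this to O(n^2).
    """
    codes = set()
    while len(codes) < n:
        codes.add("".join(secrets.choice(ALPHABET) for _ in range(length)))
    return codes

codes = generate_codes(10_000)
print(len(codes))  # 10000
```

At billions of codes, that difference in order is exactly the weeks-versus-hours gap described above.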
The answer from my personal experience is - no. Probably the reason is that I use only simple, well-understood algorithms and data structures, whose complexity analysis was already done and published decades ago. Why we should avoid fancy algorithms is better explained by Rob Pike here. In short, a practitioner almost never has to invent new algorithms and, as a consequence, almost never has to use Big-O.
Well, that doesn't mean you should not be proficient in Big-O. A project might demand the design and analysis of an altogether new algorithm. For some real-world examples, please read the "war stories" in Skiena's The Algorithm Design Manual.
I try to hold off optimizations until profiling data proves they are needed. Unless, of course, it is blatantly obvious at design time that one algorithm will be more efficient than the other options (without adding too much complexity to the project).
Yes, I use it. And no, it's not often "discussed", just like we don't often discuss whether "orderCount" or "xyz" is a better variable name.
Usually, you don't sit down and analyze it, but you develop a gut feeling based on what you know, and can pretty much estimate the O-complexity on the fly in most cases.
I typically give it a moment's thought when I have to perform a lot of list operations. Am I doing any needless O(n^2) complexity stuff that could have been done in linear time? How many passes am I making over the list? It's not something you need to make a formal analysis of, but without knowledge of big-O notation, it becomes a lot harder to do accurately.
If you want your software to perform acceptably on larger input sizes, then you need to consider the big-O complexity of your algorithms, formally or informally. Profiling is great for telling you how the program performs now, but if you're using an O(2^n) algorithm, your profiler will tell you that everything is just fine as long as your input size is tiny. And then your input size grows, and runtime explodes.
People often dismiss big-O notation as "theoretical", or "useless", or "less important than profiling". Which just indicates that they don't understand what big-O complexity is for. It solves a different problem than a profiler does. Both are essential in writing software with good performance. But profiling is ultimately a reactive tool. It tells you where your problem is, once the problem exists.
Big-O complexity proactively tells you which parts of your code are going to blow up if you run it on larger inputs. A profiler can not tell you that.
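To make the "profiler says everything is fine until the input grows" point concrete, here is a small Python sketch, with naive Fibonacci standing in for any O(2^n) algorithm:

```python
from functools import lru_cache

calls = 0

def fib_naive(n):
    """O(2^n): looks fine in a profile at n=10, hopeless at n=50."""
    global calls
    calls += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    """O(n): the same recurrence, with each subproblem computed once."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

fib_naive(20)
print(calls)         # 21891 calls already at n=20, and doubling from there
print(fib_memo(90))  # instant, where the naive version would never finish
```

A profiler run at n=20 would report milliseconds and no hotspot worth fixing; the Big-O analysis is what predicts the blow-up at n=50 before it happens.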
No. I don't use Big-O complexity in 'real world' situations.
My view on the whole issue is this (maybe wrong... but it's just my take):
The Big-O complexity stuff is ultimately there to help you understand how efficient an algorithm is. If, from experience or by other means, you understand the algorithms you are dealing with and are able to use the right algo in the right place, that's all that matters.
If you know this Big-O stuff and are able to use it properly, well and good.
If you don't know how to talk about algos and their efficiencies in the mathematical way (the Big-O stuff), but you know what really matters (the best algo to use in a situation), that's perfectly fine.
If you don't know either, that's bad.
Although you rarely need to do a deep Big-O analysis of a piece of code, it's important to know what it means and to be able to quickly evaluate the complexity of the code you're writing and the consequences it might have.
At development time you often feel like it's "good enough". Eh, no one will ever put more than 100 elements in this array, right? Then, one day, someone will put 1000 elements in the array (trust users on that: if the code allows it, one of them will do it). And that n^2 algorithm that was good enough is now a big performance problem.
It's sometimes useful the other way around: if you know that you functionally have to make n^2 operations and the complexity of your algorithm happens to be n^3, there might be something you can do about it to make it n^2. Once it's n^2, you'll have to work on smaller optimizations.
Conversely, if you just wrote a sorting algorithm and find out it has linear complexity, you can be sure there's a problem with it. (Of course, in real life, occasions where you have to write your own sorting algorithm are rare, but I once saw someone in an interview who was plainly satisfied with his single-for-loop sorting algorithm.)
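For illustration, a hypothetical reconstruction of that single-for-loop "sort" in Python shows why linear complexity is a red flag for a comparison sort: one pass only bubbles the largest element to the end, and comparison sorting provably needs Ω(n log n) comparisons in general:

```python
def one_pass_sort(items):
    """A single bubble pass: O(n), but it does not actually sort.

    Each adjacent swap can only move the largest element seen so far
    one step right; a full sort needs up to n-1 such passes.
    """
    a = list(items)
    for i in range(len(a) - 1):
        if a[i] > a[i + 1]:
            a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(one_pass_sort([3, 2, 1]))  # [2, 1, 3] -- not sorted
```

Knowing the lower bound is exactly the kind of informal Big-O sanity check that should have made the candidate suspicious of his own answer.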
Yes, for server-side code, one bottleneck can mean you can't scale, because you get diminishing returns no matter how much hardware you throw at the problem.
That being said, there are often other reasons for scalability problems, such as blocking on file and network access, which are much slower than any internal computation you'll see, which is why profiling is more important than Big-O.
To the extent that I know that three nested for-loops are probably worse than one nested for-loop. In other words, I use it as a reference gut feeling.
I have never calculated an algorithm's Big-O outside of academia. If I have two ways to approach a certain problem, and my gut feeling says that one will have a lower Big-O than the other, I'll probably instinctively take the smaller one, without further analysis.
On the other hand, if I know for certain the size of the n that comes into my algorithm, and I know for certain that it is relatively small (say, under 100 elements), I might take the most legible approach (I like to know what my code does even one month after it has been written). After all, the difference between 100^2 and 100^3 executions is hardly noticeable by the user with today's computers (until proven otherwise).
But, as others have pointed out, the profiler has the last and definite word: If the code I write executes slowly, I trust the profiler more than any theoretical rule, and fix accordingly.