C++无序的set字符串哈希时间复杂性?
为什么插入到集合中的最坏情况的复杂性是容器大小的线性常数,而不是元素本身的大小?
我专门谈论字符串。如果我有一个大小M的字符串集,那么如果我插入了一个新的大小X字符串,我认为插入操作将需要读取大小X的字符串以计算键?那我们为什么不考虑那个时候呢?
如果还有另一个大小为1000*x的字符串,那么在最坏的情况下,插入仍然会占M大小? 无论字符串的大小如何,时间是0(m)?如何?
Why the worst case complexity of insert into set is linear constant of size of container and not the size of element itself?
I am specifically talking about string. If I have a string set of size m, then if I insert a new string of size x, I assume that the insert operation will need to read string of size x in order to calculate key? So why dont we consider that time?
If there is another string that is of size 1000*x, then insertion will still take m size in worst case?
Regardless of size of string, the time is 0(m)? How?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这是一个很好的问题,它以分析哈希表上的运营成本的方式打击了一些细微差别。
实际上,有几种不同的方法可以考虑这一点。第一个是考虑一下纯粹关注表尺寸的运行时视角测量的哈希表上操作的运行时间。从这个角度来看,插入,查找或删除的成本不会扩展作为表元素数量的函数。也就是说,查找某事的成本不取决于表中的元素数量,插入元素的成本不取决于表中的元素数量等。 n 表示表中的元素数量,然后插入,删除或查找的成本为O(1),因为对 n 没有依赖性。
从这个意义上讲,这里的BIG-O符号被解释为“如果唯一的变量是 n ,则表中的元素数量,事物将如何扩展?”但这有很多不足之处,因为它完全忽略了比较平等的字符串的成本,评估哈希功能等的成本
。从带有 n 元素的哈希表中的长度 m 插入或删除一串长度为o( m ),而不是o(1) 。
我一直发现最有帮助的观点是以下内容。当哈希表说所有操作都在时间o(1)中运行时,真正的含义是每个操作仅需要o(1)总哈希评估和比较。从这个意义上讲,这意味着“查找某些东西,插入某物或删除某些东西的成本是一定数量的哈希评估和比较。”为了计算总成本,然后您将O(1)乘以哈希或比较的成本,在长度为m m 的字符串的情况下为O( m )。这为您提供了与您的直觉相匹配的O( m )的整体运行时。
This is a great question and it hits at some nuances in the way we analyze the cost of operations on hash tables.
There are actually several different ways to think about this. The first one is to think about the runtimes of the operations on a hash table measured with a runtime perspective that purely focuses on the size of the table. From that perspective, the cost of an insertion, lookup, or deletion does not scale as a function of the number of table elements. That is, the cost of looking something up does not depend on the number of elements in the table, the cost of inserting an element doesn't depend on the number of elements in the table, etc. From that vantage point, if we let n denote the number of elements in the table, then the cost of an insertion, deletion, or lookup is O(1), because there is no dependency on n.
In this sense, the big-O notation here is meant to be interpreted as "if the only variable is n, the number of elements in the table, how will things scale?" But that leaves a lot to be desired, since it completely ignores the cost of comparing strings for equality, the cost of evaluating the hash function, etc.
If you factor those details in, then yes, you're correct - the cost of looking up, inserting, or deleting a string of length m from a hash table with n elements is O(m), not O(1).
The perspective I've always found most helpful is the following. When a hash table says that all operations run in time O(1), what it really means is that each operation requires only O(1) total hash evaluations and comparisons. In that sense, it means "the cost of looking something up, or inserting something, or deleting something is some constant number of hash evaluations and comparisons." To then work out the total cost, you then multiply O(1) by the cost of a hash or comparison, which in the case of strings of length m would be O(m). That gives you the overall runtime of O(m), which matches your intuition.
该标准在元素类型上的原始操作方面给出了集装箱中元素数量的复杂性。这些原始操作是由实例化提供的。
]
如果您需要在其他操作(例如小整数算术操作)方面复杂性,则如果您知道这些术语中元素类型上每个操作的复杂性,则可以自己得出它。
The standard gives the complexity as a function of the number of elements in the container in terms of primitive operations on the element type. These primitive operations are supplied by the instantiation.
For example here's how the complexity of
operator==
on unordered associative containers is described:If you require complexity in terms of other operations, such as small integer arithmetic operations, you can derive it yourself if you know the complexity of each operation on the element type in those terms.