Python设置了纪念测试
在算法课程中,我们的老师涵盖了“虚拟初始化”,您可以为操作分配内存,但不要初始化所有值,因为与实际需要计算的值相比,问题空间可能太大了(例如,字典或设置),这会浪费大量时间来设置默认值。基本原理是要有两个数组相互指向(相互索引),并跟踪已分配了多少变量。假设我们有一个数组a,我们想找到[i]是否包含有效值,我们可以使用数组B作为类似的索引:
我看了
如何使用哈希表来检查值是否有效?我们可以哈希亚杂志并读取结果,但这并不意味着输出不是垃圾(我们可以在调用Malloc时阅读写给该地址的所有内容)。
虚拟初始化完全避免了哈希表中碰撞的问题,所以为什么它不是使用哈希表更好的解决方案?
In an algorithmics course our teacher covered "virtual initialization" where you allocate memory for an operation, but don't initialize all the values since the problem space might be too large compared to the values that actually need to be calculated (for example a dictionary or set), which wastes a lot of time for setting a default value. The basic principle is to have two arrays that point to each other (index each other) and keep track of how many variables have been assigned. let's say we have an array a and we want to find if a[i] contains a valid value, we can use array b as an index to a like so:
I had a look at the python time complexity table at https://wiki.python.org/moin/TimeComplexity and it mentions that a set membership test in the worst case can be of O(n). I'm not sure where to find the exact implementation of the set function but after a bit of googling, most people mention that it uses a hash table. My main questions are:
How can a hash table be used to check if a value is valid or not? We can hash every value and read the result, but that doesn't mean the output isn't garbage (we can be reading whatever was written to that address when malloc was called).
virtual initialization completely avoids the problem with collisions in a hash table, so why is it not a better solution than using a hash table?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
当您使用数组实现哈希表时,每个条目中都需要一个标志来指示当前是否正在使用。这是处理哈希函数碰撞所需的 - 如果两个值具有相同的哈希代码,则不能将它们都放在数组的同一元素中。
这意味着,当您分配数组时,必须仔细阅读所有这些标志。而且,每当您种植哈希表时,您都必须再次执行此操作。
“虚拟初始化”避免了这种情况。您粘贴的算法用于判断是否正在使用
a [k]
。它使用第二个数组b
将索引包含在a
中。插入元素时,a [k] .p
在b
中包含索引j
,和b [j]
包含k
。如果这两个匹配以及j
低于所有索引的限制,则您知道正在使用的数组元素。要清除条目,您只需设置
a [k] .p = 0
,因为所有有效的条目均具有p
1 和之间n
。这是算法设计中时间空间折衷的一个示例。为了避免在初始化数组初始化的时间,我们分配了第二个数组。如果您有很多可用的内存,这可能是可以接受的权衡。
When you implement a hash table using an array, you need a flag in each entry to indicate whether it's currently in use. This is needed to deal with hash function collisions -- if two values have the same hash code, you can't put them both in the same element of the array.
This means that when you allocate the array, you have to go through it, initializing all these flags. And you have to do this again whenever you grow the hash table.
"Virtual initialization" avoids this. The algorithm you pasted is used to tell whether an
a[k]
is in use. It uses a second arrayb
that contains indexes intoa
. When inserting an element,a[k].p
contains an indexj
inb
, andb[j]
containsk
. If these two match, and alsoj
is lower than the limit of all indexes, you know that the array element is in use.To clear an entry, you simply set
a[k].p = 0
, since all valid entries havep
between1
andn
.This is an example of a time-space tradeoff in algorithm design. To avoid the time spent initializing the array, we allocate a second array. If you have lots of available memory, this can be an acceptable tradeoff.