用1MB空间对1000万个整数进行排序解决方案解释 - 编程珍珠
我正在阅读“编程珍珠”,我对其中一个解决方案的解释感到非常困惑。
问题是: “一个文件最多包含 n 个正整数,每个正整数都小于 n,其中 n = 10^7。每个正整数最多可以出现十次。您将如何对文件进行排序?”
书中给出的解决方案: 如果每个整数最多出现十次,那么我们可以在四位半字节中计算它的出现次数。使用问题 5(如下)的解决方案,我们可以一次性对整个文件进行 10,000,000/2 字节的排序,或者在 k 遍中使用 10,000,000/2k 字节”
问题 5 的解决方案是:两遍算法首先对整数 0 到 0 进行排序4,999,999 使用 5,000,000/8 = 625,000 个存储字,然后在第二遍中对 5,000,000 到 9,999,999 进行排序。 k-pass 算法在时间 kn 和空间 n/k 上对最多 n 个小于 n 的非重复正整数进行排序。)
我不明白作者如何对 10,000,000/2k 空间进行排序。我的意思是,基于问题 5 的解决方案,首先我们需要 625K 字节的空间来在第一遍中进行排序,并且每个整数需要额外的 1/2 字节来存储计数,对吗?
有人可以帮我理解这一点吗?
多谢。
I was reading "Programming Pearls" and I am really confused in one of the solution explanations.
The question was:
"A file containing at most n positive integers, each less than n, where n = 10^7. Each positive integer could appear at most ten times. How would you sort the file?"
Given solution in the book:
" If each integer appears at most ten times, then we can count its occurence in a four-bit half byte. Using the solution to Problem 5 (below) we can sort the complete file in a single pass with 10,000,000/2 bytes, or in k passes with 10,000,000/2k bytes"
Solution to problem 5 is: A two-pass algorithm first sorts the integers 0 through 4,999,999 using 5,000,000/8 = 625,000 words of storage, then sorts 5,000,000 through 9,999,999 in a second pass. A k-pass algorithm sorts at most n non-repeated positive integers less than n in time kn and space n/k.)
I am not getting how author is coming to 10,000,000/2k space to sort. I mean, based on the solution to problem 5, first we need 625K bytes of space to sort in first pass and additional 1/2 byte per integer to store the count right?
Could someone please help me understand this?
Thanks a lot.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
为此,每个计数器可以使用 4 位,而不是 2 个字节。如果您对计数器进行分组,您甚至可以降低该值,例如,如果您对 3 个计数器进行分组,则为 10*10*10=1000 个组合,并且您需要 10 位(= 1024 个值)。
You can use 4 bits per counter for this, not 2 bytes. If you group counters you can even lower this value, for example if you group 3 counters, that's 10*10*10=1000 combinations and you need 10 bits (=1024 values) for that.
10,000,000 - 因为有 10,000,000 个可能的值。
2 - 因为每个字节由两个半字节组成。
10,000,000 - because there are 10,000,000 possible values.
2 - because each byte consists of two half-bytes.