Perl: multiple hashes vs. a single hash
I want to read and process sets of input from a file and then print it out.
There are 3 keys which I need to use to store data.
Assume the 3 keys are k1, k2, k3
Which of the following will give better performance?
$hash{k1}->{k2}->{k3} = $val;
or
$hash{"k1,k2,k3"} = $val;
In answer to my previous question, I was told that all Perl hash keys are treated as strings.
Comments (3)
Unless you're really dealing with large datasets, use whichever one produces cleaner code. I may be wrong, but this reeks of premature optimization.
If it isn't, this may depend on the range of possible keys. If ordering is not an issue, arrange your data so that k1 is the smallest set of keys and k3 is the largest. I suspect you'll use less memory on hashes that way. Depending on your datasets, it may even be prudent to presize your hashes (I think keys(%hash) = 100 does the trick).
As to which is faster, only profiling will tell. Try both and see for yourself.
Also, note that the arrows in $hash{k1}->{k2}->{k3} are unnecessary. You can write $hash{k1}{k2}{k3}. Dereference arrows aren't needed between subscripts, either square or curly.
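As a minimal sketch of the two storage styles discussed above, the snippet below fills a nested hash and a flat hash with a composite key from the same records. The sample records and the variable names (@records, %nested, %flat) are made up for illustration, and the comma separator assumes that no key contains a comma.

```perl
use strict;
use warnings;

# Hypothetical sample records: three key fields followed by a value.
my @records = (
    [ 'a', 'x', 'p', 1 ],
    [ 'a', 'x', 'q', 2 ],
    [ 'b', 'y', 'p', 3 ],
);

my (%nested, %flat);

# Presize the flat hash; this is only a hint about how many buckets to allocate.
keys(%flat) = 100;

for my $rec (@records) {
    my ($k1, $k2, $k3, $val) = @$rec;

    # Three-level lookup; the arrows between subscripts are optional.
    $nested{$k1}{$k2}{$k3} = $val;

    # One lookup on a single composite string key.
    $flat{ join ',', $k1, $k2, $k3 } = $val;
}

# With or without arrows, both expressions address the same element.
my $same = $nested{a}->{x}->{q} == $nested{a}{x}{q};

print "nested: $nested{a}{x}{q}, flat: $flat{'a,x,q'}\n";
```

The nested form makes it easy to iterate over everything under one k1, while the flat form keeps a single hash at the cost of building (and later splitting) the key string.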
Hash lookup speed is independent of the number of items in the hash, so the version which only does one hash lookup will perform the hash lookup portion of the operation faster than the version which does three hash lookups. But, on the other hand, the single-lookup version has to concatenate the three keys into a single string before they can be used as a combined key; if this string is anonymous (e.g., $hash{"$a,$b,$c"}), this will likely involve some fun stuff like memory allocation. Overall, I would expect the concatenation to be quick enough that the one-lookup version would be faster than the three-lookup version in most cases, but the only way to know which is faster in your case would be to write the same code in both styles and Benchmark the difference.
However, like everyone else has already said, this is a premature and worthless micro-optimization. Unless you know that you have a performance problem (or you have historical performance data which shows that a problem is developing and will be upon you in the near future) and you have profiled your code to determine that hash lookups are the cause of your performance problem, you're wasting your time worrying about this. Hash lookups are fast. It's hardly a real benchmark, but:
In this trivial (and, admittedly, highly flawed) example, I got a rate equivalent to roughly 11 million hash lookups per second. In the time you spent asking the question, your computer could have done hundreds of millions, if not billions, of hash lookups.
Write your hash lookups in whatever style is most readable and most maintainable in your application. If you try to optimize this to be as fast as possible, the wasted programmer time will be (many!) orders of magnitude larger than any processing time that you could ever hope to save with the optimizations.
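If you do want to measure the difference on your own machine, one possible sketch uses cmpthese from the core Benchmark module. This is not the benchmark the answer above refers to; the key ranges and the names %nested and %flat are assumptions chosen only to give both styles identical work.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical key values; substitute keys drawn from your real input.
my @k1 = (1 .. 10);
my @k2 = (1 .. 10);
my @k3 = (1 .. 10);

# Build both structures with the same 1000 entries.
my (%nested, %flat);
for my $x (@k1) { for my $y (@k2) { for my $z (@k3) {
    $nested{$x}{$y}{$z} = 1;
    $flat{"$x,$y,$z"}   = 1;
} } }

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    nested => sub {
        my $sum = 0;
        for my $x (@k1) { for my $y (@k2) { for my $z (@k3) {
            $sum += $nested{$x}{$y}{$z};    # three hash lookups
        } } }
        return $sum;
    },
    flat => sub {
        my $sum = 0;
        for my $x (@k1) { for my $y (@k2) { for my $z (@k3) {
            $sum += $flat{"$x,$y,$z"};      # string build plus one lookup
        } } }
        return $sum;
    },
});
```

The resulting table shows iterations per second for each style and their relative difference. The numbers depend heavily on the perl version, the key lengths, and the access pattern, so treat them as a rough guide rather than a verdict.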
If you have memory concerns, I would suggest using Devel::Size from CPAN in an early phase of development to get the size of both alternatives.
Otherwise, use the one which seems friendlier to you!
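A rough sketch of that comparison, assuming Devel::Size has been installed from CPAN, is shown below. The data set is invented for illustration, and the module is loaded at run time so the script still runs (with a notice) where it is missing.

```perl
use strict;
use warnings;

# Hypothetical data set: 20 x 20 x 20 key combinations in both layouts.
my (%nested, %flat);
for my $x (1 .. 20) { for my $y (1 .. 20) { for my $z (1 .. 20) {
    $nested{$x}{$y}{$z} = 1;
    $flat{"$x,$y,$z"}   = 1;
} } }

# Devel::Size is not a core module, so load it lazily and degrade gracefully.
if (eval { require Devel::Size; 1 }) {
    # total_size follows references, so the inner hashes are counted too.
    printf "nested: %d bytes\n", Devel::Size::total_size(\%nested);
    printf "flat:   %d bytes\n", Devel::Size::total_size(\%flat);
}
else {
    print "Devel::Size is not installed; install it from CPAN to measure.\n";
}
```

Running this with key counts that resemble your real input should give a fair idea of which layout is cheaper in memory for your case.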