Very fast lookup in Perl: can a hash be reloaded?
I have about 100 million rows such as:
A : value of A
B : value of B
|
|
|
Z : value of Z (up to 100 million unique entries)
Currently, each time I run my program I load the entire file into a hash, which takes some time. At run time I need to look up the values of A, B, and so on, given that I know the keys A, B, etc.
I am wondering whether I can build the hash once and store it as a binary data structure, or index the file. What would be possible in Perl with the least programming?
Thanks!
-Abhi
Comments (3)
I suggest an on-disk key/value database. Thanks to Perl's tie function, such a database can be used exactly like a normal in-memory hash. It will be faster than Perl's hashes for reading and writing if your hash is very large, and it supports saving to and loading from disk automatically.
BerkeleyDB is an old favourite. Changes to the database are automatically saved to disk and persist across multiple invocations of your script.
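As a rough illustration of the tie approach described above, here is a minimal sketch. It assumes the BerkeleyDB module from CPAN is installed; the file name lookup.db and the two sample rows are placeholders for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use BerkeleyDB;

# Hypothetical file name; the database file is created on first use.
my $db_file = 'lookup.db';

# Tie %db to an on-disk hash database.
tie my %db, 'BerkeleyDB::Hash',
    -Filename => $db_file,
    -Flags    => DB_CREATE
  or die "Cannot open $db_file: $! $BerkeleyDB::Error\n";

# One-time build step: parse "key : value" lines into the database.
# These sample lines stand in for the real 100-million-row file.
for my $line ('A : value of A', 'B : value of B') {
    my ($key, $value) = split /\s*:\s*/, $line, 2;
    $db{$key} = $value;
}

# Later runs can skip the build step and look keys up directly.
print "$db{A}\n";

untie %db;
```

On later runs, only the tie call and the lookups are needed; the build loop would run once, when the database is first created.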
Check the perldoc for options, but the most important are:
A more complex but much faster database library would be Tokyo Cabinet, and there are of course many other options (this is Perl after all...)
Have a look at Storable. It should do what you want and is extremely simple to use.
This only helps if your program is actually limited by CPU speed, of course. Since your data structure is very simple, you may be parsing it faster than you can read it from disk. Storable isn't going to help you much in that case.
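A minimal sketch of the Storable round trip might look like the following. The cache file name and the two-entry hash are stand-ins chosen for illustration; nstore is used here (rather than store) so the file is written in network byte order and stays portable across machines.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);

my $cache = 'lookup.storable';    # hypothetical cache file name

# One-time step: build the hash from the text file, then serialize it.
# This small hash stands in for the real data set.
my %table = (A => 'value of A', B => 'value of B');
nstore(\%table, $cache);

# Later runs: load the whole hash back with a single call.
my $loaded = retrieve($cache);
print "$loaded->{A}\n";
```

Note that retrieve still reads the entire structure into memory, so this saves parsing time but not memory or disk I/O.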
I recommend Tie::File: it is included in the core distribution, and it does not load your entire data structure into memory but reads individual records from disk as needed.
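A small sketch of how Tie::File exposes a file as an array of lines is shown below. The file name rows.txt and its three sample rows are made up so the example is self-contained.

```perl
use strict;
use warnings;
use Tie::File;

my $path = 'rows.txt';    # hypothetical data file

# Create a small sample file so the sketch can run on its own.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "A : value of A\n", "B : value of B\n", "Z : value of Z\n";
close $out;

# Tie the file to an array; each element is one line of the file.
tie my @rows, 'Tie::File', $path or die "Cannot tie $path: $!";

# Records are fetched from disk on demand, with line endings stripped.
my ($key, $value) = split /\s*:\s*/, $rows[1], 2;
print "$key => $value\n";

untie @rows;
```

Tie::File addresses records by line number, not by key, so looking up a value by key still requires either knowing the line index or scanning the array.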