How to handle running out of memory with Python
I have huge dictionaries that I manipulate. More than 10 million words are hashed. It is too slow, and sometimes it runs out of memory.
Is there a better way to handle these huge data structures?
Answers (2)
Yes. It's called a database. Since a dictionary was working for you (aside from the memory concerns), I would suppose that an SQLite database would work fine for you. You can use the sqlite3 module quite easily, and it is very well documented.
Of course, this will only be a good solution if you can represent the values as something like JSON, or are willing to trust pickled data from a local file. Maybe you should post details about what you have in the values of the dictionary. (I'm assuming the keys are words; if not, please correct me.)
You might also want to look at not generating the whole dictionary and instead processing it in chunks. This may not be practical in your particular use case (it often isn't with the sort of thing dictionaries are used for, unfortunately), but if you can think of a way, it may be worth redesigning your algorithm to allow it.
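As a rough sketch of the SQLite suggestion above: a word-keyed table can stand in for the dictionary, with values serialized as JSON. The table name, column names, and helper functions here are illustrative assumptions, not anything from the question; a real script would connect to a file path instead of `:memory:`.

```python
import json
import sqlite3

# Illustrative sketch: use an SQLite table as a disk-backed word -> value map.
# ":memory:" keeps this example self-contained; use a filename in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, value TEXT)")

def put(word, value):
    # Serialize the value as JSON so arbitrary structures can be stored.
    conn.execute(
        "INSERT OR REPLACE INTO words (word, value) VALUES (?, ?)",
        (word, json.dumps(value)),
    )

def get(word):
    # Return the deserialized value, or None if the word is absent.
    row = conn.execute("SELECT value FROM words WHERE word = ?", (word,)).fetchone()
    return json.loads(row[0]) if row is not None else None

put("hello", {"count": 3})
conn.commit()
print(get("hello"))  # {'count': 3}
```

Because lookups go through an index on the primary key rather than an in-memory hash table, memory use stays roughly constant no matter how many words are stored.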
I'm not sure what your words point to, but I guess they're quite big structures if memory is an issue.
I did solve a Python MemoryError problem once by switching from 32-bit Python to 64-bit Python. In fact, some Python structures had become too large for the 4 GB address space. You might want to try that as a simple potential solution to your problem.
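Before switching builds, it is worth confirming which one you are running. A quick way to check the interpreter's bitness (a small sketch, using only the standard library):

```python
import struct
import sys

# "P" is the format code for a C pointer: 4 bytes on a 32-bit build,
# 8 bytes on a 64-bit build.
bits = struct.calcsize("P") * 8
print(f"Running {bits}-bit Python")

# Cross-check: sys.maxsize only exceeds 2**32 on a 64-bit interpreter.
print("64-bit" if sys.maxsize > 2**32 else "32-bit")
```

A 32-bit process is limited to a 4 GB address space (often less in practice), so a 10-million-entry dictionary can exhaust it even when the machine has plenty of RAM.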