Best way to store and use a large text file in Python
I'm creating a networked server for a Boggle clone I wrote in Python, which accepts users, solves the boards, and scores the player input. The dictionary file I'm using is 1.8MB (the ENABLE2K dictionary), and I need it to be available to several game solver classes. Right now, each class iterates through the file line by line and generates its own hash table (associative array), so the more solver classes I instantiate, the more memory it takes up.
What I would like to do is import the dictionary file once and pass it to each solver instance as they need it. But what is the best way to do this? Should I import the dictionary in the global space, then access it in the solver class as globals()['dictionary']? Or should I import the dictionary then pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?
Comments (4)
If you create a dictionary.py module, containing code which reads the file and builds a dictionary, this code will only be executed the first time it is imported. Further imports will return a reference to the existing module instance. As such, your classes can:
where dictionary.py has:
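A minimal sketch of how those two pieces might fit together, written as one runnable script. The module body is held in a string and written to a temporary directory only so the example is self-contained; the names `WORDS`, `Solver`, and `enable2k.txt` are assumptions for illustration, not taken from the original answer.

```python
import os
import sys
import tempfile

# Contents of the hypothetical dictionary.py (WORDS and the file name are assumed).
DICTIONARY_PY = '''\
import os

_PATH = os.path.join(os.path.dirname(__file__), "enable2k.txt")

# This body runs only on the first import; later imports reuse the cached module.
with open(_PATH) as word_file:
    WORDS = frozenset(line.strip() for line in word_file if line.strip())
'''

# Build a throwaway directory holding a tiny word list and the module.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "enable2k.txt"), "w") as f:
    f.write("cat\ndog\nbird\n")
with open(os.path.join(workdir, "dictionary.py"), "w") as f:
    f.write(DICTIONARY_PY)
sys.path.insert(0, workdir)

import dictionary  # first import: executes the module body and reads the file


class Solver:
    def __init__(self):
        # Served from sys.modules, so the word file is not read again.
        import dictionary as shared
        self.words = shared.WORDS

    def is_word(self, candidate):
        return candidate.lower() in self.words


solvers = [Solver() for _ in range(3)]
```

Every solver ends up holding a reference to the same `frozenset`, so adding solvers does not add copies of the word list.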
Even though it is essentially a singleton at this point, the usual arguments against globals apply. For a pythonic singleton-substitute, look up the "borg" object.
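For readers unfamiliar with the term, here is a rough sketch of the Borg idea: instances are distinct objects, but they all share one attribute dictionary. The class name `SharedDictionary` and its `words` attribute are hypothetical, chosen to match the question.

```python
class SharedDictionary:
    """Borg-style holder: every instance shares the same attribute dict."""

    _shared_state = {}

    def __init__(self, words=None):
        # Point this instance's attributes at the class-wide dict.
        self.__dict__ = self._shared_state
        # Only the first caller that supplies words populates the state.
        if words is not None and "words" not in self._shared_state:
            self.words = frozenset(words)


first = SharedDictionary(["cat", "dog"])
second = SharedDictionary()  # no words passed, yet it sees the same set
```

Here `first` and `second` are different objects, but `second.words` is the very set that `first` stored.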
That's really the only difference. Once the dictionary object is created, you are only binding new references as you pass it along, unless you explicitly perform a deep copy. It makes sense to construct it centrally, once and only once, as long as each solver instance does not require a private copy for modification.
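A small illustration of that distinction, using a toy dictionary in place of the real word list (the function name `keep_reference` is made up for the example):

```python
import copy

words = {"cat": True, "dog": True}


def keep_reference(d):
    # Passing an object into a function only binds a new name to it.
    return d


shared = keep_reference(words)   # same object, no extra memory for the data
private = copy.deepcopy(words)   # an independent copy, costs memory

# Changes to the private copy do not leak back into the shared dictionary.
private["emu"] = True
```

Only a solver that needs to modify its word list would have a reason to take the `deepcopy` route.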
Adam, remember that in Python, when you write an assignment such as `b = a`, you are not actually copying `a` (and thus using more memory); you are merely making `b` another reference to the same object.

So basically any of the solutions you propose will be far better in terms of memory usage. Read in the dictionary once and then hang on to a reference to it. Whether you do it with a global variable, or pass it to each instance, or something else, you'll be referencing the same object and not duplicating it.
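A quick way to see this for yourself, with a small set standing in for the dictionary:

```python
a = {"cat", "dog"}
b = a               # binds a second name; no data is copied

b.add("bird")       # a mutation through one name...
same_object = a is b
visible_through_a = "bird" in a   # ...is visible through the other
```

Both `same_object` and `visible_through_a` come out `True`, because `a` and `b` name one and the same set.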
Which one is most Pythonic? That's a whole 'nother can of worms, but here's what I would do personally:
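The snippet that followed is not preserved here. One sketch consistent with the advice above is to load the word list once at startup and hand it to each solver's constructor. The names `load_dictionary`, `Solver`, `score`, and the length-based scoring rule are all hypothetical, and the tiny temporary word file stands in for ENABLE2K.

```python
import os
import tempfile


def load_dictionary(path):
    """Read the word list once and return it as an immutable set."""
    with open(path) as word_file:
        return frozenset(
            line.strip().lower() for line in word_file if line.strip()
        )


class Solver:
    def __init__(self, dictionary):
        # Stores a reference to the shared set; nothing is copied.
        self.dictionary = dictionary

    def score(self, guess):
        # Placeholder scoring rule: word length if valid, else 0.
        return len(guess) if guess.lower() in self.dictionary else 0


def main():
    # Stand-in word file so the sketch runs on its own.
    path = os.path.join(tempfile.mkdtemp(), "enable2k.txt")
    with open(path, "w") as f:
        f.write("cat\ndog\nbird\n")

    dictionary = load_dictionary(path)          # read the file exactly once
    solvers = [Solver(dictionary) for _ in range(4)]
    return dictionary, solvers


dictionary, solvers = main()
```

Passing the set in explicitly keeps each solver easy to test with a small fake word list, which is harder when the class reaches for a global.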
HTH.
Depending on what your dict contains, you may be interested in the 'shelve' or 'anydbm' modules ('anydbm' was renamed 'dbm' in Python 3). They give you dict-like interfaces (only strings as keys and values for 'anydbm'; strings as keys and any picklable Python object as values for 'shelve'), but the data actually lives in a DBM file (gdbm, ndbm, dbhash, bsddb, depending on what's available on the platform). You probably still want to share the actual database between classes, as you are asking, but this would avoid the step of parsing the text file as well as keeping everything in memory.
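A minimal sketch of the `shelve` route on Python 3, assuming a one-time build step that stores each word as a key. The database name `words` and the temporary directory are placeholders.

```python
import os
import shelve
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "words")

# One-time build step: store each word as a key (the value is arbitrary).
with shelve.open(db_path) as db:
    for word in ("cat", "dog", "bird"):
        db[word] = True

# Later runs reopen the file read-only and look words up on disk.
with shelve.open(db_path, flag="r") as db:
    found = "dog" in db
    missing = "xyzzy" in db
```

Lookups now go through the DBM backend instead of an in-memory hash table, so this trades some speed per lookup for lower memory use and no parsing at startup.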