Best way to store and use a large text file in Python

Posted 2024-07-07 06:08:42

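I am building a networked server for a Boggle clone I wrote in Python. It accepts users, solves the boards, and scores player input. The dictionary file I use is 1.8 MB (the ENABLE2K dictionary), and I need it to be available to several game-solver classes. Right now each class iterates through the file line by line and builds a hash table (associative array), but the more solver classes I instantiate, the more memory it takes up.

What I would like to do is import the dictionary file once and pass it to each solver instance that needs it. But what is the best way to do that? Should I import the dictionary in the global namespace and then access it in the solver class as globals()['dictionary']? Or should I import the dictionary and then pass it as an argument to the class constructor? Is one of these better than the other? Is there a third option?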


孤千羽 2024-07-14 06:08:42

If you create a dictionary.py module containing code that reads the file and builds a dictionary, that code is executed only the first time the module is imported. Further imports return a reference to the existing module instance. As such, your classes can:

import dictionary

dictionary.words[whatever]

where dictionary.py has:

words = {}

# read file and add to 'words'
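As a minimal sketch of what such a dictionary.py might contain: the file name enable2k.txt, the helper load_words, and the choice of a set instead of a dict are all assumptions made for illustration, not part of the answer above. The module-level assignment runs once on first import; the existence check is there only so the sketch can run without the word file present.

```python
# dictionary.py (module name from the answer; contents are a hypothetical sketch)
import os

# Assumed location of the word list, one word per line.
WORD_FILE = "enable2k.txt"


def load_words(path):
    """Read one word per line into a set for fast membership tests."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


# Executed once, on first import; later imports reuse the cached module object.
words = load_words(WORD_FILE) if os.path.exists(WORD_FILE) else set()
```

Any solver module could then do `import dictionary` and test `candidate in dictionary.words` without re-reading the file.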


倒带 2024-07-14 06:08:42

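Even though it is essentially a singleton at this point, the usual arguments against globals still apply. For a Pythonic singleton substitute, look up the "borg" object.

That is really the only difference. Once the dictionary object is created, you are only binding new references as you pass it along, unless you explicitly perform a deep copy. It makes sense to construct it centrally, once and only once, as long as each solver instance does not need a private copy to modify.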
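For readers who have not met the Borg idiom, here is a rough sketch of how it could hold the word list. The class name SharedWords and its words attribute are hypothetical; the only essential part is that every instance points its `__dict__` at one class-level dict, so state is shared while the instances stay distinct.

```python
class SharedWords:
    """Borg-style holder: all instances share a single attribute dict."""

    _shared_state = {}

    def __init__(self, words=None):
        # Point this instance's attributes at the shared dict.
        self.__dict__ = self._shared_state
        # Store the word list only the first time one is supplied.
        if words is not None and "words" not in self._shared_state:
            self.words = frozenset(words)


first = SharedWords(["cat", "dog"])
second = SharedWords()  # no words passed; it sees the ones loaded by `first`
print(first is second)              # False: two separate instances
print(first.words is second.words)  # True: one shared word set
```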


紫竹語嫣☆ 2024-07-14 06:08:42

Adam, remember that in Python when you say:

a = read_dict_from_file()
b = a

... you are not actually copying a (and thus using more memory); you are merely making b another reference to the same object.

So any of the solutions you propose will be far better in terms of memory usage. Read in the dictionary once and then hang on to a reference to it. Whether you do it with a global variable, or pass it to each instance, or something else, you'll be referencing the same object and not duplicating it.

Which one is most Pythonic? That's a whole 'nother can of worms, but here's what I would do personally:

def main(args):
    run_initialization_stuff()
    dictionary = read_dictionary_from_file()
    # solver_id is a placeholder keyword name; every Solver shares the same dictionary object
    solvers = [Solver(solver_id=x, dictionary=dictionary) for x in range(number_of_solvers)]
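A small self-contained sketch of the same idea, to show that passing the dictionary to each constructor binds a reference and does not copy. The Solver class, its solver_id parameter, and the three-word set are stand-ins invented for this example.

```python
class Solver:
    """Hypothetical solver that keeps a reference to a shared word set."""

    def __init__(self, solver_id, dictionary):
        self.solver_id = solver_id
        self.dictionary = dictionary  # binds a reference; nothing is copied

    def is_word(self, candidate):
        return candidate.lower() in self.dictionary


def build_solvers(dictionary, count):
    """Create `count` solvers that all share the same dictionary object."""
    return [Solver(i, dictionary) for i in range(count)]


words = {"cat", "dog", "bird"}  # stand-in for the real word list
solvers = build_solvers(words, 3)
print(all(s.dictionary is words for s in solvers))  # True
```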

HTH.


星 2024-07-14 06:08:42

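Depending on what your dict contains, you may be interested in the 'shelve' or 'anydbm' modules (in Python 3, anydbm was renamed dbm). They give you dict-like interfaces (only strings as keys and items for 'anydbm'; strings as keys and any Python object as item for 'shelve'), but the data actually lives in a DBM file (gdbm, ndbm, dbhash, bsddb, depending on what is available on the platform). You probably still want to share the actual database between classes as you are asking, but this would avoid the step of parsing the text file as well as keeping everything in memory.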
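A rough sketch of the shelve route on Python 3, assuming the word list is converted to a shelf once and queried afterwards. The helper names build_shelf and is_word and the shelf path are made up for this example; which DBM backend shelve picks depends on the platform.

```python
import os
import shelve
import tempfile


def build_shelf(path, words):
    """One-time conversion: store each word as a key in a shelf file."""
    with shelve.open(path) as shelf:
        for word in words:
            shelf[word] = True


def is_word(path, candidate):
    """Look up a word on disk without loading the whole list into memory."""
    with shelve.open(path, flag="r") as shelf:
        return candidate in shelf


with tempfile.TemporaryDirectory() as tmp:
    shelf_path = os.path.join(tmp, "words")  # hypothetical shelf location
    build_shelf(shelf_path, ["cat", "dog", "bird"])
    found = is_word(shelf_path, "cat")
    missing = is_word(shelf_path, "zebra")
```

Opening the shelf on every lookup keeps the sketch short; a long-running server would more likely open it once and share the open shelf object between solvers.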
