How do I store and retrieve Python native data structures in a file?

Posted on 2024-09-16 16:05:29

I am reading an XML file and reorganizing the desired data into Python data structures (lists, tuples, etc.).

For example, one of my XML parser modules produces the following data:

# data_miner.py
animals = ['Chicken', 'Sheep', 'Cattle', 'Horse']
population = [150, 200, 50, 30]

Then I have a plotter module that looks roughly like this:

# plotter.py
from data_miner import animals, population

plot(animals, population)

With this method, I have to parse the XML file every time I make a plot. I'm still testing other aspects of my program, and the XML file doesn't change very often for now. Skipping the parse stage would dramatically shorten my testing time.

This is my desired result:
In between data_miner.py and plotter.py, I want a file that contains animals and population such that they can be accessed by plotter.py natively (e.g. no change in plotting code), without having to run data_miner.py every time. If possible, it shouldn't be in csv or any ASCII format, just a natively-accessible format. plotter.py should now look roughly like:

# plotter.py

# This line may not necessarily be a one-liner.
from data_file import animals, population

# But I want this portion to stay the same
plot(animals, population)

Analogy:
This is roughly equivalent to MATLAB's save command that saves the active workspace's variables into a .mat file. I'm looking for something similar to the .mat file for Python.

Recent experience:
I have seen pickle and cPickle, but I'm not sure how to get them to work. If that is the right tool to use, example code would be very helpful. There may also be other tools that I don't know about yet.

Comments (4)

莳間冲淡了誓言ζ 2024-09-23 16:05:29

The pickle module should serve your needs well. (On Python 2, cPickle is a faster equivalent; Python 3's pickle uses the C implementation automatically when it is available.)

Specifically:

# data_miner.py
import pickle

animals = ['Chicken', 'Sheep', 'Cattle', 'Horse']
population = [150, 200, 50, 30]

with open('data_miner.pik', 'wb') as f:
    # -1 selects the highest pickle protocol available
    pickle.dump([animals, population], f, -1)

and

# plotter.py
import pickle

with open('data_miner.pik', 'rb') as f:
    animals, population = pickle.load(f)

print(animals, population)

Here, I've made data_miner.py quite explicit regarding what needs to be saved (always an excellent idea to be very explicit unless you have extremely specific reasons to do otherwise). Some things (such as modules and open files) cannot be pickled anyway, so a simple pickling of globals() would not work.

If you absolutely must, you could make a copy of globals() and remove every object whose type makes it unsuitable for saving. Perhaps better, religiously use a leading _ in every name you don't want to save (so import pickle as _pickle, with open ... as _f, and so forth) and exclude all names with a leading underscore from the copy of globals(). With such an approach, pickle.load would retrieve a dict, and the variables of interest would then be extracted from it by indexing. However, I would strongly recommend the simple alternative of saving a list (or dict, if you want ;-) with the specific values that are actually of interest, rather than taking a "wholesale" approach.
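
The underscore convention described above might look roughly like the following sketch. The helper name snapshot_public_names and the file name workspace.pik are hypothetical, and a plain dict stands in for globals() so that the example is self-contained; skipping functions as well as modules is an extra assumption made here to keep the saved dict to plain data.

```python
import os
import pickle
import tempfile
import types


def snapshot_public_names(namespace):
    """Keep names without a leading underscore, skipping modules and functions."""
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith('_')
        and not isinstance(value, (types.ModuleType, types.FunctionType))
    }


# Stand-in for globals(); in a real module you would pass globals() itself.
namespace = {
    'animals': ['Chicken', 'Sheep', 'Cattle', 'Horse'],
    'population': [150, 200, 50, 30],
    '_scratch': 'not saved',  # leading underscore: excluded
    'pickle': pickle,         # module object: excluded
}

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'workspace.pik')

with open(path, 'wb') as f:
    pickle.dump(snapshot_public_names(namespace), f, pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as f:
    saved = pickle.load(f)

# The loaded object is a dict, so the variables are pulled out by key.
animals, population = saved['animals'], saved['population']
```

Even in this form, the explicit list or dict recommended above is usually easier to reason about, because the saved contents do not depend on whatever happens to be defined in the module.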

标点 2024-09-23 16:05:29

Pickling is good if you have Python-specific objects to save. If they're just generic data in some basic container type, then JSON is fine.

>>> import json, sys
>>> json.dumps(['Chicken', 'Sheep', 'Cattle', 'Horse'])
'["Chicken", "Sheep", "Cattle", "Horse"]'
>>> json.dump(['Chicken', 'Sheep', 'Cattle', 'Horse'], sys.stdout); print()
["Chicken", "Sheep", "Cattle", "Horse"]
>>> json.loads('["Chicken", "Sheep", "Cattle", "Horse"]')
['Chicken', 'Sheep', 'Cattle', 'Horse']
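
To keep the data between runs, the same functions can target a file rather than a string. This is a minimal sketch, assuming a hypothetical file name data_miner.json written to a temporary directory. Note that JSON has no tuple type, so tuples come back as lists.

```python
import json
import os
import tempfile

animals = ['Chicken', 'Sheep', 'Cattle', 'Horse']
population = [150, 200, 50, 30]

# Hypothetical file name; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'data_miner.json')

# Writing side (what data_miner.py might do after parsing the XML).
with open(path, 'w') as f:
    json.dump({'animals': animals, 'population': population}, f)

# Reading side (what plotter.py might do instead of re-parsing).
with open(path) as f:
    loaded = json.load(f)

loaded_animals, loaded_population = loaded['animals'], loaded['population']

# Tuples do not survive the round trip: they are stored as JSON arrays.
tuple_round_trip = json.loads(json.dumps((1, 2)))
```

The resulting file is plain text, so it does not meet the "no ASCII format" preference in the question, but it can be inspected and edited by hand.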
生活了然无味 2024-09-23 16:05:29

pickle was designed for this. Use pickle.dump to write an object to a file and pickle.load to read it back.

>>> import pickle
>>> data = {'animals': ['Chicken', 'Sheep', 'Cattle', 'Horse'], 'population': [150, 200, 50, 30]}
>>> f = open('spam.p', 'wb')
>>> pickle.dump(data, f)
>>> f.close()
>>> f = open('spam.p', 'rb')
>>> pickle.load(f)
{'animals': ['Chicken', 'Sheep', 'Cattle', 'Horse'], 'population': [150, 200, 50, 30]}
>>> f.close()
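
The same round trip can be wrapped in two small helpers that use with, so the file is closed even if an error occurs. This is only a sketch: save_data and load_data are hypothetical names, not part of pickle, and the file is written to a temporary directory.

```python
import os
import pickle
import tempfile


def save_data(path, **variables):
    """Pickle the given keyword arguments as a single dict."""
    with open(path, 'wb') as f:
        pickle.dump(variables, f, pickle.HIGHEST_PROTOCOL)


def load_data(path):
    """Return the dict previously written by save_data."""
    with open(path, 'rb') as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), 'spam.p')

save_data(
    path,
    animals=['Chicken', 'Sheep', 'Cattle', 'Horse'],
    population=[150, 200, 50, 30],
)

data = load_data(path)
animals, population = data['animals'], data['population']
```

If the last two lines lived in a small module named data_file.py, plotter.py could keep its `from data_file import animals, population` line unchanged, which is the layout the question asks for.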
帅哥哥的热头脑 2024-09-23 16:05:29

As already suggested, pickle is the usual choice here. Keep in mind that not everything is serializable (e.g. files, sockets, database connections).
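
A short sketch of that limitation, using an open file as the unpicklable object. The file name example.txt is hypothetical, and the exact error message varies between Python versions, so only the exception type is relied on here.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as handle:
    try:
        pickle.dumps(handle)
        error = None
    except TypeError as exc:
        # Open file objects refuse to be pickled.
        error = exc

# Plain containers of strings and numbers pickle without trouble.
restored = pickle.loads(
    pickle.dumps({'animals': ['Chicken'], 'population': [150]})
)
```

In practice this means saving the parsed values themselves, not the handles or connections that were used to obtain them.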

With simple data structures you can also choose JSON or YAML. The latter is actually pretty readable and editable.
