How do I dump a boolean matrix in numpy?
I have a graph represented as a numpy boolean array (`G.adj.dtype == bool`). This is homework in writing my own graph library, so I can't use networkx. I want to dump it to a file so that I can fiddle with it, but for the life of me I can't work out how to make numpy dump it in a recoverable fashion.

I've tried `G.adj.tofile`, which wrote the graph correctly (ish) as one long line of True/False. But `fromfile` barfs on reading this, giving a 1x1 array, and `loadtxt` raises a `ValueError: invalid literal for int`. `np.savetxt` works but saves the matrix as a list of 0/1 floats, and `loadtxt(..., dtype=bool)` fails with the same ValueError.

Finally, I've tried `networkx.from_numpy_matrix` with `networkx.write_dot`, but that gave each edge `[weight=True]` in the dot source, which broke `networkx.read_dot`.
To save:
To recover:
HTH!
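The code blocks from this answer did not survive extraction, so the exact calls it recommended are unknown. As one hedged possibility, here is a `savetxt`/`loadtxt` round-trip that sidesteps the `ValueError` the question describes by writing integers and casting back to bool (the file name is illustrative):

```python
import numpy as np

adj = np.array([[False, True], [True, False]])  # toy adjacency matrix

# To save: write the matrix as 0/1 integers, one row per line.
np.savetxt('adj.txt', adj, fmt='%d')

# To recover: read the integers back, then cast to bool.
# The asker's loadtxt(..., dtype=bool) raised ValueError;
# reading as int and casting avoids that.
restored = np.loadtxt('adj.txt', dtype=int).astype(bool)
```

After the round-trip, `restored` is a boolean array equal to `adj`.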
This is my test case:
Space efficiency:

- `numpy.savetxt('arr.txt', obj, fmt='%s')` creates a 54 kB file.
- `numpy.savetxt('arr.txt', obj, fmt='%d')` creates a much smaller file (20 kB).
- `cPickle.dump(obj, open('arr.dump', 'w'))` creates a 40 kB file.

Time efficiency:

- `numpy.savetxt('arr.txt', obj, fmt='%s')`: 45 ms
- `numpy.savetxt('arr.txt', obj, fmt='%d')`: 10 ms
- `cPickle.dump(obj, open('arr.dump', 'w'))`: 2.3 ms

Conclusion: use `savetxt` with text format (`%s`) if human readability is needed, use `savetxt` with numeric format (`%d`) if space considerations are an issue, and use `cPickle` if time is an issue.
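The answer's own test harness was not preserved; a sketch of a comparable benchmark follows. The array contents and the timing helper are assumptions, and Python 2's `cPickle` is simply `pickle` in Python 3:

```python
import os
import pickle  # cPickle in the original (Python 2) answer
import time

import numpy as np

obj = np.random.rand(100, 100) > 0.5  # example boolean matrix

def timed(label, fn):
    """Run fn once and report wall-clock time in milliseconds."""
    start = time.perf_counter()
    fn()
    print(f'{label}: {(time.perf_counter() - start) * 1000:.1f} ms')

def dump_pickle():
    with open('arr.dump', 'wb') as f:
        pickle.dump(obj, f)

timed("savetxt fmt='%s'", lambda: np.savetxt('arr_s.txt', obj, fmt='%s'))
timed("savetxt fmt='%d'", lambda: np.savetxt('arr_d.txt', obj, fmt='%d'))
timed('pickle', dump_pickle)

# Compare the resulting file sizes.
for path in ('arr_s.txt', 'arr_d.txt', 'arr.dump'):
    print(path, os.path.getsize(path), 'bytes')
```

Absolute numbers will differ by machine, but the ordering (text format largest, `%d` smallest of the text formats, pickle fastest) matches the answer's measurements.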
The easiest way to save your array including metadata (dtype, dimensions) is to use `numpy.save()` and `numpy.load()`:

`a.tofile()` and `numpy.fromfile()` would work as well, but don't save any metadata. You need to pass `dtype=bool` to `fromfile()` and will get a one-dimensional array that must be `reshape()`d to its original shape.
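This answer's code blocks were also lost in extraction; a minimal sketch of both approaches it describes (file names are illustrative):

```python
import numpy as np

a = np.random.rand(3, 4) > 0.5  # example boolean array

# numpy.save() stores dtype and shape along with the data,
# so load() restores the array exactly.
np.save('arr.npy', a)
b = np.load('arr.npy')

# tofile()/fromfile() write raw bytes only: pass dtype=bool when
# reading back, and reshape() the flat result to the original shape.
a.tofile('arr.raw')
c = np.fromfile('arr.raw', dtype=bool).reshape(a.shape)

print(b.dtype, b.shape)  # bool (3, 4)
```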
I know this question is quite old, but I want to add Python 3 benchmarks. They are a bit different from the previous ones.

First I load a lot of data into memory, convert it to an `int8` numpy array with only `0` and `1` as possible values, and then dump it to HDD using two approaches.

Time measuring:

Size measuring:

So the Python 3 `pickle` version is much faster than `numpy.savetxt` and uses about 2 times less HDD volume.
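The measurement code was not preserved; a sketch of the comparison under the stated setup (the array dimensions and file names are assumptions, not the answer's originals):

```python
import os
import pickle
import time

import numpy as np

# int8 array with only 0 and 1 as possible values
data = (np.random.rand(1000, 100) > 0.5).astype(np.int8)

# Approach 1: binary pickle dump.
start = time.perf_counter()
with open('data.pkl', 'wb') as f:
    pickle.dump(data, f)
pickle_ms = (time.perf_counter() - start) * 1000

# Approach 2: text dump via savetxt.
start = time.perf_counter()
np.savetxt('data.txt', data, fmt='%d')
savetxt_ms = (time.perf_counter() - start) * 1000

print(f'pickle:  {pickle_ms:.1f} ms, {os.path.getsize("data.pkl")} bytes')
print(f'savetxt: {savetxt_ms:.1f} ms, {os.path.getsize("data.txt")} bytes')
```

`savetxt` spends roughly two bytes of text per element (digit plus separator) while pickle stores each `int8` as one raw byte, which is consistent with the roughly 2x size difference reported above.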