How do I dump a boolean matrix in numpy?

Posted 2024-10-09 01:10:21


I have a graph represented as a numpy boolean array (G.adj.dtype == bool). This is homework in writing my own graph library, so I can't use networkx. I want to dump it to a file so that I can fiddle with it, but for the life of me I can't work out how to make numpy dump it in a recoverable fashion.

I've tried G.adj.tofile, which wrote the graph correctly (ish) as one long line of True/False. But fromfile barfs on reading this, giving a 1x1 array, and loadtxt raises a ValueError: invalid literal for int. np.savetxt works but saves the matrix as a list of 0/1 floats, and loadtxt(..., dtype=bool) fails with the same ValueError.

Finally, I've tried networkx.from_numpy_matrix with networkx.write_dot, but that gave each edge [weight=True] in the dot source, which broke networkx.read_dot.


月棠 2024-10-16 01:10:21


To save:

numpy.savetxt('arr.txt', G.adj, fmt='%s')

To recover:

G.adj = numpy.genfromtxt('arr.txt', dtype=bool)

HTH!
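A minimal round-trip sketch of this suggestion (the file name and adjacency data are illustrative, standing in for `G.adj`):

```python
import os
import tempfile

import numpy

# A small boolean adjacency matrix standing in for G.adj.
adj = numpy.array([[False, True], [True, False]])

path = os.path.join(tempfile.mkdtemp(), 'arr.txt')

# fmt='%s' writes the literal words True/False, keeping the file human-readable.
numpy.savetxt(path, adj, fmt='%s')

# genfromtxt converts those words back to booleans when told dtype=bool.
restored = numpy.genfromtxt(path, dtype=bool)

assert (restored == adj).all()
```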

太阳公公是暖光 2024-10-16 01:10:21


This is my test case:

m = numpy.random.random((100, 100)) > 0.5

Space efficiency

numpy.savetxt('arr.txt', m, fmt='%s') creates a 54 kB file.

numpy.savetxt('arr.txt', m, fmt='%d') creates a much smaller file (20 kB).

cPickle.dump(m, open('arr.dump', 'w')) creates a 40 kB file.

Time efficiency

numpy.savetxt('arr.txt', m, fmt='%s'): 45 ms

numpy.savetxt('arr.txt', m, fmt='%d'): 10 ms

cPickle.dump(m, open('arr.dump', 'w')): 2.3 ms

Conclusion

Use savetxt with text format (%s) if human readability is needed, use savetxt with numeric format (%d) if space is an issue, and use cPickle if time is an issue.
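This trade-off can be reproduced with a quick sketch; the absolute numbers will vary by machine, and Python 3's pickle stands in for the Python 2 cPickle used in the answer:

```python
import os
import pickle
import tempfile

import numpy

# Same kind of test case as above: a 100x100 boolean matrix.
m = numpy.random.rand(100, 100) > 0.5
tmp = tempfile.mkdtemp()

# Human-readable text dump: the words True/False.
numpy.savetxt(os.path.join(tmp, 's.txt'), m, fmt='%s')
# Compact text dump: the digits 0/1.
numpy.savetxt(os.path.join(tmp, 'd.txt'), m, fmt='%d')
# Binary pickle (note 'wb': Python 3 pickle requires a binary file).
with open(os.path.join(tmp, 'p.dump'), 'wb') as f:
    pickle.dump(m, f)

sizes = {name: os.path.getsize(os.path.join(tmp, name))
         for name in ('s.txt', 'd.txt', 'p.dump')}
# The %s file is the largest; the binary pickle stores one byte per
# element plus a small header, so it is the smallest of the three here.
```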

笑咖 2024-10-16 01:10:21


The easiest way to save your array including metadata (dtype, dimensions) is to use numpy.save() and numpy.load():

a = numpy.array([[False,  True, False],
           [ True, False,  True],
           [False,  True, False],
           [ True, False,  True],
           [False,  True, False]], dtype=bool)
numpy.save("data.npy", a)
numpy.load("data.npy")
# array([[False,  True, False],
#        [ True, False,  True],
#        [False,  True, False],
#        [ True, False,  True],
#        [False,  True, False]], dtype=bool)

a.tofile() and numpy.fromfile() would work as well, but don't save any metadata. You need to pass dtype=bool to fromfile() and will get a one-dimensional array that must be reshape()d to its original shape.
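A sketch of that round trip, assuming the original shape is remembered (or stored) separately:

```python
import os
import tempfile

import numpy

a = numpy.array([[False, True, False],
                 [True, False, True]])
path = os.path.join(tempfile.mkdtemp(), 'data.bin')

# tofile writes raw bytes: one byte per bool, no shape or dtype header.
a.tofile(path)

# fromfile needs the dtype spelled out and returns a flat array...
flat = numpy.fromfile(path, dtype=bool)
# ...which must be reshaped using the shape recorded elsewhere.
restored = flat.reshape(a.shape)

assert (restored == a).all()
```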

诗化ㄋ丶相逢 2024-10-16 01:10:21


I know this question is quite old, but I want to add Python 3 benchmarks. They are a bit different from the previous ones.

First, I load a lot of data into memory, convert it to an int8 numpy array with only 0 and 1 as possible values, and then dump it to the HDD using two approaches.

from timer import Timer  # the author's own context-manager timing helper
import sys
import numpy
import pickle

# Load data part of code is omitted.

prime = int(sys.argv[1])

np_table = numpy.array(check_table, dtype=numpy.int8)
filename = "%d.dump" % prime

with Timer() as t:
    pickle.dump(np_table, open("dumps/pickle_" + filename, 'wb'))

print('pickle took %.03f sec.' % t.interval)

with Timer() as t:
    numpy.savetxt("dumps/np_" + filename, np_table, fmt='%d')

print('savetxt took %.03f sec.' % t.interval)

Time measurements

It took 50.700 sec to load data number 11
pickle took 0.010 sec.
savetxt took 1.930 sec.

It took 1297.970 sec to load data number 29
pickle took 0.070 sec.
savetxt took 242.590 sec.

It took 1583.380 sec to load data number 31
pickle took 0.090 sec.
savetxt took 334.740 sec.

It took 3855.840 sec to load data number 41
pickle took 0.610 sec.
savetxt took 1367.840 sec.

It took 4457.170 sec to load data number 43
pickle took 0.780 sec.
savetxt took 1654.050 sec.

It took 5792.480 sec to load data number 47
pickle took 1.160 sec.
savetxt took 2393.680 sec.

It took 8101.020 sec to load data number 53
pickle took 1.980 sec.
savetxt took 4397.080 sec.

Size measurements

630K np_11.dump
 79M np_29.dump
110M np_31.dump
442M np_41.dump
561M np_43.dump
875M np_47.dump
1,6G np_53.dump

315K pickle_11.dump
 40M pickle_29.dump
 55M pickle_31.dump
221M pickle_41.dump
281M pickle_43.dump
438M pickle_47.dump
798M pickle_53.dump

So the Python 3 pickle version is much faster than numpy.savetxt and uses about half the disk space.
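For completeness, numpy.save from the earlier answer is in the same ballpark as pickle for an int8 table like np_table, since both formats write the raw bytes plus a small header (the array contents here are illustrative):

```python
import os
import pickle
import tempfile

import numpy

# A million int8 cells with only 0 and 1 as values, like np_table above.
table = (numpy.random.rand(1000, 1000) > 0.5).astype(numpy.int8)
tmp = tempfile.mkdtemp()

with open(os.path.join(tmp, 'p.dump'), 'wb') as f:
    pickle.dump(table, f)
numpy.save(os.path.join(tmp, 'n.npy'), table)

p_size = os.path.getsize(os.path.join(tmp, 'p.dump'))
n_size = os.path.getsize(os.path.join(tmp, 'n.npy'))
# Both files are about one byte per element, so roughly 1 MB each.
```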
