Binary write performance: numpy.ndarray.tofile vs numpy.ndarray.tobytes vs C++ file write
I'm trying to write some large arrays to disk. I've tested three options, two of them in Python:
import timeit
import numpy as np

# N=800 generates files of about 4 GB
N = 800

compute_start = timeit.default_timer()
vals = np.sqrt((np.arange(N)**2)[:, None, None]
               + (np.arange(N)**2)[None, :, None]
               + (np.arange(N)**2)[None, None, :])
compute_end = timeit.default_timer()
print("Compute time: ", compute_end - compute_start)

tofile_start = timeit.default_timer()
for i in range(2):
    f = open("out.bin", "wb")
    vals.tofile(f)
    f.close()
tofile_end = timeit.default_timer()
print("tofile time: ", tofile_end - tofile_start)

tobytes_start = timeit.default_timer()
for i in range(2):
    f = open("out.bin", "wb")
    f.write(vals.tobytes())
    f.close()
tobytes_end = timeit.default_timer()
print("tobytes time: ", tobytes_end - tobytes_start)
And for C++ (compiled with g++ -O3):
#include <chrono>
#include <cstdio>    // for std::printf
#include <fstream>
#include <vector>

int main() {
    std::vector<double> q(800*800*800, 3.14);

    auto dump_start = std::chrono::steady_clock::now();
    for (int i = 0; i < 2; i++) {
        std::ofstream outfile("out.bin", std::ios::out | std::ios::binary);
        outfile.write(reinterpret_cast<const char*>(q.data()),
                      q.size() * sizeof(double));
        outfile.close();
    }
    auto dump_end = std::chrono::steady_clock::now();

    std::printf("Dump time: %12.3f\n",
                std::chrono::duration_cast<std::chrono::microseconds>(dump_end - dump_start).count() / 1000000.0);
    return 0;
}
Times reported are 16 seconds for tofile, 39 seconds for tobytes, and 34 seconds for the C++ write. Any ideas on why they are so different? Especially the two NumPy cases: the docs say that numpy.ndarray.tofile() is equivalent to file.write(numpy.ndarray.tobytes()).
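One difference I can measure directly, though it's only a guess at the cause: tobytes() has to materialize a complete in-memory copy of the array before write() sees any data, whereas tofile() can write straight from the existing buffer. The copy can be timed on its own to see how much of the gap it accounts for:

copy_start = timeit.default_timer()
buf = vals.tobytes()   # builds a second ~4 GB buffer in memory before anything is written
copy_end = timeit.default_timer()
print("tobytes copy time: ", copy_end - copy_start)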
Thank you!
1 Answer
I've also been bothered lately by the speed of numpy.ndarray.tofile() when writing large data sets (16 GB) to raw binary files, and here is what helped in my case (running on Windows 10), though I don't understand why: numpy.ndarray.flatten().tofile(), assuming you're not bothered by the structure when writing binary files. So with your variable names, the code I used was along these lines:
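f = open("out.bin", "wb")
vals.flatten().tofile(f)   # flatten() returns a new contiguous copy, which is then written
f.close()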
With that change, the writing speed increased from around 60 MB/s to almost 200 MB/s (still below the limit of the SSD). However, the writing speed isn't constant and will sometimes drop. I hope it is still helpful.
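For what it's worth, flatten() always returns a fresh, contiguous copy of the data, so if the copy itself is what makes the difference (my assumption, not something I have verified), the same effect should be reproducible with an explicit copy:

f = open("out.bin", "wb")
vals.copy().tofile(f)   # copy() also yields a fresh contiguous buffer, just like flatten()
f.close()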