Speed tradeoff: reading the file frequently vs. storing the file in dynamic memory
I am writing a C program that reads an image file and reads each pixel of the image just once. Should I read the file once using fread() and store it in a dynamically allocated (heap) variable, or call fread() frequently, once for each pixel?
The image will be between 1000*1000 and 5000*5000 pixels.
I will be extending the same program with MPI and CUDA. I would appreciate any other suggestions.
Thank you.
Comments (8)
Even a 12-bit-per-channel ARGB image (6 bytes per pixel) would need about 150 MB at a resolution of 5,000 * 5,000 pixels, which is well within the capabilities of all current PCs and even many GPU cards. If you have that kind of memory available, you should read the file once into a dynamically allocated array, or something along those lines. That lets you read the whole image in big I/O blocks, which is faster, and use direct memory operations (img[1234][4321][RED] = 34) rather than complicate your code with I/O functions.
If you do not have that kind of memory available, look at mmap() or whatever equivalent exists for your OS to map the file into virtual memory. You still have the advantage of using direct memory operations, without necessarily loading the whole thing into memory, although it would be computationally more expensive.
That said, modern operating systems perform extensive caching and prefetching of data, so using fread() may not be that much slower. Moreover, on current Linux systems with glibc 2.3 or later, it is optionally possible to use mmap() for file access even when the application performs I/O with the standard stdio functions.
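A minimal sketch of the read-once approach, assuming the file holds raw, uncompressed pixel data with no header and that the caller already knows its size in bytes. The function name load_pixels is hypothetical and not taken from any library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read `size` bytes of raw pixel data from `path` into a heap buffer
 * using a single fread() call. Returns NULL on any failure.
 * The caller owns the returned buffer and must free() it. */
unsigned char *load_pixels(const char *path, size_t size)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    unsigned char *buf = malloc(size);
    if (buf != NULL && fread(buf, 1, size, fp) != size) {
        /* Short read: the file is smaller than expected or an I/O error occurred. */
        free(buf);
        buf = NULL;
    }

    fclose(fp);
    return buf;
}
```

With 4 bytes per pixel, channel c of pixel (x, y) would then sit at buf[(y * width + x) * 4 + c], which plays the role of the img[1234][4321][RED] style of access.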
It depends. You should try and estimate the amount of memory on most computers that will run your software. It also depends on how speed critical your code is.
Obviously, one approach is faster while the other uses much more memory. In general, you are probably okay loading it into memory on most modern computers and that's easier. But you have to weigh the pros and cons in your particular case.
Generally I've found the quickest way to deal with files is to try to read the whole thing into memory in one big I/O, and deal with it out of memory from then on in. It often makes the code simpler too.
You do of course have to worry about files that might not fit in any available contiguous memory chunk. If you handle that properly (rather than just bail) the code becomes much more complex. As a certified lazy programmer, I prefer to just bail if I can get away with it. :-)
Here's another question that may help you make a decision: How exactly do fopen() and fclose() work?
If you're looking for speed, it would be best to load the entire file into memory at once and manipulate it there. That way you avoid unnecessary calls to your hard disk driver to provide the data. When you start talking about providing 25,000,000 different 4-byte chunks (assuming 32-bit RGBA) for a 5k image, you're looking at potentially a lot of seeking, reading, and waiting.
This is one of the classic memory vs. speed tradeoffs. If your customers will have enough memory, then it would be best to load all the data into memory and then perform your transformations.
Otherwise, try to load enough data at a time (paging) so that it's fast and fits the memory profile you're targeting.
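For the case where the whole image does not fit the memory budget, the chunked (paging) approach could look roughly like this. It is only a sketch: sum_bytes_chunked is a hypothetical name, and summing the byte values stands in for whatever per-pixel work the real program does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stream a file through one reusable buffer, `chunk_size` bytes at a time.
 * The "processing" is a placeholder that adds up all byte values.
 * Returns 0 on success and stores the result in *sum; returns -1 on failure. */
int sum_bytes_chunked(const char *path, size_t chunk_size,
                      unsigned long long *sum)
{
    assert(path != NULL && sum != NULL);
    if (chunk_size == 0)
        return -1;

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    unsigned char *chunk = malloc(chunk_size);
    if (chunk == NULL) {
        fclose(fp);
        return -1;
    }

    unsigned long long total = 0;
    size_t n;
    /* fread() returns fewer than chunk_size bytes on the last chunk
     * and 0 at end of file or on error. */
    while ((n = fread(chunk, 1, chunk_size, fp)) > 0) {
        for (size_t i = 0; i < n; i++)
            total += chunk[i];
    }

    int failed = ferror(fp);
    free(chunk);
    fclose(fp);
    if (failed)
        return -1;

    *sum = total;
    return 0;
}
```

Picking a chunk size that is a multiple of the row size in bytes keeps whole rows together, which tends to simplify the processing loop.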
It depends on the kind of algorithm you need to run.
A 5000 * 5000 image is around 95 MB (at 4 bytes per pixel). No big deal.
On the GPU side, you can upload asynchronously to GPU memory in blocks of around 4 MB to 16 MB to saturate the bandwidth.
You have to use pinned memory on CUDA, and I think that if you memory-map the file, copying the blocks will be even faster.
As usual, profile your application for the best tuning.
Look at using mmap() on Linux or MapViewOfFile() on Windows.
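A rough sketch of the mmap() route on a POSIX system such as Linux, assuming a read-only mapping of the whole file is enough. The helper name map_file is hypothetical. On Windows the equivalent would go through CreateFileMapping() and MapViewOfFile() instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at `path` read-only into memory.
 * On success returns a pointer to the first byte and stores the file
 * length in *len; on failure (including an empty file) returns NULL.
 * Release the mapping with munmap((void *)ptr, len) when done. */
const unsigned char *map_file(const char *path, size_t *len)
{
    assert(path != NULL && len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}
```

The pages are faulted in from the file as they are touched, so the pixels can be indexed like an ordinary array without an explicit read loop.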
Storing it in memory will definitely be faster. If you read small chunks from a hard drive every time, you always incur delays due to minimum access times, etc.
I was going to write this up as a comment, but it became too long. But on to the point...
I agree with T.E.D. and Jonathan Wood:
-T.E.D
-Jonathan Wood
Keep in mind that 5000*5000 pixels with 32-bit color take up roughly 100 megabytes of memory (plus maybe some overhead, and whatever else your software needs). I'd say (best guess, a Stetson-Harrison value) that most modern desktop computers have at least 1 or 2 gigabytes of memory (mine was bought in 2008 and has 4), so it's really not that much even if the whole thing is loaded at once. Laptops might have less memory.
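The size figures quoted in this thread can be checked with a few lines of arithmetic. This is just an illustration: image_bytes is a hypothetical helper, it assumes an uncompressed image with a fixed number of bytes per pixel, and it does not guard against overflow for very large dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to hold an uncompressed width x height image in memory. */
size_t image_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}
```

For 5000*5000 pixels at 4 bytes each this gives 100,000,000 bytes, which is about 95 MiB. At 6 bytes per pixel it gives 150,000,000 bytes, and a 1000*1000 image at 4 bytes per pixel needs only 4,000,000 bytes.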
The CUDA aspect is also interesting (I know next to nothing about CUDA): is the data loaded into the GPU's memory? How much memory do CUDA-enabled GPUs usually have? Could the PCI-e bus become a bottleneck (probably not..?)? Find out how much memory common CUDA-enabled desktop and laptop GPUs have.
A sort of compromise might be to buffer the reading: have another thread read ahead in the file while the other thread(s) process the data (and free memory as they go).