Which is faster in memory, ints or chars? And file-mapping or chunk reading?
Okay, so I've written a (rather unoptimized) program before to encode images to JPEGs. Now I am working with MPEG-2 transport streams and the H.264 encoded video within them. Before I dive into programming all of this, I am curious what the fastest way to deal with the actual file is.
Currently I am file-mapping the .mts file into memory to work on it, although I am not sure whether it would be faster to (for example) read the file into memory in 100 MB chunks and deal with it that way.
These files require a lot of bit-shifting and such to read flags, so I am wondering whether, when I reference some of the memory, it is faster to read 4 bytes at once as an integer or 1 byte as a character. I thought I read somewhere that x86 processors are optimized to a 4-byte granularity, but I'm not sure if this is true...
Thanks!
Comments (5)
Memory-mapped files are usually the fastest option available if you require your file to be available synchronously. (There are some asynchronous APIs that allow the O/S to reorder things for a slight speed increase sometimes, but that doesn't sound helpful in your application.)
The main advantage you're getting with mapped files is that you can work in memory on the file while it is still being read from disk by the O/S, and you don't have to manage your own locking/threaded file reading code.
Memory-reference-wise, on x86 memory is going to be read an entire cache line at a time no matter what you're actually working with. The extra time associated with wider-than-byte operations comes from the fact that multi-byte integers are not guaranteed to be aligned. For example, performing an ADD on a 4-byte operand will take more time if the operand isn't aligned on a 4-byte boundary, but for something like a memory copy there will be little difference. If you are working with inherently character data, then it's going to be faster to keep it that way than to read everything as integers and bit-shift things around.
If you're doing H.264 or MPEG-2 encoding, the bottleneck is probably going to be CPU time rather than disk I/O in any case.
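To make the mapped-versus-chunked comparison concrete, here is a minimal Python sketch that tallies packets per PID both ways. It is an illustration, not a benchmark. The function names are made up for this example, and it assumes plain 188-byte transport stream packets starting at offset 0. AVCHD-style .mts files commonly prefix each packet with 4 extra bytes, which this sketch does not handle.

```python
import mmap

TS_PACKET_SIZE = 188   # assumed: plain MPEG-2 TS packets, no 4-byte prefix
SYNC_BYTE = 0x47       # every TS packet starts with this byte


def _tally(buf, counts):
    """Count packets per PID in any bytes-like object holding whole packets."""
    for off in range(0, len(buf) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        if buf[off] != SYNC_BYTE:
            raise ValueError(f"lost sync at offset {off}")
        # The PID is the low 5 bits of byte 1 followed by all of byte 2.
        pid = ((buf[off + 1] & 0x1F) << 8) | buf[off + 2]
        counts[pid] = counts.get(pid, 0) + 1


def count_pids_mapped(path):
    """Map the whole file read-only and let the O/S page it in on demand."""
    counts = {}
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            _tally(view, counts)
    return counts


def count_pids_chunked(path, packets_per_chunk=4096):
    """Read the file in fixed-size chunks that hold a whole number of packets."""
    counts = {}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(TS_PACKET_SIZE * packets_per_chunk)
            if not chunk:
                break
            _tally(chunk, counts)
    return counts
```

Both functions share the same parsing loop, so timing them on a real file isolates the I/O strategy from the parsing cost.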
If you have to access the whole file, it is always faster to read it into memory and do the processing there. Of course, it's also wasting memory, and you have to lock the file somehow so you won't get concurrent access by some other application, but optimization is about compromises anyway. Memory mapping is faster if you're skipping (large) parts of the file, because then you don't have to read them at all.
Yes, accessing memory at 4-byte (or even 8-byte) granularity is faster than accessing it byte-wise. Again, it's a compromise: depending on what you have to do with the data afterwards, and how skilled you are at fiddling with the bits in an int, it might not be faster overall.
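As an illustration of the two access styles, the sketch below extracts the same fields from the 4-byte transport stream packet header, once byte by byte and once from a single 32-bit big-endian word. The function names are hypothetical. Python hides the actual load width, so this only shows that the shifts and masks are equivalent; it says nothing about which is faster in native code. In C on little-endian x86, the word version would also need a byte swap.

```python
import struct


def parse_header_bytewise(pkt):
    """Extract TS header fields by looking at one byte at a time."""
    b1, b2, b3 = pkt[1], pkt[2], pkt[3]
    return {
        "sync": pkt[0],
        "pusi": (b1 >> 6) & 0x1,          # payload_unit_start_indicator
        "pid": ((b1 & 0x1F) << 8) | b2,   # 13-bit packet identifier
        "afc": (b3 >> 4) & 0x3,           # adaptation_field_control
        "cc": b3 & 0xF,                   # continuity_counter
    }


def parse_header_word(pkt):
    """Extract the same fields from one 32-bit big-endian word."""
    (word,) = struct.unpack_from(">I", pkt, 0)
    return {
        "sync": word >> 24,
        "pusi": (word >> 22) & 0x1,
        "pid": (word >> 8) & 0x1FFF,
        "afc": (word >> 4) & 0x3,
        "cc": word & 0xF,
    }
```

Whichever style is used, the fields come out identical, so the choice can be made on measured speed and readability alone.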
As for everything regarding optimization:
These are sequential bit-streams - you basically consume them one bit at a time without random-access.
You don't need to put a lot of effort into explicitly buffering reads and such in this scenario: the operating system will be buffering them for you anyway. I've written H.264 parsers before, and the time is completely dominated by the decoding and manipulation, not the IO.
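A minimal sketch of what consuming such a stream sequentially can look like, assuming a hypothetical `BitReader` class layered over any binary stream (for example a file opened with `open(path, "rb")`, which Python already buffers). The `read_ue` method decodes the unsigned Exp-Golomb codes that H.264 uses for many syntax elements. It does not deal with H.264 emulation-prevention bytes, which a real parser has to strip first.

```python
import io


class BitReader:
    """Consume a binary stream most-significant bit first, without seeking."""

    def __init__(self, stream):
        self._stream = stream
        self._acc = 0     # bits fetched from the stream but not yet consumed
        self._nbits = 0   # number of valid bits currently held in _acc

    def read_bits(self, n):
        """Return the next n bits as an unsigned integer."""
        while self._nbits < n:
            byte = self._stream.read(1)
            if not byte:
                raise EOFError("bit stream exhausted")
            self._acc = (self._acc << 8) | byte[0]
            self._nbits += 8
        self._nbits -= n
        value = (self._acc >> self._nbits) & ((1 << n) - 1)
        self._acc &= (1 << self._nbits) - 1   # drop the bits just returned
        return value

    def read_ue(self):
        """Decode one unsigned Exp-Golomb code, ue(v) in H.264 terms."""
        zeros = 0
        while self.read_bits(1) == 0:
            zeros += 1
        return (1 << zeros) - 1 + self.read_bits(zeros)


# Usage: pull the first two header fields out of an in-memory stream.
reader = BitReader(io.BytesIO(bytes([0x47, 0x41])))
sync_byte = reader.read_bits(8)
transport_error = reader.read_bits(1)
```

The reader asks the underlying stream for one byte at a time and leaves all buffering to the file object and the operating system, in line with the point above.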
My recommendation is to use a standard library for parsing these bit-streams.
Flavor is such a parser, and the website even includes examples of MPEG-2 (PS) and various H.264 parts like M-Coder. Flavor builds native parsing code from a C++-like language; here's a quote from the MPEG-2 PS spec:
Regarding the best size to read from memory, I'm sure you will enjoy reading this post about memory access performance and cache effects.
One thing to consider about memory-mapping files is that a file larger than the available address range can only be mapped a portion at a time. To access the remainder of the file, the first part has to be unmapped and the next part mapped in its place.
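The map, unmap, and remap cycle described here could be sketched as a generator that slides a read-only window over the file. The name `iter_mapped_windows` and the 64 MiB default are assumptions made for this example. The one hard requirement is that each mapping offset be a multiple of `mmap.ALLOCATIONGRANULARITY`, which the code enforces by rounding the window size down to that granularity.

```python
import mmap
import os


def iter_mapped_windows(path, window_size=64 * 1024 * 1024):
    """Yield (offset, view) pairs, mapping one window of the file at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    # Round the window down to the mapping granularity so every offset is valid.
    window_size = max(gran, window_size - window_size % gran)
    total = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < total:
            length = min(window_size, total - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as view:
                # The view is unmapped as soon as the caller asks for the next
                # window, so callers must not keep a reference to it.
                yield offset, view
            offset += length
```

A packet that straddles two windows is not handled here; a caller would need to carry the leftover bytes across the boundary or choose a window size that is a multiple of both the granularity and the packet size.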
Since you're decoding MPEG streams, you may want to use a double-buffered approach with asynchronous file reading. It works like this:
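One possible reading of the double-buffered idea, sketched with a background thread rather than a platform-specific asynchronous I/O API: while the consumer decodes one chunk, the reader thread is already fetching the next. The name `read_ahead`, the 1 MiB chunk size, and the use of a bounded queue are all assumptions of this sketch.

```python
import queue
import threading


def read_ahead(path, chunk_size=1 << 20, depth=1):
    """Yield file chunks while a background thread reads the next one."""
    q = queue.Queue(maxsize=depth)   # depth=1: one chunk in use, one waiting
    end = object()                   # sentinel marking end of file

    def producer():
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    q.put(chunk)     # blocks while the consumer is behind
            q.put(end)
        except Exception as exc:     # hand I/O errors to the consumer
            q.put(exc)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is end:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

Typical use would be `for chunk in read_ahead("video.mts"): decode(chunk)`, where both the file name and `decode` are placeholders. If the consumer stops early, the daemon thread stays blocked on the queue until the process exits, which a more careful version would handle with an explicit stop signal.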