Reading a file into memory rather than line by line

Published 2024-07-25 02:39:21


Assigning a QTextStream to a QFile and reading it line-by-line is easy and works fine, but I wonder if the performance can be increased by first storing the file in memory and then processing it line-by-line.

Using FileMon from sysinternals, I've noticed that the file is read in chunks of 16KB, and since the files I have to process are not that big (~2MB, but many!), loading them into memory would be a nice thing to try.

Any ideas how I can do this? QFile inherits from QIODevice, which lets me readAll() it into a QByteArray, but how do I then proceed and divide it into lines?
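One way to sketch the idea (shown here with the standard library; the Qt equivalent would be QFile::readAll() followed by splitting the QByteArray) is to slurp the whole file into a string with a single bulk read and then split it into lines in memory. The helper name and file path are made up for the example:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read the whole file into one string, then split it into lines in memory.
// Hypothetical helper for illustration; error handling kept minimal.
std::vector<std::string> readAllLines(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();               // single bulk read into memory
    std::istringstream mem(buffer.str());
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(mem, line))     // line splitting now touches no disk
        lines.push_back(line);
    return lines;
}
```

After this, each element of the vector can be processed without any further disk access.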


Comments (4)

蛮可爱 2024-08-01 02:39:21


QTextStream has a readAll() function:

http://doc.qt.io/qt-4.8/qtextstream.html#readAll

Surely that's all you need?

Or you could read everything into a QByteArray, and QTextStream can take that as input instead of a QFile.
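As an illustration of that second route, a standard-library analogue (std::istringstream standing in for a QTextStream constructed on a QByteArray) might look like this; the function and buffer contents are invented for the example:

```cpp
#include <sstream>
#include <string>

// Process an in-memory buffer by wrapping it in a stream, the way
// QTextStream can wrap a QByteArray after readAll(). Here we just
// count the non-empty lines.
int countNonEmptyLines(const std::string &buffer) {
    std::istringstream stream(buffer);  // stream over memory, not a file
    std::string line;
    int count = 0;
    while (std::getline(stream, line))
        if (!line.empty())
            ++count;
    return count;
}
```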

芸娘子的小脾气 2024-08-01 02:39:21


Be careful. There are many effects to consider.

For the string processing involved (or whatever you are doing with the file) there is likely no performance difference between doing it from memory and doing it from a file line by line provided that the file buffering is reasonable.

Actually calling your operating system to do a low level read is VERY expensive. That's why we have buffered I/O. For small I/O sizes the overhead of the call dominates. So, reading 64 bytes at a time is likely 1/4 as efficient as reading 256 bytes at a time. (And I am talking about read() here, not fgets() or fread() both of which are buffered.)

At a certain point the time required for the physical I/O starts to dominate, and when the performance doesn't increase much for a larger buffer you have found your buffer size. Very old data point: 7MHz Amiga 500, 100MB SCSI hard disk (A590+Quantum): my I/O performance really only hit maximum with a 256KB buffer size. Compared to the processor, that disk was FAST!!! (The computer had only 3MB of RAM. 256KB is a BIG buffer!)
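The effect of the chunk size can be explored with a loop like the following standard-library sketch (file name and function are hypothetical); timing it for different bufferSize values is how you would look for the knee described above:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read a file in fixed-size chunks and return the total number of bytes
// read. Varying bufferSize and timing the loop shows where larger
// buffers stop helping.
std::size_t readInChunks(const std::string &path, std::size_t bufferSize) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(bufferSize);
    std::size_t total = 0;
    while (in.read(buffer.data(),
                   static_cast<std::streamsize>(buffer.size())),
           in.gcount() > 0)             // last read may be partial
        total += static_cast<std::size_t>(in.gcount());
    return total;
}
```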

However, you can have too much of a good thing. Once your file is in memory, the OS can page that file back out to disk at its leisure. And if it does so, you've lost any benefit of buffering. If you make your buffers too big then this may happen under certain load situations and your performance goes down the toilet. So consider your runtime environment carefully, and limit memory footprint if need be.

An alternative is to use mmap() to map the file into memory. Now the OS won't page your file out - rather, it will simply not page in, or if it needs memory it will discard any pieces of file cached in core. But it won't need to write anything to swap space - it has the file available. I'm not sure if this would result in better performance, however, because it's still better to do I/O in big chunks, and virtual memory tends to move things in page-sized chunks. Some memory managers may do a decent job of moving pages in chunks to increase I/O bandwidth, and prefetching pages. But I haven't really studied this in detail.

Get your program working correctly first. Then optimize.

在你怀里撒娇 2024-08-01 02:39:21


As long as you don't open and close the file every time you read a single line, there should be no performance difference between reading in the entire file first or processing it as you read it (unless the processing part is faster when you have the entire file to work with). If you think about it, both ways are actually doing the same thing (reading the entire file once).

抹茶夏天i‖ 2024-08-01 02:39:21


You may use:

QTextStream ( QIODevice * device )

The QTextStream class provides a convenient interface for reading and
writing text.

QTextStream can operate on a QIODevice, a QByteArray or a QString.
Using QTextStream's streaming operators, you can conveniently read and
write words, lines and numbers.
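The streaming operators mentioned in that excerpt behave much like the standard library's, so a std::istringstream sketch of reading words and numbers out of an in-memory buffer (the record layout and sample data are invented) would be:

```cpp
#include <sstream>
#include <string>

// Pull a word and two numbers out of an in-memory buffer with stream
// extraction operators, analogous to QTextStream's operator>> when it
// operates on a QString or QByteArray.
struct Record {
    std::string name;
    int count = 0;
    double price = 0.0;
};

Record parseRecord(const std::string &buffer) {
    std::istringstream stream(buffer);
    Record r;
    stream >> r.name >> r.count >> r.price;  // whitespace-delimited fields
    return r;
}
```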
