Best way to send multiple bytes to standard output in C/C++
Profiling my program shows that the function print takes a lot of time. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster? (All 9 bytes in print() need to be sent to stdout at the same time.)
void print(){
    unsigned char temp[9];
    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];
    fwrite(temp,1,9,stdout);
}
matrix is defined globally as unsigned char matrix[3][3];
Comments (11)
IO is not an inexpensive operation. It is, in fact, a blocking operation: when you call write, the OS can preempt your process to let more CPU-bound processes run before the IO device you are writing to completes the operation.
The only lower-level function you can use (if you are developing on a *nix machine) is the raw write function, but even then your performance will not be much faster than it is now. Simply put: IO is expensive.
The top rated answer claims that IO is slow.
Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.
Write 10 million records from a nine byte array
Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1
FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0
There's nothing slow about IO if you can afford to buffer properly.
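A minimal sketch of the buffered approach described here, assuming a fully buffered stream with a large caller-supplied buffer installed via setvbuf. The helper name write_records, the buffer size, and the record contents are illustrative assumptions, not the benchmark that produced the timings in this answer:

```c
#include <stdio.h>

#define RECORD_SIZE 9

/* Hypothetical helper: write `count` copies of a 9-byte record to `out`
 * through a large, fully buffered stdio buffer, so the OS is invoked
 * roughly once per buffer-full instead of once per record.
 * `buf` must remain valid until the stream is flushed or closed.
 * Returns 0 on success, -1 on failure. */
int write_records(FILE *out, const unsigned char record[RECORD_SIZE],
                  long count, char *buf, size_t buf_size) {
    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(out, buf, _IOFBF, buf_size) != 0)
        return -1;
    for (long i = 0; i < count; i++) {
        if (fwrite(record, 1, RECORD_SIZE, out) != RECORD_SIZE)
            return -1;
    }
    /* Push out whatever is still sitting in the buffer. */
    return fflush(out) == 0 ? 0 : -1;
}
```

Calling write_records(stdout, rec, 10000000, buf, sizeof buf) with a static 64 KiB buf would roughly correspond to the 10-million-record run; actual timings depend on the machine and on where stdout is directed.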
The rawest form of output you can do is probably the write system call.
File descriptor 1 is standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as whatever is reading it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.
I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, so YMMV. Please correct me if I'm wrong on the syntax; as I said, it's been a while.
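A minimal POSIX sketch of both ideas, assuming a *nix system. The helper names write_all and set_nonblocking are hypothetical. Note that O_NONBLOCK does not make the OS queue unlimited data: once the pipe or terminal buffer is full, write fails with EAGAIN and the caller has to retry later.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all `len` bytes to `fd`, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *data, size_t len) {
    const unsigned char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* includes EAGAIN on a non-blocking fd */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Add O_NONBLOCK to the file status flags of `fd`.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

For the question's case this would be called as write_all(1, temp, 9), or write_all(STDOUT_FILENO, temp, 9).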
Perhaps your problem is not that fwrite() is slow, but that it is buffered.
Try calling fflush(stdout) after the fwrite().
This all really depends on your definition of slow in this context.
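A minimal sketch of the flush-after-write suggestion; the helper name print_flushed is hypothetical. Flushing after every record means the bytes leave the stdio buffer immediately, at the cost of roughly one system call per record, so it helps latency rather than throughput.

```c
#include <stdio.h>

/* Write nine bytes to `out` and flush right away, so the data is not
 * held back in the stdio buffer. Returns 0 on success, -1 on failure. */
int print_flushed(FILE *out, const unsigned char bytes[9]) {
    if (fwrite(bytes, 1, 9, out) != 9)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```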
All printing is fairly slow, although iostreams are really slow for printing.
Your best bet would be to use printf, something along the lines of:
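One way this might look, as a sketch only: the function name print_with_printf is hypothetical, and it uses fprintf with an explicit stream so it can be pointed at stdout or at a file. Each %c writes its argument as a single unsigned char, so zero bytes are emitted too.

```c
#include <stdio.h>

/* Emit a 3x3 byte matrix as nine raw bytes with one formatted call.
 * fprintf returns the number of characters written, which should be 9.
 * Returns 0 on success, -1 on failure. */
int print_with_printf(FILE *out, unsigned char m[3][3]) {
    int n = fprintf(out, "%c%c%c%c%c%c%c%c%c",
                    m[0][0], m[0][1], m[0][2],
                    m[1][0], m[1][1], m[1][2],
                    m[2][0], m[2][1], m[2][2]);
    return n == 9 ? 0 : -1;
}
```

Whether this is faster than a single fwrite is something to measure; the format string has to be parsed on every call.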
As everyone has pointed out, IO in a tight inner loop is expensive. I have normally ended up doing a conditional cout of the matrix, based on some criteria, when I need to debug it.
If your app is a console app, try redirecting its output to a file; that will be a lot faster than refreshing the console, e.g. app.exe > matrixDump.txt
What's wrong with:
Both the one-dimensional and the two-dimensional arrays take up the same memory.
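A sketch of writing the matrix without the temporary copy, assuming the global from the question. The function name print_matrix and the FILE* parameter are illustrative additions. A 3x3 array of unsigned char is nine contiguous bytes in row-major order, which is the same order the temp array in the question is filled in.

```c
#include <stdio.h>

unsigned char matrix[3][3];   /* global, as declared in the question */

/* Hand the 2D array straight to fwrite: sizeof matrix is 9, and the
 * bytes are already laid out row by row.
 * Returns 0 on success, -1 on failure. */
int print_matrix(FILE *out) {
    return fwrite(matrix, 1, sizeof matrix, out) == sizeof matrix ? 0 : -1;
}
```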
Try running the program twice, once with output and once without. You will notice that, overall, the run without IO is faster. You could also fork the process (or create a thread) so that one writes to the file (stdout) while the other does the computation.
So first, don't print on every entry. Basically, what I am saying is: do not do it like that.
Instead, allocate a buffer either on the stack or on the heap, store your information there, and then write that whole buffer to stdout in one go.
But in your case, just use
write(1, temp, 9);
I am pretty sure you can increase the output performance by increasing the buffer size, so that you make fewer fwrite calls. write might be faster, but I am not sure. Just try this:
vs
The same applies to your code. Some tests over the last few days suggest that good buffer sizes are probably between 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.
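A minimal sketch of the fewer-but-larger-writes idea, assuming 9-byte records as in the question. The names batch_add and batch_flush and the 64 KiB batch size are assumptions for illustration; the best size should be found by measurement.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 9
#define BATCH_BYTES (1 << 16)   /* 65536 bytes; assumed, tune by measuring */

static unsigned char batch[BATCH_BYTES];
static size_t batch_used = 0;

/* Write out whatever has accumulated. Returns 0 on success, -1 on failure. */
int batch_flush(FILE *out) {
    size_t n = batch_used;
    batch_used = 0;
    if (n == 0)
        return 0;
    return fwrite(batch, 1, n, out) == n ? 0 : -1;
}

/* Append one record to the batch; fwrite is only called when the batch
 * has no room for another record. Returns 0 on success, -1 on failure. */
int batch_add(FILE *out, const unsigned char record[RECORD_SIZE]) {
    if (batch_used + RECORD_SIZE > sizeof batch && batch_flush(out) != 0)
        return -1;
    memcpy(batch + batch_used, record, RECORD_SIZE);
    batch_used += RECORD_SIZE;
    return 0;
}
```

The caller has to invoke batch_flush(stdout) once at the end, otherwise the last partial batch is never written.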
You can simply:
printf is more C-Style.
Yet, IO operations are costly, so use them wisely.