C/C++: best way to send a number of bytes to stdout

Posted 2024-07-13 13:17:55

Profiling my program shows that the function print takes a lot of time to run. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster? (I need to send all 9 bytes in print() to stdout at the same time.)

void print(){
    unsigned char temp[9];

    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];

    fwrite(temp,1,9,stdout);

}

matrix is defined globally as unsigned char matrix[3][3];

回首观望 2024-07-20 13:17:55

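IO operations are not cheap. In fact, write is a blocking operation, which means that when you call it the OS can preempt your process so that more CPU-bound processes can run until the IO device you are writing to completes the operation.

The only lower-level function you can use (if you are developing on a *nix machine) is the raw write function, but even then your performance will not be much faster than it is now. Simply put: IO is expensive.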

酸甜透明夹心 2024-07-20 13:17:55

The top rated answer claims that IO is slow.

Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.

Write 10 million records from a nine byte array

Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1

   340ms   to /dev/null 
   710ms   to 90MB output file 
 15254ms   to 90MB output file in "dribs" mode

FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0

   450ms   to /dev/null 
   550ms   to 90MB output file on ZFS triple mirror
  1150ms   to 90MB output file on FFS system drive
 22154ms   to 90MB output file in "dribs" mode

There's nothing slow about IO if you can afford to buffer properly.

#include <stdio.h> 
#include <assert.h> 
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[]) 
{
    int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
    int err;
    int i; 
    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc (BigBuf); 
    assert (outbuf != NULL); 
    err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full buffering (not line buffering)
    assert (err == 0);

    enum { ArraySize = 9 };
    char temp[ArraySize] = {0};  // zero-filled so no uninitialized bytes are written
    enum { Count = 10*1000*1000 }; 

    for (i = 0; i < Count; ++i) {
        fwrite (temp, 1, ArraySize, stdout);    
        if (dribs) fflush (stdout); 
    }
    fflush (stdout);  // seems to be needed after setting own buffer
    fclose (stdout);
    if (outbuf) { free (outbuf); outbuf = NULL; }
}
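
For the original print() function, the same idea might look like the sketch below. The helper name use_big_buffer and the 64 KiB size are assumptions made for illustration, not part of the benchmark above; the only library call involved is the standard setvbuf.

```c
#include <stdio.h>

/* Hypothetical setup helper: give a stream a fully buffered 64 KiB buffer.
   Call it once, before any other operation on the stream.
   Returns 0 on success and nonzero on failure, like setvbuf itself. */
static int use_big_buffer(FILE *out)
{
    /* static so the buffer outlives the stream; this also means the
       helper can only serve one stream at a time */
    static char buf[1 << 16];
    return setvbuf(out, buf, _IOFBF, sizeof buf);
}
```

Calling use_big_buffer(stdout) at the top of main would leave the existing fwrite call in print() unchanged, with the data reaching the OS only when the buffer fills or the stream is flushed or closed.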

惯饮孤独 2024-07-20 13:17:55

The rawest form of output you can do is probably the write system call, like this:

write (1, matrix, 9);

1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as the one reading it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.

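I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. Be aware that in non-blocking mode a write that cannot proceed fails with EAGAIN instead of waiting, so you would have to retry it yourself. It's been a while, but I think it works like this: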

fcntl (1, F_SETFL, O_NONBLOCK);

YMMV though. Please correct me if I'm wrong on the syntax, as I said, it's been a while.
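
One detail worth sketching for the write approach: write may return after sending only part of the data, or fail with EINTR when interrupted by a signal, so a bare call does not guarantee that all 9 bytes go out. The helper name write_all below is hypothetical; this is a minimal sketch for a POSIX system, not a definitive implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are out.
   Returns 0 on success, -1 on error (errno is left as set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte was written: retry */
            return -1;      /* any other error, including EAGAIN on a non-blocking fd */
        }
        p += (size_t)n;     /* skip past the bytes that were accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

With this in place, the call for the matrix would be write_all(1, matrix, sizeof matrix).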

纸伞微斜 2024-07-20 13:17:55

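Maybe your problem is not that fwrite() is slow, but that it is buffered.
Try calling fflush(stdout) after the fwrite().

It all really depends on your definition of slow in this context.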

无敌元气妹 2024-07-20 13:17:55

All printing is fairly slow, although iostreams are really slow for printing.

Your best bet would be to use printf, something along the lines of:

printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
  matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);

白首有我共你 2024-07-20 13:17:55

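As everyone has pointed out, IO in a tight inner loop is expensive. When I need to debug it, I usually end up doing a conditional cout of the matrix based on some criteria.

If your app is a console app, try redirecting its output to a file; that will be much faster than refreshing the console. For example: app.exe > matrix_dump.txt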

残花月 2024-07-20 13:17:55

What's wrong with:

fwrite(matrix,1,9,stdout);

Both the one-dimensional and the two-dimensional array take up the same memory.
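
A small sketch of why this works: a 3x3 array of unsigned char is stored row by row in 9 contiguous bytes, so it can be written without copying into temp. The function name print_matrix and its FILE* parameter are assumptions for illustration, and _Static_assert requires C11.

```c
#include <stdio.h>

unsigned char matrix[3][3];

/* Compile-time check (C11) that the 3x3 array is exactly 9 bytes. */
_Static_assert(sizeof matrix == 9, "matrix must occupy 9 bytes");

/* Writes the whole matrix in row-major order, the same byte order as
   the temp[] copy in the question. Returns 1 on success, 0 otherwise. */
static int print_matrix(FILE *out)
{
    return fwrite(matrix, 1, sizeof matrix, out) == sizeof matrix;
}
```

In the original program this would be called as print_matrix(stdout).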

汹涌人海 2024-07-20 13:17:55

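Try running the program twice, once with output and once without. You will notice that, overall, the run without IO is the fastest. Also, you could fork the process (or create a thread), with one writing to the file (stdout) and the other doing the computation.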

静待花开 2024-07-20 13:17:55

So first, don't print on every entry. Basically, what I am saying is: do not do this.

for(int i = 0; i<100; i++){
    printf("Your stuff");
}

Instead, allocate a buffer either on the stack or on the heap, store your information there, and then just throw this buffer at stdout, like this:

char *buffer = malloc(100);      // 100 bytes, not sizeof(100)
for(int i = 0; i<100; i++){      // start at 0 so the loop actually runs
    buffer[i] = 1; //your 8-bit value goes here
}

//once you are done, print it to the console with
write(1, buffer, 100);

But in your case, just use write(1, temp, 9);
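
Applied to the 9-byte records from the question, the buffering idea could be sketched as follows. All names here (batch_add, batch_flush, RECORD_SIZE, BATCH_RECORDS) and the batch size of 1024 records are hypothetical choices for illustration; error handling is deliberately minimal and a failed flush should be treated as fatal.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

enum { RECORD_SIZE = 9, BATCH_RECORDS = 1024 };

static unsigned char batch[RECORD_SIZE * BATCH_RECORDS];
static size_t batch_used = 0;   /* bytes currently pending in batch */

/* Write all pending bytes to fd. Returns 0 on success, -1 on error. */
static int batch_flush(int fd)
{
    size_t off = 0;

    while (off < batch_used) {
        ssize_t n = write(fd, batch + off, batch_used - off);
        if (n < 0)
            return -1;          /* sketch only: caller should give up */
        off += (size_t)n;
    }
    batch_used = 0;
    return 0;
}

/* Append one 9-byte record, flushing first if the buffer is full. */
static int batch_add(int fd, const unsigned char rec[RECORD_SIZE])
{
    if (batch_used + RECORD_SIZE > sizeof batch && batch_flush(fd) != 0)
        return -1;
    memcpy(batch + batch_used, rec, RECORD_SIZE);
    batch_used += RECORD_SIZE;
    return 0;
}
```

print() would then call batch_add(1, temp), and the program would call batch_flush(1) once before exiting so the last partial batch is not lost.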

狂之美人 2024-07-20 13:17:55

I am pretty sure you can increase the output performance by increasing the buffer size, so that you have fewer fwrite calls. write might be faster, but I am not sure. Just try this:

❯ yes | dd of=/dev/null count=1000000 
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s

> yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s

The same applies to your code. Some tests over the last few days suggest that good buffer sizes are probably between 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.
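
To put rough numbers on this, the sketch below computes the minimum number of writes needed to emit a given amount of data at a given chunk size. The function name min_writes is hypothetical, and the result is only a lower bound on what a real stdio implementation would issue.

```c
#include <stddef.h>

/* Minimum number of writes needed to emit `total` bytes when each
   write carries at most `chunk` bytes (chunk must be nonzero). */
static size_t min_writes(size_t total, size_t chunk)
{
    return (total + chunk - 1) / chunk;   /* ceiling division */
}
```

For 90,000,000 bytes of output, this gives 10,000,000 writes at 9 bytes each, 21,973 at 4096 bytes, and 1,374 at 65536 bytes.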
