Quirks of dd with newlines

Posted 2024-11-18 06:08:56

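According to the sources I have read, dd's conv=block simply replaces newlines with spaces. Is this correct, or is something else at work?

The Unix dd utility, when used like so:

dd if=foo.log of=bar.log conv=block cbs=2

on a file like this:

12\n34\n56\n78\n9\n

should give:

12 34 56 78 9

Yet it gives:

123456789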

Answer by 放肆, 2024-11-25 06:08:56

The wording there is a little misleading.

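Since you asked for an output record size of two, that is exactly what you get. The newline is replaced by spaces only when the line is shorter than the output record size; a line that already fills the record gets no spaces at all. In your example every line except the last is exactly two bytes long, so only the final 9 is padded (to "9 "), and the output is really 123456789 followed by a single trailing space.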

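I think it would be better to say something like:

For each line in the input, output cbs bytes, replacing the input newline with as many spaces as needed.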
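A minimal sketch of that per-line rule, in C like the dd source quoted further down. block_record is a hypothetical helper, not part of dd, and it assumes the line has already been split off without its trailing newline:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn one input line (WITHOUT its '\n') into
   exactly cbs output bytes. out must have room for cbs + 1 bytes,
   because the result is NUL-terminated for convenience. */
size_t block_record(const char *line, size_t len, size_t cbs, char *out)
{
    assert(cbs > 0);
    size_t keep = len < cbs ? len : cbs;  /* longer lines are truncated */
    memcpy(out, line, keep);
    memset(out + keep, ' ', cbs - keep);  /* newline becomes padding, maybe none */
    out[cbs] = '\0';
    return cbs;
}
```

With cbs=2, block_record("12", 2, 2, out) leaves 12 unchanged because there is no room left for a space, while "9" becomes "9 ".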

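I originally thought the documentation might simply mirror the way things were done in the code, along the lines of:

for every line:
- replace the newline at the end with a space.
- add spaces to pad to the desired record length.
- truncate to the desired record length.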
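Those three steps could be modelled line by line roughly as below. naive_block_line is hypothetical and only illustrates the list; it assumes each line still carries its '\n' and is shorter than 256 bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the three listed steps (NOT dd's actual code).
   line includes its trailing '\n' and len counts it; out needs room for
   cbs + 1 bytes. Lines are limited to the size of the scratch buffer. */
void naive_block_line(const char *line, size_t len, size_t cbs, char *out)
{
    char tmp[256];
    assert(cbs > 0 && len > 0 && line[len - 1] == '\n');
    assert(len < sizeof tmp && cbs < sizeof tmp);
    memcpy(tmp, line, len);
    tmp[len - 1] = ' ';          /* 1. replace the newline with a space */
    while (len < cbs)            /* 2. pad to the record length */
        tmp[len++] = ' ';
    memcpy(out, tmp, cbs);       /* 3. truncate to the record length */
    out[cbs] = '\0';
}
```

For newline-terminated lines these steps produce the same bytes as the dd loop quoted next (for example "12\n" with cbs=2 still comes out as 12); what differs is how the code is organised, not the result.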

But, in fact, it doesn't. The latest dd source code has this (with my own comments added as well):

/* Copy NREAD bytes of BUF, doing conv=block
   (pad newline-terminated records to `conversion_blocksize',
   replacing the newline with trailing spaces).  */

static void copy_with_block (char const *buf, size_t nread) {
    size_t i;

    // For every single character in input buffer.

    for (i = nread; i; i--, buf++) { 
        // If we find a newline.

        if (*buf == newline_character) {
            // If the output record is not yet full, pad it with spaces.

            if (col < conversion_blocksize) {
                size_t j;
                for (j = col; j < conversion_blocksize; j++)
                    output_char (space_character);
            }

            // Regardless, start new output record.

            col = 0;
        } else {
            // No newline.
            // If we hit output limit, increment truncated-lines count.
            // Otherwise only output character if under limit.

            if (col == conversion_blocksize)
                r_truncate++;
            else
                if (col < conversion_blocksize)
                    output_char (*buf);

            // Regardless, increment characters-on-this-line count.

            col++;
        }
    }
}
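To try the loop outside dd, here is a self-contained port. It assumes dd's globals col and r_truncate can be replaced by a small struct and output_char() by writes into a caller buffer; block_state and copy_with_block_sketch are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for dd's globals col and r_truncate (hypothetical names). */
struct block_state {
    size_t col;        /* output column within the current record */
    size_t truncated;  /* number of lines longer than cbs */
};

/* Same logic as copy_with_block: appends converted bytes to out at
   offset *n and NUL-terminates. out must be large enough. The state
   carries over between calls, just as dd's global col does. */
void copy_with_block_sketch(struct block_state *st, size_t cbs,
                            const char *buf, size_t nread,
                            char *out, size_t *n)
{
    assert(cbs > 0);
    for (size_t i = 0; i < nread; i++) {
        if (buf[i] == '\n') {
            /* Short record: pad with spaces up to cbs; full record: none. */
            for (size_t j = st->col; j < cbs; j++)
                out[(*n)++] = ' ';
            st->col = 0;
        } else {
            if (st->col == cbs)
                st->truncated++;       /* first byte that does not fit */
            else if (st->col < cbs)
                out[(*n)++] = buf[i];
            st->col++;                 /* counted even when discarded */
        }
    }
    out[*n] = '\0';
}
```

Run on the question's input with cbs=2 it yields ten bytes, 123456789 plus one trailing space, since only the final 9 is short enough to be padded; with cbs=3 it yields 12 34 56 78 9 plus two trailing spaces. Keeping col in state that outlives a single call matters because dd converts its input one buffer at a time, and a line may be split across two buffers.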

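Here, the input is processed one character at a time, with the global col holding the current output column. The code makes it clear that as soon as a newline turns up in the input stream, it is replaced by however many spaces are needed to pad the record to the conversion block size, which may be none at all.

And if the conversion block size is reached before a newline is found, every further character is simply discarded, up to and including the next newline.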
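The discard rule can be checked with a small, hypothetical helper that mirrors how the loop bumps r_truncate once per over-long line and drops every byte past cbs. count_truncated is not a dd function; it assumes the whole input is available as a NUL-terminated string:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count the lines that conv=block would truncate
   and, via *dropped, the bytes it would discard, using the same
   column logic as the loop above. */
size_t count_truncated(const char *text, size_t cbs, size_t *dropped)
{
    size_t lines = 0, col = 0;
    assert(cbs > 0);
    *dropped = 0;
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            col = 0;                /* a newline starts a new record */
        } else {
            if (col == cbs)
                lines++;            /* first byte that no longer fits */
            if (col >= cbs)
                (*dropped)++;       /* every byte past cbs is discarded */
            col++;
        }
    }
    return lines;
}
```

For "abcde\nxy\n" with cbs=3 it reports one truncated line and two dropped bytes (d and e), matching the records abc and xy (padded to three bytes) that the loop emits.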
