Expanding tabs to spaces in C?

Posted on 2024-08-04 19:59:59

I need to expand tabs in an input line so that they become spaces (with a tab width of 8 columns). I tried adapting some earlier code of mine that replaced the last space in every line longer than 10 characters with a '\n' to start a new line. Is there a way in C to turn tabs into 8 spaces in order to expand them? I'm sure it is simple, but I just can't seem to get it.

Here's my code:

int v = 0;
int w = 0;
int tab;
extern char line[];

while (v < length) {

   if(line[v] == '\t')
      tab = v;

   if (w == MAXCHARS) {
      // THIS IS WHERE I GET STUCK
      line[tab] = ' ';
      // set w to 0, so the count starts over
      w = 0;
   }
   ++v;
   ++w;
}

Comments (7)

夜夜流光相皎洁 2024-08-11 19:59:59

This isn't really a question about the C language; it's a question about finding the right algorithm -- you could use that algorithm in any language.

Anyhow, you can't do this at all without reallocating line[] to point at a larger buffer (unless it's a large fixed length, in which case you need to be worried about overflows); as you're expanding the tabs, you need more memory to store the new, larger lines, so character replacement such as you're trying to do simply won't work.

My suggestion: Rather than trying to operate in place (or trying to operate in memory, even) I would suggest writing this as a filter -- reading from stdin and writing to stdout one character at a time; that way you don't need to worry about memory allocation or deallocation or the changing length of line[].
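
A minimal sketch of that filter idea, assuming a tab stop every `tab_width` columns. The function name `expand_stream` and its parameters are made up for illustration; the only state it keeps is the current column, so there is no line buffer to size or reallocate.

```c
#include <assert.h>
#include <stdio.h>

/* Copy `in` to `out`, replacing each tab with spaces up to the next
 * multiple of `tab_width`. (Hypothetical helper, not from the question.) */
void expand_stream(FILE *in, FILE *out, int tab_width)
{
    int c;
    int col = 0;  /* current output column on this line */

    assert(tab_width > 0);
    while ((c = getc(in)) != EOF) {
        if (c == '\t') {
            /* Emit at least one space, then pad to the next tab stop. */
            do {
                putc(' ', out);
                ++col;
            } while (col % tab_width != 0);
        } else {
            putc(c, out);
            /* A newline starts a fresh line; anything else advances. */
            col = (c == '\n') ? 0 : col + 1;
        }
    }
}
```

Calling `expand_stream(stdin, stdout, 8)` from `main()` would turn this into a stdin-to-stdout filter.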

If the context this code is being used in requires it to operate in memory, consider implementing an API similar to realloc(), wherein you return a new pointer; if you don't need to change the length of the string being handled you can simply keep the original region of memory, but if you do need to resize it, the option is available.
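
One way such a realloc()-style interface might look, as a rough sketch only. The name `detab_realloc` is hypothetical, and it assumes the input string was obtained from malloc(). It hands back the original pointer when nothing needs to change, and a fresh, larger buffer otherwise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns `s` itself when it contains no tabs. Otherwise returns a new
 * malloc'd string with tabs expanded and frees `s`. Returns NULL if the
 * allocation fails, in which case `s` is left untouched and still valid. */
char *detab_realloc(char *s, size_t tab_width)
{
    size_t col = 0, out_len = 0, i, o = 0;
    char *out;

    assert(s != NULL);
    if (tab_width == 0 || strchr(s, '\t') == NULL)
        return s;  /* length unchanged: keep the original memory */

    /* Pass 1: measure the expanded string. */
    for (i = 0; s[i] != '\0'; ++i) {
        if (s[i] == '\t') {
            size_t pad = tab_width - col % tab_width;
            out_len += pad;
            col += pad;
        } else {
            ++out_len;
            col = (s[i] == '\n') ? 0 : col + 1;
        }
    }

    out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy, writing spaces where the tabs were. */
    col = 0;
    for (i = 0; s[i] != '\0'; ++i) {
        if (s[i] == '\t') {
            size_t pad = tab_width - col % tab_width;
            memset(out + o, ' ', pad);
            o += pad;
            col += pad;
        } else {
            out[o++] = s[i];
            col = (s[i] == '\n') ? 0 : col + 1;
        }
    }
    out[o] = '\0';
    free(s);
    return out;
}
```

As with realloc(), the caller should assign the result to a temporary first, so the original pointer is not lost if NULL comes back.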

流年里的时光 2024-08-11 19:59:59

You need a separate buffer to write the output to, since it will in general be longer than the input:

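void detab(char* in, char* out, size_t max_len) {
    size_t i = 0;  /* output index, which doubles as the current column */
    if (max_len == 0) {
        return;  /* no room even for the terminator */
    }
    while (*in && i < max_len - 1) {
        if (*in == '\t') {
            in++;
            /* A tab always produces at least one space... */
            out[i++] = ' ';
            /* ...and then pads to the next multiple of 8. */
            while (i % 8 && i < max_len - 1) {
                out[i++] = ' ';
            }
        } else {
            out[i++] = *in++;
        }
    }

    out[i] = 0;
}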

You must preallocate enough space for out (which in the worst case could be 8 * strlen(in) + 1), and out cannot be the same as in.

EDIT: As suggested by Jonathan Leffler, the max_len parameter now makes sure we avoid buffer overflows. The resulting string will always be null-terminated, even if it is cut short to avoid such an overflow. (I also renamed the function, and changed int to size_t for added correctness :).)

七禾 2024-08-11 19:59:59

I would probably do something like this:

  1. Iterate through the string once, only counting the tabs (and the string length if you don't already know that).
  2. Allocate original_size + 7 * number_of_tabs bytes of memory (where original_size counts the null byte).
  3. Iterate through the string another time, copying every non-tab byte to the new memory and inserting 8 spaces for every tab.

If you want to do the replacement in place instead of creating a new string, you'll have to make sure that the passed-in pointer points to a buffer with enough room for the new string (which will be longer than the original, because 8 spaces take 7 more bytes than a single tab).
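
The three steps could be sketched roughly as follows. `replace_tabs` and `SPACES_PER_TAB` are names invented for this example. Note that it does what the steps say, replacing every tab with a fixed 8 spaces, rather than padding to the next tab stop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SPACES_PER_TAB 8  /* fixed replacement width, as in the steps above */

/* Returns a newly malloc'd copy of `src` with every tab replaced by
 * SPACES_PER_TAB spaces, or NULL if allocation fails. Caller frees. */
char *replace_tabs(const char *src)
{
    size_t len = 0, tabs = 0, o = 0;
    const char *p;
    char *out;

    assert(src != NULL);

    /* Step 1: one pass to measure the string and count the tabs. */
    for (p = src; *p != '\0'; ++p) {
        ++len;
        if (*p == '\t')
            ++tabs;
    }

    /* Step 2: each tab grows by SPACES_PER_TAB - 1 bytes; +1 for the NUL. */
    out = malloc(len + 1 + (SPACES_PER_TAB - 1) * tabs);
    if (out == NULL)
        return NULL;

    /* Step 3: copy non-tab bytes, write spaces in place of each tab. */
    for (p = src; *p != '\0'; ++p) {
        if (*p == '\t') {
            memset(out + o, ' ', SPACES_PER_TAB);
            o += SPACES_PER_TAB;
        } else {
            out[o++] = *p;
        }
    }
    out[o] = '\0';
    return out;
}
```

Because the buffer size is computed before anything is written, the second pass cannot overrun it.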

还如梦归 2024-08-11 19:59:59

Here's a reentrant, recursive version which automatically allocates a buffer of correct size:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct state
{
    char *dest;
    const char *src;
    size_t tab_size;
    size_t size;
    _Bool expand;
};

static void recexp(struct state *state, size_t di, size_t si)
{
    size_t start = si;
    size_t pos = si;

    for(; state->src[pos]; ++pos)
    {
        if(state->src[pos] == '\n') start = pos + 1;
        else if(state->src[pos] == '\t')
        {
            size_t str_len = pos - si;
            size_t tab_len = state->tab_size - (pos - start) % state->tab_size;

            recexp(state, di + str_len + tab_len, pos + 1);
            if(state->dest)
            {
                memcpy(state->dest + di, state->src + si, str_len);
                memset(state->dest + di + str_len, ' ', tab_len);
            }

            return;
        }
    }

    state->size = di + pos - si + 1;
    if(state->expand && !state->dest) state->dest = malloc(state->size);
    if(state->dest)
    {
        memcpy(state->dest + di, state->src + si, pos - si);
        state->dest[state->size - 1] = 0;
    }
}

size_t expand_tabs(char **dest, const char *src, size_t tab_size)
{
    struct state state = { dest ? *dest : NULL, src, tab_size, 0, dest };
    recexp(&state, 0, 0);
    if(dest) *dest = state.dest;
    return state.size;
}

int main(void)
{
    char *expansion = NULL; // must be `NULL` for automatic allocation
    size_t size = expand_tabs(&expansion,
        "spam\teggs\tfoo\tbar\nfoobar\tquux", 4);
    printf("expanded size: %lu\n", (unsigned long)size);
    if(expansion) puts(expansion);
    free(expansion);
}

If expand_tabs() is called with dest == NULL, the function will return the size of the expanded string, but no expansion is actually done; if dest != NULL but *dest == NULL, a buffer of correct size will be allocated and must be deallocated by the programmer; if dest != NULL and *dest != NULL, the expanded string will be put into *dest, so make sure the supplied buffer is large enough.

只有影子陪我不离不弃 2024-08-11 19:59:59

Untested, but something like this should work:

int v = 0;
int tab;
extern char line[];

while (v < length){
  if (line[v] == '\t') {
    /* Number of spaces needed to reach the next tab stop (1..TAB_WIDTH). */
    tab = TAB_WIDTH - (v % TAB_WIDTH);
    /* I'm assuming MAXCHARS is the size of your array. You either need
     * to bail, or resize the array if the expanding the tab would make
     * the string too long. */
    assert((length + tab) < MAXCHARS);
    if (tab != 1) {
      memmove(line + v + tab - 1, line + v, length - v + 1);
    }
    memset(line + v, ' ', tab);
    length += tab - 1;
    v += tab;
  } else {
    ++v;
  }
}

Note that this is O(n*m) where n is the line size and m is the number of tabs. That probably isn't an issue in practice.
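
For reference, a self-contained variant of the same in-place idea, written as a sketch under a few assumptions: the buffer holds a single line (so the index equals the column), `cap` is the full size of the array, and the function name `expand_in_place` is made up here. Instead of asserting, it reports failure when the expanded line would not fit.

```c
#include <assert.h>
#include <string.h>

#define TAB_WIDTH 8

/* Expands tabs in `line` in place. `cap` is the array size in bytes.
 * Returns the new length, or -1 if the next expansion would not fit
 * (earlier tabs may already have been expanded at that point). */
long expand_in_place(char *line, size_t cap)
{
    size_t length = strlen(line);
    size_t v = 0;

    assert(cap > length);
    while (v < length) {
        if (line[v] == '\t') {
            size_t tab = TAB_WIDTH - (v % TAB_WIDTH);
            /* New length is length + tab - 1, plus one byte for the NUL. */
            if (length + tab > cap)
                return -1;
            /* Shift everything after the tab (including the NUL) right. */
            memmove(line + v + tab, line + v + 1, length - v);
            memset(line + v, ' ', tab);
            length += tab - 1;
            v += tab;
        } else {
            ++v;
        }
    }
    return (long)length;
}
```

Each tab triggers one memmove() of the remaining tail, which is where the O(n*m) cost comes from.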

秋日私语 2024-08-11 19:59:59

There are a myriad ways to convert tabs in a string into 1-8 spaces. There are inefficient ways to do the expansion in-situ, but the easiest way to handle it is to have a function that takes the input string and a separate output buffer that is big enough for an expanded string. If the input is 6 tabs plus an X and a newline (8 characters + terminating null), the output would be 48 blanks, X, and a newline (50 characters + terminating null) - so you might need a much bigger output buffer than input buffer.

#include <stddef.h>
#include <assert.h>

static int detab(const char *str, char *buffer, size_t buflen)
{
    char *end = buffer + buflen;
    char *dst = buffer;
    const char *src = str;
    char c;

    assert(buflen > 0);
    while ((c = *src++) != '\0' && dst < end)
    {
         if (c != '\t')
             *dst++ = c;
         else
         {
             do
             {
                 *dst++ = ' ';
             } while (dst < end && (dst - buffer) % 8 != 0);
         }
    }
    if (dst < end)
    {
        *dst = '\0';
        return(dst - buffer);
    }
    else
        return -1;
}

#ifdef TEST
#include <stdio.h>
#include <string.h>

#ifndef TEST_INPUT_BUFFERSIZE
#define TEST_INPUT_BUFFERSIZE 4096
#endif /* TEST_INPUT_BUFFERSIZE */
#ifndef TEST_OUTPUT_BUFFERSIZE
#define TEST_OUTPUT_BUFFERSIZE (8 * TEST_INPUT_BUFFERSIZE)
#endif /* TEST_OUTPUT_BUFFERSIZE */

int main(void)
{
     char ibuff[TEST_INPUT_BUFFERSIZE];
     char obuff[TEST_OUTPUT_BUFFERSIZE];

     while (fgets(ibuff, sizeof(ibuff), stdin) != 0)
     {
          if (detab(ibuff, obuff, sizeof(obuff)) >= 0)
              fputs(obuff, stdout);
          else
              fprintf(stderr, "Failed to detab input line: <<%.*s>>\n",
                      (int)(strlen(ibuff) - 1), ibuff);
     }
     return(0);
}
#endif /* TEST */

The biggest trouble with this test is that it is hard to demonstrate that it handles overflows in the output buffer properly. That's why there are the two '#define' sequences for the buffer sizes - with very large defaults for real work and independently configurable buffer sizes for stress testing. If the source file is dt.c, use a compilation like this:

 make CFLAGS="-DTEST -DTEST_INPUT_BUFFERSIZE=32 -DTEST_OUTPUT_BUFFERSIZE=32" dt

If the 'detab()' function is to be used outside this file, you'd create a header to contain its declaration, and you'd include that header in this code, and the function would not be static, of course.
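
The sizing example at the top of this answer (six tabs, an X and a newline expanding to 50 characters) can be checked with a small measuring function. This is only an illustrative sketch; `expanded_length` is a hypothetical helper, not part of the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many characters `s` occupies once every tab is padded to
 * the next multiple of `tab_width`, not counting the terminating null. */
size_t expanded_length(const char *s, size_t tab_width)
{
    size_t col = 0;    /* column on the current line */
    size_t total = 0;  /* characters produced so far */

    assert(s != NULL && tab_width > 0);
    for (; *s != '\0'; ++s) {
        if (*s == '\t') {
            size_t pad = tab_width - col % tab_width;
            total += pad;
            col += pad;
        } else {
            ++total;
            col = (*s == '\n') ? 0 : col + 1;
        }
    }
    return total;
}
```

A caller that wants an exactly sized output buffer could allocate `expanded_length(line, 8) + 1` bytes before expanding.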

最单纯的乌龟 2024-08-11 19:59:59

Here is one that will malloc(3) a bigger buffer of exactly the right size and return the expanded string. It does no division or modulus ops. It even comes with a test driver. Safe with -Wall -Wno-parentheses if using gcc.

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static char *expand_tabs(const char *s) {
  int i, j, extra_space;
  char *r, *result = NULL;

  for(i = 0; i < 2; ++i) {
    for(j = extra_space = 0; s[j]; ++j) {
      if (s[j] == '\t') {
        int es0 = 8 - (j + extra_space & 7);
        if (result != NULL) {
          strncpy(r, "        ", es0);
          r += es0;
        }
        extra_space += es0 - 1;
      } else if (result != NULL)
        *r++ = s[j];
    }
    if (result == NULL)
      if ((r = result = malloc(j + extra_space + 1)) == NULL)
        return NULL;
  }
  *r = 0;
  return result;
}

#include <stdio.h>

int main(int ac, char **av) {
  char space[1000];
  while (fgets(space, sizeof space, stdin) != NULL) {
    char *s = expand_tabs(space);
    if (s == NULL)
      return 1;
    fputs(s, stdout);
    free(s);
  }
  return 0;
}