Remove white chars between commas, but not within the content between the commas

Posted 2025-01-27 07:26:16, 909 words, 4 views, 0 comments


I'm new to C and learning C90.
I'm trying to parse a string into a command, but I'm having a hard time removing white chars.

My goal is to parse a string like this:

NA ME, NAME   , 123 456, 124   , 14134, 134. 134   ,   1

into this:

NA ME,NAME,123 456,124,14134,134. 134,1

so the white chars that were inside the arguments are still there, but the other white chars are removed.

I thought about using strtok, but I still want to keep the commas, even if there are multiple consecutive commas.

So far I have used:

void removeWhiteChars(char *s)
{
    int i = 0;
    int count = 0;
    int inNum = 0;
    while (s[i])
    {
        if (isdigit(s[i]))
        {
            inNum = 1;
        }
        if (s[i] == ',')
        {
            inNum = 0;
        }
        if (!isspace(s[i]) && !inNum)
            s[count++] = s[i];
        else if (inNum)
        {
            s[count++] = s[i];
        }

        ++i;
    }
    s[count] = '\0'; /* adding NULL-terminate to the string */
}

But it only handles numbers, and it does not remove the white chars between a number and the following comma, so it's quite wrong.

I would appreciate any kind of help; I've been stuck on this for two days now.
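On the strtok point: strtok treats a run of consecutive delimiters as a single separator and never returns an empty token, so ",," loses a field. A minimal sketch of an alternative, assuming the string may be rewritten in place, is to walk the fields with strchr and trim each one. The names trimFields and appendTrimmed are hypothetical (they are not from the question), and the code sticks to C90:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies the field [begin, end) to out with its
   leading and trailing whitespace removed; returns the next write position. */
static char *appendTrimmed(char *out, const char *begin, const char *end)
{
    while (begin < end && isspace((unsigned char)*begin))
        begin++;
    while (end > begin && isspace((unsigned char)end[-1]))
        end--;
    memmove(out, begin, (size_t)(end - begin));  /* regions may overlap */
    return out + (end - begin);
}

/* Walks the fields with strchr instead of strtok, so an empty field
   between consecutive commas (",,") is kept. Rewrites s in place. */
void trimFields(char *s)
{
    char *out = s;
    const char *field = s;
    const char *comma;

    while ((comma = strchr(field, ',')) != NULL) {
        out = appendTrimmed(out, field, comma);
        *out++ = ',';
        field = comma + 1;
    }
    out = appendTrimmed(out, field, field + strlen(field));
    *out = '\0';
}
```

Writing in place is safe here because the write pointer never passes the start of the field being read.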


浅语花开 2025-02-03 07:26:16

You need to look ahead whenever you encounter whitespace that might be skippable. Every time the function below sees a space, it looks ahead to check whether the run of spaces ends at a comma, and if so it removes the run. Likewise, for every comma, it removes all the spaces that follow it.

// Remove the len elements str[index] .. str[index+len-1] in place
void splice (char * str, int index, int len) {
  while (str[index+len]) {
    str[index] = str[index+len];
    index++;
  }
  str[index] = 0;
}

void removeWhiteChars (char * str) {
  int index=0, seq_len;

  while (str[index]) {
    if (str[index] == ' ') {
      seq_len = 0;

      while (str[index+seq_len] == ' ') seq_len++;

      if (str[index+seq_len] == ',') {
        splice(str, index, seq_len);
      }
    }
    if (str[index] == ',') {
      seq_len = 0;
      while (str[index+seq_len+1] == ' ') seq_len++;

      if (seq_len) {
        splice(str, index+1, seq_len);
      }
    }
    index++;
  }
}
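Because splice() shifts the rest of the string on every match, this approach is O(n²) in the worst case. Below is a hedged alternative that applies the same lookahead idea in a single pass with separate read and write indices. The name removeWhiteCharsOnePass is hypothetical. Unlike the version above, it treats every isspace() character as whitespace, and it also drops leading and trailing whitespace:

```c
#include <ctype.h>
#include <string.h>  /* size_t */

/* Hypothetical single-pass variant of the lookahead idea: r reads, w writes.
   A whitespace run is copied only if it sits between two non-comma characters,
   so whitespace next to a comma, at the start, or at the end is dropped. */
void removeWhiteCharsOnePass(char *str)
{
    size_t r = 0, w = 0;

    while (str[r]) {
        if (isspace((unsigned char)str[r])) {
            size_t end = r;

            /* look ahead to the first character after this run */
            while (isspace((unsigned char)str[end]))
                end++;

            if (w > 0 && str[w - 1] != ',' && str[end] != ',' && str[end] != '\0') {
                while (r < end)
                    str[w++] = str[r++];    /* keep whitespace inside a field */
            }
            r = end;                        /* skip the run either way */
        } else {
            str[w++] = str[r++];
        }
    }
    str[w] = '\0';
}
```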


青柠芒果 2025-02-03 07:26:16

A short and reliable way to approach any parsing problem is to use a state-loop, which is nothing more than a loop over all the characters in your original string, where you use one (or more) flag variables to track the state of anything you need to track. In your case, you need to know whether you are reading post (after) the comma.

This controls how you handle the next character. Use a simple counter variable to keep track of the number of spaces you have read. When you encounter the next non-space character, append that many spaces to your new string if you are not post-comma; if you are post-comma, discard the buffered spaces. (Encountering the ',' itself also tells you to discard any spaces buffered before it, so that case needs no extra variable.)

To remove spaces around the ',' delimiter, you can write a rmdelimws() function that takes the new string to fill and the old string to copy from as arguments and do something similar to:

void rmdelimws (char *newstr, const char *old)
{
  size_t spcount = 0;               /* space count */
  int postcomma = 0;                /* post comma flag */
  
  while (*old) {                    /* loop each char in old */
    if (isspace ((unsigned char)*old)) {  /* if space? */
      spcount += 1;                 /* increment space count */
    }
    else if (*old == ',') {         /* if comma? */
      *newstr++ = ',';              /* write to new string */
      spcount = 0;                  /* reset space count */
      postcomma = 1;                /* set post comma flag true */
    }
    else {                          /* normal char? */
      if (!postcomma) {             /* if not 1st char after comma */
        while (spcount--) {         /* append spcount spaces to newstr */
          *newstr++ = ' ';
        }
      }
      spcount = postcomma = 0;      /* reset spcount and postcomma */
      *newstr++ = *old;             /* copy char from old to newstr */
    }
    old++;                          /* increment pointer */
  }
  *newstr = 0;                      /* nul-terminate newstr */
}

(Note: updated to explicitly nul-terminate newstr in case it was not zero-initialized as in the example below.)

If you want to keep the trailing whitespace in the line (e.g. the spaces after the final 1 in your example), you can add the following just before nul-terminating the string above:

  if (!postcomma) {                 /* if trailing whitespace wanted */
    while (spcount--) {             /* append spcount spaces to newstr */
      *newstr++ = ' ';
    }
  }

Putting it all together in a short example, you would have:

#include <stdio.h>
#include <ctype.h>

void rmdelimws (char *newstr, const char *old)
{
  size_t spcount = 0;               /* space count */
  int postcomma = 0;                /* post comma flag */
  
  while (*old) {                    /* loop each char in old */
    if (isspace ((unsigned char)*old)) {  /* if space? */
      spcount += 1;                 /* increment space count */
    }
    else if (*old == ',') {         /* if comma? */
      *newstr++ = ',';              /* write to new string */
      spcount = 0;                  /* reset space count */
      postcomma = 1;                /* set post comma flag true */
    }
    else {                          /* normal char? */
      if (!postcomma) {             /* if not 1st char after comma */
        while (spcount--) {         /* append spcount spaces to newstr */
          *newstr++ = ' ';
        }
      }
      spcount = postcomma = 0;      /* reset spcount and postcomma */
      *newstr++ = *old;             /* copy char from old to newstr */
    }
    old++;                          /* increment pointer */
  }
  *newstr = 0;                      /* nul-terminate newstr */
}


int main (void) {
  
  char str[] = "NA ME, NAME   , 123 456, 124   , 14134, 134. 134   ,   1   ",
       newstr[sizeof str] = "";
  
  rmdelimws (newstr, str);
  
  printf ("\"%s\"\n\"%s\"\n", str, newstr);

  return 0;   /* C90 does not return 0 implicitly from main */
}

Example Use/Output

$ ./bin/rmdelimws
"NA ME, NAME   , 123 456, 124   , 14134, 134. 134   ,   1   "
"NA ME,NAME,123 456,124,14134,134. 134,1"

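If you want both behaviours from one function, the optional trailing-whitespace block can be folded in behind a flag. This is a sketch: rmdelimws_opt and keep_trailing are hypothetical names. It also casts to unsigned char before calling isspace(), because the C standard requires the argument to be representable as unsigned char (or be EOF):

```c
#include <ctype.h>
#include <string.h>  /* size_t */

/* Hypothetical variant of rmdelimws() with the optional trailing-whitespace
   block folded in: keep_trailing != 0 keeps spaces after the last field
   (unless they follow a comma). newstr must hold at least strlen(old) + 1. */
void rmdelimws_opt(char *newstr, const char *old, int keep_trailing)
{
    size_t spcount = 0;               /* buffered space count */
    int postcomma = 0;                /* post comma flag */

    while (*old) {
        if (isspace((unsigned char)*old)) {
            spcount += 1;             /* buffer it; decide later */
        }
        else if (*old == ',') {
            *newstr++ = ',';          /* spaces before a comma are dropped */
            spcount = 0;
            postcomma = 1;
        }
        else {
            if (!postcomma) {         /* spaces inside a field are kept */
                while (spcount > 0) {
                    *newstr++ = ' ';
                    spcount--;
                }
            }
            spcount = postcomma = 0;
            *newstr++ = *old;
        }
        old++;
    }
    if (keep_trailing && !postcomma) {
        while (spcount > 0) {
            *newstr++ = ' ';
            spcount--;
        }
    }
    *newstr = '\0';
}
```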

小霸王臭丫头 2025-02-03 07:26:16

The code below works, at least for your input string. I make absolutely no claims as to its efficiency or elegance. Instead of modifying s in place, I wrote to a new string. The algorithm I followed was:

1. Initialize startPos to 0.
2. Loop over s until you find a comma.
3. Back up from that position until you find the first non-space character.
4. memcpy from startPos up to that position into a new string.
5. Add a comma at the next position of the new string.
6. Look forward from the comma until you find the first non-space character, and set startPos to it.
7. Rinse and repeat.
8. At the very end, append the final token with strcat.

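#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void removeWhiteChars(char *s)
{
    size_t i = 0;
    size_t len = strlen(s);
    char* newS = calloc(1, len + 1);   // +1 leaves room for the terminating '\0'
    size_t newSIndex = 0;
    size_t startPos = 0;

    if (newS == NULL)
        return;

    while (i<len)
    {
        // find the comma
        if (s[i] == ',')
        {
            // find the end of the token: back up over spaces before the comma,
            // but never past startPos (the field may be empty)
            size_t end = i;
            while (end > startPos && isspace((unsigned char)s[end-1]))
            {
                end--;
            }

            // copy from startPos up to (not including) end into our new string
            size_t amountToCopy = end - startPos;
            memcpy(newS+newSIndex, s+startPos, amountToCopy);
            newSIndex += amountToCopy;
            newS[newSIndex++] = ',';

            // update startPos: skip the spaces after the comma
            startPos = i+1;
            while (isspace((unsigned char)s[startPos]))
            {
                startPos++;
            }

            // update i: resume at startPos, which may itself be another comma
            i = startPos;
        }
        else
        {
            i++;
        }
    }

    // finally tack on the end (trailing spaces of the last field are kept)
    strcat(newS, s+startPos);

    // You can return newS if you're allowed to change your function
    // signature (the caller then frees it), or strcpy it to s
    printf("%s\n", newS);
    free(newS);
}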

I have also only tested it with your input string; it may break for other cases.



睫毛溺水了 2025-02-03 07:26:16

You can modify this in place in O(n) using a state machine. In this example, I've used re2c to set up and keep the state for me.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

static void lex(char *cursor) {
    char *out = cursor, *open = cursor, *close = 0;
start:
    /*!re2c /* Use "re2c parse.re.c -o parse.c" to get C output file. */
    re2c:define:YYCTYPE = "char";
    re2c:define:YYCURSOR = "cursor";
    re2c:yyfill:enable = 0;
    /* Whitespace. */
    [ \f\n\r\t\v]+ { if(!close) open = cursor; goto start; }
    /* Words. */
    [^, \f\n\r\t\v\x00]+ { close = cursor; goto start; }
    /* Comma: write [open, close) and reset. */
    "," {
        if(close)
            memmove(out, open, close - open), out += close - open, close = 0;
        *(out++) = ',';
        open = cursor;
        goto start;
    }
    /* End of string: write any [open, close). */
    "\x00" {
        if(close)
            memmove(out, open, close - open), out += close - open;
        *(out++) = '\0';
        return;
    }
    */
}

int main(void) {
    char command[]
        = "NA ME, NAME   , 123 456, 124   , 14134, 134. 134   ,   1   ";
    printf("<%s>\n", command);
    lex(command);
    printf("<%s>\n", command);
    return EXIT_SUCCESS;
}

This works by being lazy: it defers writing each field until it can be sure the field is complete, either at a comma or at the end of the string. It's quite simple. The patterns form a regular language, and no lookahead is needed. It preserves whitespace between words that don't have a comma between them. It also overwrites the string in place, so it uses no extra space; we can do this because the edits only involve deletion.
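For readers without re2c, here is a hand-written sketch of the same lazy [open, close) idea in plain C90. It is an assumed translation of the rules above, not re2c's generated output, and lazyTrim is a hypothetical name. In the default "C" locale, isspace() matches the same [ \f\n\r\t\v] set as the whitespace rule:

```c
#include <ctype.h>
#include <string.h>

/* Hand-written sketch of the lazy [open, close) idea, without re2c.
   open:  start of the pending field text
   close: end of the last word seen in this field (NULL while none yet) */
void lazyTrim(char *cursor)
{
    char *out = cursor, *open = cursor, *close = NULL;

    for (;;) {
        if (isspace((unsigned char)*cursor)) {
            while (isspace((unsigned char)*cursor))
                cursor++;
            if (!close)
                open = cursor;      /* leading whitespace: field starts after it */
        } else if (*cursor == ',' || *cursor == '\0') {
            char c = *cursor++;
            if (close) {            /* write [open, close) and reset */
                memmove(out, open, (size_t)(close - open));
                out += close - open;
                close = NULL;
            }
            *out++ = c;             /* the comma, or the terminating '\0' */
            if (c == '\0')
                return;
            open = cursor;
        } else {
            while (*cursor && *cursor != ',' && !isspace((unsigned char)*cursor))
                cursor++;
            close = cursor;         /* field text so far ends after this word */
        }
    }
}
```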


送君千里 2025-02-03 07:26:16

Please try this:

#include <ctype.h>

void removeWhiteChars(char *s)
{
    int i = 0;
    int count = 0;
    int isSomething = 0;  /* 0: copying, 1: whitespace pending, 2: comma pending */
    while (s[i])
    {
        if (s[i] == ',' && isSomething == 0)
            isSomething = 2;
        else if (s[i] == ',' && isSomething == 1)
            isSomething = 2;                /* drop the spaces before a comma */
        else if (s[i] == ',' && isSomething == 2)
            s[count++] = ',';               /* write the pending comma; this one is now pending */
        else if (isspace((unsigned char)s[i]) && isSomething == 0)
            isSomething = 1;
        else if (isspace((unsigned char)s[i]) && isSomething == 1)
            isSomething = 1;
        else if (isspace((unsigned char)s[i]) && isSomething == 2)
            isSomething = 2;                /* drop the spaces after a comma */
        else if (isSomething == 1)
        {
            s[count++] = ' ';               /* a whitespace run inside a field becomes one space */
            s[count++] = s[i];
            isSomething = 0;
        }
        else if (isSomething == 2)
        {
            s[count++] = ',';
            s[count++] = s[i];
            isSomething = 0;
        }
        else
            s[count++] = s[i];

        ++i;
    }
    if (isSomething == 2)
        s[count++] = ',';                   /* flush a trailing comma */
    s[count] = '\0'; /* add the terminating '\0' */
}


甜心 2025-02-03 07:26:16


Here is one possible algorithm. It is not necessarily well-optimized as presented here, but exists to demonstrate one possible implementation of an algorithm. It is intentionally partially abstract.

The following is an O(n) time algorithm you may use to trim whitespace (and, if you generalize and extend it, to do other things as well).

This implementation has not been verified to work as-is, however.

Track the previous character so that when you see { ',', ' ' } or { CHAR_IN_ALPHABET, ' ' }, you begin a whitespace chain, together with a value recording which of the two pairs started it (the current path of execution). When the chain ends at a non-whitespace character, a chain that started after a comma is always dropped. A chain that started after a character in the alphabet is dropped only if it ends at a comma; otherwise it is written back out. We'll be defining a function:

// const char *const in: indicates intent to read from in only
void trim_whitespace(const char *const in, char *out, uint64_t const out_length);

We are defining a definite algorithm in which all execution paths are known. So, for each unique possible execution state, assign a numeric value increasing linearly from zero, using an enum defined within the function for readability, and dispatch on it with a switch statement (unless goto and labels better model the behavior of the algorithm):

void trim_whitespace(const char *const in, char *out, uint64_t const out_length) {
    // better to use ifdefs first or avoid altogether with auto const variable,
    // but you get the point here without all that boilerplate
    #define CHAR_NULL 0

    enum {
        DEFAULT = 0,
        WHITESPACE_CHAIN
    } execution_state = DEFAULT;
    
    // track if loop is executing; makes the logic more readable;
    // can also detect environment instability
    // volatile: don't want this to be optimized out of existence
    volatile bool executing = true;

    while(executing) {
        switch(execution_state) {
        case DEFAULT:
            ...
        case WHITESPACE_CHAIN:
            ...
        default:
            ...
        }
    }

    function_exit:
        return;

    // don't forget to undefine once finished so another function can use
    // the same macro name!
    #undef CHAR_NULL
}

If the state is stored in ceil(log_2(n)) bits, where n is the number of execution states the current algorithm actually uses, then 2**ceil(log_2(n)) values are possible; the extra values can only arise from corruption, which is what the default cases handle. You should explicitly name the real states and give each its own case in the switch statement.

In the DEFAULT case, we're only checking for commas and "legal" characters. If the previous character was a comma or legal character, and the current character is a space, then we want to set the state to WHITESPACE_CHAIN.

In the WHITESPACE_CHAIN case, we test if the current chain can be trimmed based on whether the character we began with was a comma or legal character. If the current character can be trimmed, it is simply skipped and we go to the next iteration until we hit another comma or legal character depending on what we're looking for, then set the execution state to DEFAULT. If we determine this chain to not be trimmable, then we add all the characters we skipped and set the execution state back to DEFAULT.

The loop should look something like this:

...
// black-box the subjective predicates for portability, maintenance, and readability
bool is_whitespace(char);
bool is_comma(char);
// true if the character is allowed in the current context
bool is_legal_char(char);
...

volatile bool executing = true;

// previous character (only updated at loop start)
char previous = CHAR_NULL;
// current character (only updated at loop start)
char current = CHAR_NULL;
// writes current to out if true at end of current iteration; doesn't write otherwise
// (true by default, so characters outside a whitespace chain are copied)
bool write = true;
// COMMA: the start was a comma/delimiter
// CHAR_IN_ALPHABET: the start was a character in the current context's input alphabet
enum { COMMA=0, CHAR_IN_ALPHABET } comma_or_char = COMMA;

// current character index (only updated at loop end)
uint64_t i = 0, j = 0;

while(executing) {
    previous = current;
    current = in[i];

    if (!current) {
        executing = false;
        break;
    }

    switch(execution_state) {
        case DEFAULT:
            if (is_comma(previous) && is_whitespace(current)) {
                execution_state = WHITESPACE_CHAIN;
                write = false;
                comma_or_char = COMMA;
            } else if (is_whitespace(current) && is_legal_char(previous)) { // whitespace check first for short circuiting
                execution_state = WHITESPACE_CHAIN;
                write = false;
                comma_or_char = CHAR_IN_ALPHABET;
            }
            
            break;

        case WHITESPACE_CHAIN:
            if (is_whitespace(current)) {
                // chain continues: keep skipping (write stays false)
                break;
            }

            switch(comma_or_char) {
                case COMMA:
                    if (is_whitespace(previous) && is_legal_char(current)) {
                        execution_state = DEFAULT;
                        write = true;
                    } else if (is_whitespace(previous) && is_comma(current)) {
                        execution_state = DEFAULT;
                        write = true;
                    } else {
                        // illegal condition: logic error, unstable environment, or SEU
                        executing = true;
                        out = NULL;
                        goto function_exit;
                    }

                    break;

                case CHAR_IN_ALPHABET:
                    if (is_whitespace(previous) && is_comma(current)) {
                        execution_state = DEFAULT;
                        write = true;
                    } else if (is_whitespace(previous) && is_legal_char(current)) {
                        // abort: within valid input string/token
                        execution_state = DEFAULT;
                        write = true;
                        // make sure to write all the elements we skipped; 
                        // function should update the value of j when finished
                        write_skipped(in, out, &i, &j);
                    } else {
                        // illegal condition: logic error, unstable environment, or SEU
                        executing = true;
                        out = NULL;
                        goto function_exit;
                    }

                    break;

                default:
                    // impossible condition: unstable environment or SEU
                    executing = true;
                    out = NULL;
                    goto function_exit;
            }
            
            break;

        default:
            // impossible condition: unstable environment or SEU
            executing = true;
            out = NULL;
            goto function_exit;
    }

    if (write) {
        out[j] = current;
        ++j;
    }

    ++i;
}

if (executing) {
    // memory error: unstable environment or SEU
    out = NULL;
} else {
    // execution successful: terminate the output string
    out[j] = CHAR_NULL;
    goto function_exit;
}

// end of function
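For comparison, here is a compact, compilable sketch of the same two-state idea (DEFAULT / WHITESPACE_CHAIN). It is not the answer's code: trim_whitespace_sketch is a hypothetical name, it uses size_t instead of uint64_t, it looks at in[i - 1] instead of keeping a separate previous variable, and it replaces the volatile/SEU checks with a simple return code. Leading and trailing whitespace are dropped:

```c
#include <ctype.h>
#include <string.h>  /* size_t */

/* Hypothetical condensation of the DEFAULT / WHITESPACE_CHAIN machine.
   Returns 0 on success, -1 if out_length is too small (out is then incomplete). */
int trim_whitespace_sketch(const char *in, char *out, size_t out_length)
{
    enum { DEFAULT, WHITESPACE_CHAIN } state = DEFAULT;
    enum { COMMA, CHAR_IN_ALPHABET } started_after = COMMA;
    size_t i, j = 0, chain_start = 0;

    if (out_length == 0)
        return -1;

    for (i = 0; in[i] != '\0'; i++) {
        char c = in[i];

        if (isspace((unsigned char)c)) {
            if (state == DEFAULT) {        /* { ',', ' ' } or { char, ' ' }: start a chain */
                state = WHITESPACE_CHAIN;
                chain_start = i;
                started_after = (i == 0 || in[i - 1] == ',') ? COMMA : CHAR_IN_ALPHABET;
            }
            continue;                      /* whitespace is never written directly */
        }

        if (state == WHITESPACE_CHAIN) {   /* chain ends at a non-whitespace char */
            state = DEFAULT;
            if (started_after == CHAR_IN_ALPHABET && c != ',') {
                /* whitespace inside a token: write back what was skipped */
                for (; chain_start < i; chain_start++) {
                    if (j + 1 >= out_length)
                        return -1;
                    out[j++] = in[chain_start];
                }
            }
        }

        if (j + 1 >= out_length)           /* keep room for the terminator */
            return -1;
        out[j++] = c;
    }
    out[j] = '\0';
    return 0;
}
```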

Please also use the word "whitespace" for these characters, since that is what they are commonly called, rather than "white chars".

