单词长度直方图练习提示?

发布于 2024-11-04 06:45:01 字数 1036 浏览 1 评论 0原文

我正在通过《C 编程语言》一书学习 C,我正在尝试解决练习 1.13:

“编写一个程序来打印输入中单词长度的直方图。很容易 绘制条形水平的直方图;垂直方向更具挑战性。”

我编写了代码,但是当我按 CTRL+Z(文件结束)时,它显示全零而不是单词的长度。

任何人都可以给我提示我在哪里吗出了问题吗?

#include <stdio.h>

/* print a histogram of the length of words from input */
main()
{
    int c, i, wordn, space;
    int lengthn[20];

    wordn = space = 0;
    for (i = 0; i < 20; ++i)
        lengthn[i] = 0;

    while ((c = getchar()) != EOF) {
        if (c == ' ' || c == '\t' || c == '\n')
            if (space == 1) {
                ++wordn;
                space = 0;
                ++i;
            }
        if (c != ' ' && c != '\t' && c != '\n') {
            ++lengthn[i];
            space = 1;
        }
    }
    printf("Length: ");
    for (i = 0; i < 16; ++i)
        printf("%d   ", lengthn[i]);
    printf("\n        --------------------------------------------------------------\n");
    printf("Word:   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15\n");
}

I'm learning C with "The C Programming Language" book, and I'm trying to solve exercise 1.13:

"Write a program to print a histogram of the lengths of words in its input. It is easy to
draw the histogram with the bars horizontal; a vertical orientation is more challenging."

I wrote the code, but when I press CTRL+Z (End-of-File), it shows all zeros instead of the length of words.

Could anyone give me a hint on where I'm going wrong?

#include <stdio.h>

/* print a histogram of the length of words from input */
main()
{
    int c, i, wordn, space;
    int lengthn[20];

    wordn = space = 0;
    for (i = 0; i < 20; ++i)
        lengthn[i] = 0;

    while ((c = getchar()) != EOF) {
        if (c == ' ' || c == '\t' || c == '\n')
            if (space == 1) {
                ++wordn;
                space = 0;
                ++i;
            }
        if (c != ' ' && c != '\t' && c != '\n') {
            ++lengthn[i];
            space = 1;
        }
    }
    printf("Length: ");
    for (i = 0; i < 16; ++i)
        printf("%d   ", lengthn[i]);
    printf("\n        --------------------------------------------------------------\n");
    printf("Word:   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15\n");
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

夜深人未静 2024-11-11 06:45:01

(因为OP要求的是提示,而不是解决方案)

那么......在这个循环之后i等于什么?

for (i = 0; i < 20; ++i)
    lengthn[i] = 0;

接下来你会在哪里使用它?

(Because the OP is asking for hints, not the solution)

So ... what does i equal after this loop?

for (i = 0; i < 20; ++i)
    lengthn[i] = 0;

And where do you use it next?

白昼 2024-11-11 06:45:01
for (i = 0; i < 20; ++i)     
    lengthn[i] = 0;

此循环之后 i 的值将为 i=20

因此您必须在 while 循环之前初始化 i

for (i = 0; i < 20; ++i)     
    lengthn[i] = 0;

The value of i after this loop will be i=20

so you must initialize i before while loop

第七度阳光i 2024-11-11 06:45:01

我写了一个垂直方向的代码。
我是 C 新手,所以可能代码不好。

#include <stdio.h>
#include <conio.h>

#define MAX_WORDS 100
#define IN 1
#define OUT 0

int maxlength(int length[], char num_of_word);

int main()
{
    char c,i,j,state,num_of_word;
    int length[MAX_WORDS];
    /*initialize length[]*/
        for(i=0;i<MAX_WORDS;i++){
        length[i]=0;
        }
    /* find the length of each word */
    num_of_word=0;
    while(num_of_word<MAX_WORDS && (c = getchar()) != EOF && c != 'a'){
        if(c != ' ' && c!= '\t' && c!= '\n'){
            state = IN;
            length[num_of_word]++;
        }else{
            if(state != OUT){
                state = OUT;
                num_of_word++;
            }
        }
    }
    /*   draw histogram            */
    for(i= maxlength(length[],num_of_word);i>0;i--){
        for(j=0;j<num_of_word;j++){
            if(length[j]<i){
                printf(" ");
            }else{
                printf("|");
            }
        }
        printf("\n");
    }
    /* print name of each column*/
    for(i=0;i<num_of_word;i++){
        printf("%d",i+1);
    }

    _getch();
    return(0);
}
/*sub-function that find the longest word */
int maxlength(int length[], char num_of_word){
    int i, max;
    max = length[0];
    for(i=1;i<num_of_word;i++){
        if(max<length[i]){
            max = length[i];
        }
    }
    return max;
}

I wrote a code for the vertical orientation.
I'm new to C, so may be the code is not good.

#include <stdio.h>
#include <conio.h>

#define MAX_WORDS 100
#define IN 1
#define OUT 0

int maxlength(int length[], char num_of_word);

int main()
{
    char c,i,j,state,num_of_word;
    int length[MAX_WORDS];
    /*initialize length[]*/
        for(i=0;i<MAX_WORDS;i++){
        length[i]=0;
        }
    /* find the length of each word */
    num_of_word=0;
    while(num_of_word<MAX_WORDS && (c = getchar()) != EOF && c != 'a'){
        if(c != ' ' && c!= '\t' && c!= '\n'){
            state = IN;
            length[num_of_word]++;
        }else{
            if(state != OUT){
                state = OUT;
                num_of_word++;
            }
        }
    }
    /*   draw histogram            */
    for(i= maxlength(length[],num_of_word);i>0;i--){
        for(j=0;j<num_of_word;j++){
            if(length[j]<i){
                printf(" ");
            }else{
                printf("|");
            }
        }
        printf("\n");
    }
    /* print name of each column*/
    for(i=0;i<num_of_word;i++){
        printf("%d",i+1);
    }

    _getch();
    return(0);
}
/*sub-function that find the longest word */
int maxlength(int length[], char num_of_word){
    int i, max;
    max = length[0];
    for(i=1;i<num_of_word;i++){
        if(max<length[i]){
            max = length[i];
        }
    }
    return max;
}
风和你 2024-11-11 06:45:01

首先,这是原始代码的输出,仅更改一行:

#define MAXLEN 3

100 对于测试来说肯定有点太大了。

t te tes test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   0
  4 -   0
  5 -   0
  6 -   0
  7 -   0
  8 -   0
  9 -   0
 10 -   0
 11 -   0
 12 -   0
 13 -   0
 14 -   0
 15 -   0
 16 -   0
 17 -   0
 18 -   0
 19 -   0
 20 - -858993460

Overflow: 2

我们看到第一个问题:

  • 数字不匹配:输入 5 个单词,仅占 4 个。
  • 长度为 20 的单词为负值

存在一些 one_off 错误。

下面是 3 个错误的一些提示以及一个完整的替代示例,其中包含源代码、输出和参数。

问题:MAXLEN 长度的单词被忽略

尚不清楚 MAXLEN 是否包含在内,但包含长度为 MAXLEN< 的单词似乎更自然/code> 在直方图中。但无论如何,这些单词都应该被计算在内:在直方图中或在 overflow 计数中:

            if (nc < MAXLEN)
               ++wlen[nc];
           else if (nc > MAXLEN)
               ++overflow;

这里将其排除,因为没有与 ==

change 1

            if (nc <= MAXLEN)
                ++wlen[nc];
            else if (nc > MAXLEN)
                ++overflow;

匹配的内容,我们得到

t te tes test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   0
...
 20 - -858993460

Overflow: 2

Now我们已经考虑到了 5 个单词。

更改 2

溢出中的巨大数字是由于此处的错误造成的

    for (i = 1; i < MAXHIST + 1; ++i)
    {
        printf("%3d - %3d  ", i, wlen[i]);
        for (j = wlen[i]; j > MINLEN && j < MAXLEN; --j)
        {
            printf("*");
        }
        printf("\n");
    }

,因为在 printf 中索引将一直到 MAXHISTwlen>0MAXHIST-1

有很多方法可以纠正这个问题。这是这里使用的一个简单的方法:只需将 1 项添加到 wlen 数组中。

    int wlen[1+MAXHIST];

并使用

    for (i = 1; i <= MAXHIST; ++i) wlen[i] = 0;

,我们可以获得

t te tes test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   0
 ...
 20 -   0

Overflow: 2

有关代码的更多信息

所提供的代码基于 1978 年的一本书(K&R 圣经)。从那时起,很多事情都发生了变化。例如,C-Faq 经常被引用作为 C 编程的参考指南,大部分可以追溯到 90 年代。并且自 2005 年以来从未更新过...

但是从那时起,C 语言和编程概念一直在发生变化。很多。

1

    nc = overflow = 0;
    for (i = 0; i < MAXHIST; ++i) wlen[i] = 0;
  • 几十年来,我们可以而且应该在 for 中声明 i。例如,在 C++ 中,我们可以在 switch 或 while 或 if 标头中声明变量。减少任何变量的范围(生命周期)非常重要,特别是那些具有简单名称(例如 ij)的变量。
  • 多重初始化很奇特,但它使得在其中找到变量变得更加困难。

我们可以编写 just

    int nc                = 0;
    int overflow          = 0;
    for (int i = 0; i < MAXHIST; ++i) wlen[i] = 0;

我们也可以编写

    int wlen[1 + MAXHIST] = {0};

和删除这个 for 循环,因为程序只使用数组运行一次​​。

而不是

    int state;
    state = OUT;

您应该使用

    int state = OUT;
  • A 函数应该打印直方图,类似的东西
int print_h(unsigned len, int wlen[], unsigned overflow)
{
  for (unsigned i = 1; i <= len; ++i)
  {
      printf("%3d - %3d  ", i, wlen[i]);
      for (unsigned j = wlen[i]; j > MINLEN && j < MAXLEN;
           --j)
          printf("*");
      printf("\n");
  }
  printf("\nOverflow: %d\n", overflow);
  return 0;
}

就可以了。但是仅用于打印 * 的第二个循环是多余的。

现在,程序经过这些更改后,

#include <stdio.h>

/* Prints a histogram of the lengths of words. */

#define MAXHIST 20
#define MAXLEN 3
#define MINLEN 0
#define IN 1
#define OUT 0

int print_h(unsigned len, int wlen[], unsigned overflow);

int main()
{
    int wlen[1 + MAXHIST] = {0};
    int nc                = 0;
    int overflow          = 0;

    int c     = 0;
    int state = OUT;
    while ((c = getchar()) != EOF)
    {
        if (c == '\t' || c == '\n' || c == ' ')
        {
            state = OUT;
            if (nc <= MAXLEN)
                ++wlen[nc];
            else if (nc > MAXLEN)
                ++overflow;
        }
        else if (state == OUT)
        {
            state = IN;
            nc    = 1;
        }
        else { ++nc; }
    }
    print_h(MAXHIST, wlen, overflow);
    return 0;
}

int print_h(unsigned len, int wlen[], unsigned overflow)
{
    for (unsigned i = 1; i <= len; ++i)
    {
        printf("%3d - %3d  ", i, wlen[i]);
        for (unsigned j = wlen[i]; j > MINLEN && j < MAXLEN;
             --j)
            printf("*");
        printf("\n");
    }
    printf("\nOverflow: %d\n", overflow);
    return 0;
}

示例输出

t te test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   0
  4 -   0
  5 -   0
  6 -   0
  7 -   0
  8 -   0
  9 -   0
 10 -   0
 11 -   0
 12 -   0
 13 -   0
 14 -   0
 15 -   0
 16 -   0
 17 -   0
 18 -   0
 19 -   0
 20 -   0

Overflow: 2

关于状态机的

  • 和另一个可能的问题

    使用 if 来控制状态,并且首先测试字母流,然后测试状态,很难遵循。一般来说,有限状态机是一些输入(在本例中为字母流)的过滤器,这些输入在循环中馈送,然后开关获取状态并执行其操作.

  • FSM 状态必须在项目(本例中为字母)之前进行测试。这个想法是每个状态都是独立的,并且可以单独编码,甚至可以由不同的人和时间进行编码。

  • 溢出应该是状态:当一个单词达到溢出计数时,所有剩余的字母都会被跳过,这是一种不同的行为,这样编码更容易。

  • EOF 也应该是一种状态:在 EOF 处,我们只是打印结果并终止。

    • 但是在 EOF 时,我们可能会扫描一个单词,并且所提供的代码的工作方式将被忽略。

另一个示例

我不会一步步更改所提供的代码,而是在此处留下完整的 C 示例,示例输出和一些参数,以免帖子更大。

示例中的一些结果


Stack Overflow C>_ p
    maximum word len is 4 (inclusive)
test
^Z
  1 -   0
  2 -   0
  3 -   0
  4 -   1  *
  5 -   0
  6 -   0
  7 -   0
  8 -   0
  9 -   0
 10 -   0
 11 -   0
 12 -   0
 13 -   0
 14 -   0
 15 -   0
 16 -   0
 17 -   0
 18 -   0
 19 -   0
 20 -   0

1 words in histogram [overflow: 0 words]. [All words accounted for].


Stack Overflow C>_ echo t te tes test tests | p
    maximum word len is 4 (inclusive)
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   1  *
  5 -   0
  6 -   0
...
 20 -   0

4 words in histogram [overflow: 1 words]. [All words accounted for].

Stack Overflow C>_ p 5
    maximum word len is 5 (inclusive)
1 12 123 1234
^Z
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   1  *
  5 -   0
  6 -   0
...
 20 -   0

4 words in histogram [overflow: 0 words]. [All words accounted for].

Stack Overflow C>_ echo t | p 5
    maximum word len is 5 (inclusive)
  1 -   1  *
  2 -   0
  3 -   0
 ...
 20 -   0

1 words in histogram [overflow: 0 words]. [All words accounted for].

Stack Overflow C>_ echo t te tes test tests | p 4
    maximum word len is 4 (inclusive)
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   1  *
  5 -   0
...
 20 -   0

4 words in histogram [overflow: 1 words]. [All words accounted for].


Stack Overflow C>_
  • 程序 p 假定默认长度为 4 个字母,但用户可以在命令行上提供值,如 p 12 所示,最多为 12 个字母字母的话。
    • echo t te tes test tests | 这样的行终端上的 p 4 运行程序,使用 4 作为最大长度,并将 echo 和管道 | 符号之间的文本作为输入,因此可以使测试更快而且更容易。
    • 代码会记录已读字数。最后将其与直方图中的单词总和加上溢出计数进行比较,并报告最终的不匹配情况。

的示例的完整代码

/* Prints a histogram of the lengths of words. */
#include <stdio.h>
#include <stdlib.h>

#define MAXHIST 20
#define MAXLEN 4
#define MINLEN 0

// states for the FSM
#define S_OUT 0
#define S_IN 1
#define S_OVERFLOW 2
#define S_EOF 3

// delimiters for wor
#define TAB '\t'
#define NEWLINE '\n'
#define SPACE ' '

int print_h(
    unsigned len, int wlen[], unsigned overflow,
    unsigned n_words);

int main(int argc, char** argv)
{
    unsigned max_l = MAXLEN;  // default
    if (argc > 1) max_l = atoi(argv[1]);
    fprintf(
        stderr, "    maximum word len is %u (inclusive)\n",
        max_l);

    int      wlen[1 + MAXHIST] = {0};
    unsigned nc                = 0;
    unsigned n_words           = 0;
    unsigned overflow          = 0;

    int state = S_OUT;

    int c = getchar();  // read first char
    do {
        switch (state)
        {
            case S_OUT:
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:
                        // keep searching
                        c = getchar();  // keep going
                        break;
                    case EOF:
                        state = S_EOF;
                        // just go
                        break;
                    default:
                        // some non-space char
                        state = S_IN;
                        nc    = 1;
                        c     = getchar();  // keep going
                        break;
                };  // switc
                break;

            case S_IN:  // in a word
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:  // white-space
                        ++wlen[nc];
                        ++n_words;
                        state = S_OUT;
                        c     = getchar();  // keep going
                        break;
                    case EOF:
                        if (nc <= max_l) ++wlen[nc]; // last word
                        ++n_words;
                        state = S_EOF;
                        break;
                    default:  // another non-space char
                        nc += 1;
                        c = getchar();  // keep going
                        if (nc > max_l)
                        {
                            overflow += 1;
                            ++n_words;
                            state = S_OVERFLOW;
                            break;
                        }
                        break;
                };              // switch
                break;

            case S_OVERFLOW:
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:       // white-space
                        state = S_OUT;  // back to search
                        c     = getchar();  // keep going
                        break;
                    case EOF:  // we have a last word to
                               // account for
                        state = S_EOF;
                        break;
                    default:  // another non-space char
                        // do nothing
                        c = getchar();  // keep going
                        break;
                };              // switch
                break;

            case S_EOF:  // print results and return
            default:
                print_h(MAXHIST, wlen, overflow, n_words);
                return 0;
                break;

        };  // switch(state)

    } while (1);  // while

    return 0;  // to make the compiler happy
};             // main





int print_h(
    unsigned len, int wlen[], unsigned overflow,
    unsigned n_words)
{
    long unsigned n_hist = 0;  // words on table
    for (unsigned i = 1; i <= len; ++i)
    {
        printf("%3d - %3d  ", i, wlen[i]);
        n_hist += wlen[i];
        if (wlen[i] > 0)
            printf("*\n");
        else
            printf("\n");
    }
    printf(
        "\n%lu words in histogram [overflow: %d words].",
        n_hist, overflow);
    long unsigned total = n_hist + overflow;
    if (n_words == total)
        printf(" [All words accounted for].\n");
    else
        printf(
            "\
\n\
[ERROR]: %lu words in histogram\n\
         %u words on overflow\n\
         %u words on input.\n",
            n_hist, overflow, n_words);
    return 0;
}

有关示例代码

  • 有 4 种状态:添加了溢出和 eof
  • 接受来自命令行的最大长度
  • 测试无限循环中的状态

作为示例,以下是 S_IN 状态,当我们在一个单词内部时,刚刚读取一个新的 c 值:

           case S_IN:  // in a word
               switch (c)
               {
                   case SPACE:
                   case TAB:
                   case NEWLINE:  // white-space
                       ++wlen[nc];
                       ++n_words;
                       state = S_OUT;
                       c     = getchar();  // keep going
                       break;
                   case EOF:
                       if (nc <= max_l) ++wlen[nc]; // last word
                       ++n_words;
                       state = S_EOF;
                       break;
                   default:  // another non-space char
                       nc += 1;
                       c = getchar();  // keep going
                       if (nc > max_l)
                       {
                           overflow += 1;
                           ++n_words;
                           state = S_OVERFLOW;
                           break;
                       }
                       break;
               };              // switch
               break;

所有状态都使用类似的代码:字母上的开关,因此编写和管理状态的速度更快。上面我们看到了到其他 3 种状态的转换。
S_OVERFLOW 的代码类似:

            case S_OVERFLOW:
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:       // white-space
                        state = S_OUT;  // back to search
                        c     = getchar();  // keep going
                        break;
                    case EOF:  // we have a last word to
                               // account for
                        state = S_EOF;
                        break;
                    default:  // another non-space char
                        // do nothing
                        c = getchar();  // keep going
                        break;
                };              // switch
                break;

这就是我想要展示的内容。溢出的逻辑没有隐藏在 S_IN 内部,并且没有测试跳过剩余的字母。 FSM 使抽象模型变得更加容易。

每个状态的代码可以由不同的人开发和测试。

Before anything, here is the output from the original code, changing just one line:

#define MAXLEN 3

100 is for sure a bit too large for testing.

t te tes test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   0
  4 -   0
  5 -   0
  6 -   0
  7 -   0
  8 -   0
  9 -   0
 10 -   0
 11 -   0
 12 -   0
 13 -   0
 14 -   0
 15 -   0
 16 -   0
 17 -   0
 18 -   0
 19 -   0
 20 - -858993460

Overflow: 2

And we see the first problems:

  • Numbers do not match: 5 words on input, only 4 accounted for.
  • Negative value for words of length 20

There are some one_off errors.

Here follows some hints of 3 of the errors and a complete alternative example, with source code, output and arguments.

PROBLEM: words with MAXLEN length are ignored

It is not clear if MAXLEN is inclusive or not, but it seems to be more natural to include words with length MAXLEN in the histogram. But these words should be counted anyway: in the histogram or in the overflow count:

            if (nc < MAXLEN)
               ++wlen[nc];
           else if (nc > MAXLEN)
               ++overflow;

it is excluded here, since there is no match for ==

change 1

            if (nc <= MAXLEN)
                ++wlen[nc];
            else if (nc > MAXLEN)
                ++overflow;

and we get

t te tes test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   0
...
 20 - -858993460

Overflow: 2

Now we have the 5 words accounted for.

change 2

The giant number in overflow is due to an error here

    for (i = 1; i < MAXHIST + 1; ++i)
    {
        printf("%3d - %3d  ", i, wlen[i]);
        for (j = wlen[i]; j > MINLEN && j < MAXLEN; --j)
        {
            printf("*");
        }
        printf("\n");
    }

since in the printf the index will go until MAXHIST but wlen goes from 0 to MAXHIST-1.

There are many ways to correct this. This is a simple one used here: just add 1 item to wlen array.

    int wlen[1+MAXHIST];

and use

    for (i = 1; i <= MAXHIST; ++i) wlen[i] = 0;

and we get

t te tes test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   0
 ...
 20 -   0

Overflow: 2

more about the code

The provided code is based on a book (K&R bible) from 1978. Since then many things changed. The C-Faq for example is often cited as reference guidelines for C programming, and dates back from the most part from the 90's. And was never updated since 2005...

But C language and concepts in programming have been changing since then. A lot.

1

    nc = overflow = 0;
    for (i = 0; i < MAXHIST; ++i) wlen[i] = 0;
  • Since decades we can and should declare i inside the for. In C++ for example we can declare variables in switch or while or if header. It is important to reduce the scope --- lifetime --- of any variable, and specially those with naive names like i and j.
  • multiple initialization is fancy but it makes harder to find variables inside it.

We can write just

    int nc                = 0;
    int overflow          = 0;
    for (int i = 0; i < MAXHIST; ++i) wlen[i] = 0;

Also we can write

    int wlen[1 + MAXHIST] = {0};

and delete this for loop, since the program runs just once with the array.

Instead of

    int state;
    state = OUT;

You should use

    int state = OUT;
  • A function should print the histogram, something like
int print_h(unsigned len, int wlen[], unsigned overflow)
{
  for (unsigned i = 1; i <= len; ++i)
  {
      printf("%3d - %3d  ", i, wlen[i]);
      for (unsigned j = wlen[i]; j > MINLEN && j < MAXLEN;
           --j)
          printf("*");
      printf("\n");
  }
  printf("\nOverflow: %d\n", overflow);
  return 0;
}

will do ok. But the second loop just for printing the * is redundant.

the program now with these changes

#include <stdio.h>

/* Prints a histogram of the lengths of words. */

#define MAXHIST 20
#define MAXLEN 3
#define MINLEN 0
#define IN 1
#define OUT 0

int print_h(unsigned len, int wlen[], unsigned overflow);

int main()
{
    int wlen[1 + MAXHIST] = {0};
    int nc                = 0;
    int overflow          = 0;

    int c     = 0;
    int state = OUT;
    while ((c = getchar()) != EOF)
    {
        if (c == '\t' || c == '\n' || c == ' ')
        {
            state = OUT;
            if (nc <= MAXLEN)
                ++wlen[nc];
            else if (nc > MAXLEN)
                ++overflow;
        }
        else if (state == OUT)
        {
            state = IN;
            nc    = 1;
        }
        else { ++nc; }
    }
    print_h(MAXHIST, wlen, overflow);
    return 0;
}

int print_h(unsigned len, int wlen[], unsigned overflow)
{
    for (unsigned i = 1; i <= len; ++i)
    {
        printf("%3d - %3d  ", i, wlen[i]);
        for (unsigned j = wlen[i]; j > MINLEN && j < MAXLEN;
             --j)
            printf("*");
        printf("\n");
    }
    printf("\nOverflow: %d\n", overflow);
    return 0;
}

sample output

t te test tests
^Z
  1 -   1  *
  2 -   1  *
  3 -   0
  4 -   0
  5 -   0
  6 -   0
  7 -   0
  8 -   0
  9 -   0
 10 -   0
 11 -   0
 12 -   0
 13 -   0
 14 -   0
 15 -   0
 16 -   0
 17 -   0
 18 -   0
 19 -   0
 20 -   0

Overflow: 2

about the state machine and another possible problem

  • using an if to control the states, and testing first the stream of letters and then the states is very hard to follow. In general Finite State Machines are filters of some input (a stream of letters in this case) that are fed in a loop, then a switch gets the state and does its thing.

  • FSM states must be tested before the item (the letter in this case. The idea is that each state is independent and could be coded separately, even by diferent people and time.

  • overflow should be a state: when a word reaches the overflow count then all remaining letters are skipped. This is a different behavior and it is easier to code it this way.

  • EOF should be a state also: at EOF we just print the results and terminate.

    • But at EOF we may have a word being scanned and the way the provided code works it will be ignored.

an alternative example

Instead of changing the provided code step by step I will leave here a complete C example, sample output and some arguments, in order to not have an even larger post.

Some results from the example


Stack Overflow C>_ p
    maximum word len is 4 (inclusive)
test
^Z
  1 -   0
  2 -   0
  3 -   0
  4 -   1  *
  5 -   0
  6 -   0
  7 -   0
  8 -   0
  9 -   0
 10 -   0
 11 -   0
 12 -   0
 13 -   0
 14 -   0
 15 -   0
 16 -   0
 17 -   0
 18 -   0
 19 -   0
 20 -   0

1 words in histogram [overflow: 0 words]. [All words accounted for].


Stack Overflow C>_ echo t te tes test tests | p
    maximum word len is 4 (inclusive)
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   1  *
  5 -   0
  6 -   0
...
 20 -   0

4 words in histogram [overflow: 1 words]. [All words accounted for].

Stack Overflow C>_ p 5
    maximum word len is 5 (inclusive)
1 12 123 1234
^Z
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   1  *
  5 -   0
  6 -   0
...
 20 -   0

4 words in histogram [overflow: 0 words]. [All words accounted for].

Stack Overflow C>_ echo t | p 5
    maximum word len is 5 (inclusive)
  1 -   1  *
  2 -   0
  3 -   0
 ...
 20 -   0

1 words in histogram [overflow: 0 words]. [All words accounted for].

Stack Overflow C>_ echo t te tes test tests | p 4
    maximum word len is 4 (inclusive)
  1 -   1  *
  2 -   1  *
  3 -   1  *
  4 -   1  *
  5 -   0
...
 20 -   0

4 words in histogram [overflow: 1 words]. [All words accounted for].


Stack Overflow C>_
  • The program p assumes a default length of 4 letters, but the user can provide the value on the command line as in p 12 for a maximum of 12 letter words.
    • a line like echo t te tes test tests | p 4 on the terminal runs the program using 4 as the maximum length and the text between echo and the pipe | sign as input, so it makes testing faster and easier.
    • the code keeps a count of words read. At the end compares it to the sum of words in the histogram plus the overflow count, and reports an eventual mismatch.

Complete code for the example

/* Prints a histogram of the lengths of words. */
#include <stdio.h>
#include <stdlib.h>

#define MAXHIST 20
#define MAXLEN 4
#define MINLEN 0

// states for the FSM
#define S_OUT 0
#define S_IN 1
#define S_OVERFLOW 2
#define S_EOF 3

// delimiters for wor
#define TAB '\t'
#define NEWLINE '\n'
#define SPACE ' '

int print_h(
    unsigned len, int wlen[], unsigned overflow,
    unsigned n_words);

int main(int argc, char** argv)
{
    unsigned max_l = MAXLEN;  // default
    if (argc > 1) max_l = atoi(argv[1]);
    fprintf(
        stderr, "    maximum word len is %u (inclusive)\n",
        max_l);

    int      wlen[1 + MAXHIST] = {0};
    unsigned nc                = 0;
    unsigned n_words           = 0;
    unsigned overflow          = 0;

    int state = S_OUT;

    int c = getchar();  // read first char
    do {
        switch (state)
        {
            case S_OUT:
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:
                        // keep searching
                        c = getchar();  // keep going
                        break;
                    case EOF:
                        state = S_EOF;
                        // just go
                        break;
                    default:
                        // some non-space char
                        state = S_IN;
                        nc    = 1;
                        c     = getchar();  // keep going
                        break;
                };  // switc
                break;

            case S_IN:  // in a word
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:  // white-space
                        ++wlen[nc];
                        ++n_words;
                        state = S_OUT;
                        c     = getchar();  // keep going
                        break;
                    case EOF:
                        if (nc <= max_l) ++wlen[nc]; // last word
                        ++n_words;
                        state = S_EOF;
                        break;
                    default:  // another non-space char
                        nc += 1;
                        c = getchar();  // keep going
                        if (nc > max_l)
                        {
                            overflow += 1;
                            ++n_words;
                            state = S_OVERFLOW;
                            break;
                        }
                        break;
                };              // switch
                break;

            case S_OVERFLOW:
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:       // white-space
                        state = S_OUT;  // back to search
                        c     = getchar();  // keep going
                        break;
                    case EOF:  // we have a last word to
                               // account for
                        state = S_EOF;
                        break;
                    default:  // another non-space char
                        // do nothing
                        c = getchar();  // keep going
                        break;
                };              // switch
                break;

            case S_EOF:  // print results and return
            default:
                print_h(MAXHIST, wlen, overflow, n_words);
                return 0;
                break;

        };  // switch(state)

    } while (1);  // while

    return 0;  // to make the compiler happy
};             // main





int print_h(
    unsigned len, int wlen[], unsigned overflow,
    unsigned n_words)
{
    long unsigned n_hist = 0;  // words on table
    for (unsigned i = 1; i <= len; ++i)
    {
        printf("%3d - %3d  ", i, wlen[i]);
        n_hist += wlen[i];
        if (wlen[i] > 0)
            printf("*\n");
        else
            printf("\n");
    }
    printf(
        "\n%lu words in histogram [overflow: %d words].",
        n_hist, overflow);
    long unsigned total = n_hist + overflow;
    if (n_words == total)
        printf(" [All words accounted for].\n");
    else
        printf(
            "\
\n\
[ERROR]: %lu words in histogram\n\
         %u words on overflow\n\
         %u words on input.\n",
            n_hist, overflow, n_words);
    return 0;
}

about the example code

  • there are 4 states: overflow and eof were added
  • accepts a max len from the command line
  • test the states in an infinite loop

As an example, here is code for the S_IN state, when we are inside a word, with a new c value just read:

           case S_IN:  // in a word
               switch (c)
               {
                   case SPACE:
                   case TAB:
                   case NEWLINE:  // white-space
                       ++wlen[nc];
                       ++n_words;
                       state = S_OUT;
                       c     = getchar();  // keep going
                       break;
                   case EOF:
                       if (nc <= max_l) ++wlen[nc]; // last word
                       ++n_words;
                       state = S_EOF;
                       break;
                   default:  // another non-space char
                       nc += 1;
                       c = getchar();  // keep going
                       if (nc > max_l)
                       {
                           overflow += 1;
                           ++n_words;
                           state = S_OVERFLOW;
                           break;
                       }
                       break;
               };              // switch
               break;

All states use similar code: a switch on the letter, so it is faster to write and manage states. Above we see the transitions to the other 3 states.
The code for S_OVERFLOW is similar:

            case S_OVERFLOW:
                switch (c)
                {
                    case SPACE:
                    case TAB:
                    case NEWLINE:       // white-space
                        state = S_OUT;  // back to search
                        c     = getchar();  // keep going
                        break;
                    case EOF:  // we have a last word to
                               // account for
                        state = S_EOF;
                        break;
                    default:  // another non-space char
                        // do nothing
                        c = getchar();  // keep going
                        break;
                };              // switch
                break;

And this what I am trying to show. The logic of overflow is not buried inside S_IN and there is no tests there for skipping remaining letters. A FSM makes it easier to abstract a model.

The code for each state can be developed and tested by different people.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文