Counting syllables one character at a time [C]

Posted 2024-08-20 17:56:08

I'm writing a program which reads text from a file and determines the number of sentences, words, and syllables in that file. The trick is that it must read only one character at a time and work with that, which means it can't just store the whole file in an array.

So, with that in mind, here's how my program works:

while(character != EOF)
{
    check if the character is an end-of-sentence marker (?:;.!)
    check if the character is whitespace (' ' \t \n)
    (must be a letter now)
    check if the letter is a vowel
}

Using a state-machine approach, each time the loop goes through, certain triggers are either 1 or 0, and this affects the count. I have had no trouble counting the sentences or the words, but the syllables are giving me trouble. The definition of a syllable that I am using is: any vowel or group of vowels counts as 1 syllable, but a single e at the end of a word does not count as a syllable.

With that in mind, I've created code like this:

if character is one of 'A', 'E', ... 'o', 'u'
    if the last character wasn't a vowel then
        set the flag for the letter being a vowel
        (so that next time through, it doesn't get counted)
        and add one to the syllable count.
    if the last character was a vowel, then don't change the flag and don't
        add to the count.

Now the problem I have is that my syllable count for a given text file is very low.
The expected count is 57 syllables, 36 words, and 3 sentences. I get the sentences correct, and the words too, but my syllable count is only 35.

I also have it set up so that when the program reads a !:;.? or whitespace, it looks at the last character read, and if that is an e, it takes one off the syllable count.
This takes care of an e at the end of a word not counting as a syllable.

So with this in mind, I know there must be something wrong with my methodology to get such a vast difference. I must be forgetting something.

Does anyone have some suggestions? I didn't want to include my entire program, but I can include certain blocks if necessary.

EDIT: Some code...

I have if (end-of-sentence marker), then else if (whitespace), then the final else, which means that only characters that can form words reach this block. This is the only block of code which should have any effect on the counting of syllables...

if (chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' || chrctr == 'U' ||
    chrctr == 'a' || chrctr == 'e' || chrctr == 'i' || chrctr == 'o' || chrctr == 'u')
{
    if (chrctr == 'E' || chrctr == 'e')
    {
        isE = 1;
    }
    else
    {
        isE = 0;
    }
    if (skipSylb != 1)
    {
        endSylb = 1;
        skipSylb = 1;
    }
    else
    {
        endSylb = 0;
        skipSylb = 1;
    }
}
else
{
    endSylb = 0;
    skipSylb = 0;
}

So to explain: if endSylb is 1, code later in the program adds one to the syllable count. skipSylb flags whether the previous character was also a vowel. If skipSylb = 1, we are inside a block of vowels and only want to add one to the counter. The isE variable tells the program, next time around, that the last letter was an E. So the next time through the while loop, if the character is an end-of-sentence marker or whitespace and the last letter was E (so isE = 1), we have added one too many syllables.

Hopefully that helps.

Since the value is actually lower than it should be, I thought perhaps the statements where I subtract from the count are important too.
I use this if statement to decide when to subtract from the count:

if (isE == 1)
{
    countSylb--;
}

This statement runs when the character is whitespace or an end-of-sentence character.
I can't think of anything else relevant, but I still feel like I'm not including enough.
Oh well, let me know if something is unclear.

尘曦 2024-08-27 17:56:08

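"I also have it set up so that when the program reads a !:;.? or whitespace, it looks at the last character read, and if that is an e, it takes one off the syllable count."

This sounds wrong. What about words like "die" and "see"?
Obviously, you can only decrement the count if the word counted for more than one syllable.

In your case, decrementing only when the "e" at the end is not part of a vowel group might suffice.

If that doesn't help: maybe you don't clear the vowel flag after reading a consonant? I can't tell from your code.

What would really help you is debug output. Let the program tell you what it is doing, for example:

"Read a vowel: e"

"Not counting the vowel e because [...]"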
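To make the two suggestions above concrete, here is a minimal sketch of a one-character-at-a-time counter with optional trace output. All names in it (SylCounter, syl_feed, syl_count) are hypothetical and do not come from the question's program, and it assumes that any non-letter ends a word, which is simpler than the question's own character classes. The final "e" is dropped only if it started its own vowel group and the word already has more than one syllable, and a consonant clears the "e" flag.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; none come from the original program. */
typedef struct {
    int total;        /* syllables in all finished words */
    int in_word;      /* vowel groups seen so far in the current word */
    int prev_vowel;   /* 1 if the previous character was a vowel */
    int lone_final_e; /* 1 if the previous character was an 'e' that
                         started its own vowel group */
} SylCounter;

static int is_vowel(int c)
{
    c = tolower(c);
    return c != '\0' && strchr("aeiou", c) != NULL;
}

/* Feed one character; pass trace = NULL to silence the debug output. */
void syl_feed(SylCounter *s, int c, FILE *trace)
{
    if (isalpha(c)) {
        if (is_vowel(c)) {
            int starts_group = !s->prev_vowel;
            if (starts_group)
                s->in_word++;
            s->lone_final_e = starts_group && tolower(c) == 'e';
            s->prev_vowel = 1;
            if (trace)
                fprintf(trace, "read vowel %c: %s\n", c,
                        starts_group ? "new group, counted"
                                     : "same group, not counted");
        } else {
            /* A consonant clears both flags, including the 'e' flag. */
            s->prev_vowel = 0;
            s->lone_final_e = 0;
        }
        return;
    }
    /* Assumption: any non-letter ends the current word (if there is one). */
    if (s->lone_final_e && s->in_word > 1) {
        s->in_word--;
        if (trace)
            fprintf(trace, "dropping final e\n");
    }
    s->total += s->in_word;
    s->in_word = 0;
    s->prev_vowel = 0;
    s->lone_final_e = 0;
}

/* Convenience wrapper for experiments: counts a whole string. */
int syl_count(const char *text)
{
    SylCounter s = {0, 0, 0, 0};
    for (; *text; text++)
        syl_feed(&s, (unsigned char)*text, NULL);
    syl_feed(&s, ' ', NULL); /* flush the last word */
    return s.total;
}
```

In the real program the characters would come from the file loop instead of a string, and passing stderr as the trace argument prints one line per decision, which makes it easy to see where a count goes missing.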

小镇女孩 2024-08-27 17:56:08

You need a Finite State Machine

In a sense, every program is a state machine, but typically in the programming racket by "state machine" we mean a strictly organized loop that does something like:

while (1) {
  switch(current_state) {
    case STATE_IDLE:
      if (evaluate some condition)
        next_state = STATE_THIS;
      else
        next_state = STATE_THAT;
      break;
    case STATE_THIS:
      // some other logic here
      break;
    case STATE_THAT:
      // yet more
      break;
  }
  current_state = next_state;
}

Yes, you can solve this kind of problem with general spaghetti code. Although legacy spaghetti code with literal jumps isn't seen any more, there is a school of thought which resists grouping lots and lots of conditionals and nested conditionals in a single function, in order to minimize cyclomatic complexity. To mix metaphors, a big rat's-nest of conditionals is kind of the modern version of spaghetti code.

By at least organizing the control flow into a state machine you compress some of the logic into a single plane and it becomes much easier to visualize the operations and make individual changes. A structure is created that, while rarely the shortest possible expression, is at least easy to modify and incrementally alter.
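As a rough illustration of that shape applied to the syllable problem, the sketch below uses four states and a single switch. The state names, the Counts struct, and the functions fsm_step and fsm_run are all made up for this example. It follows the question's rules as I understand them: ?:;.! end a sentence, those marks and whitespace separate words, and every other character is treated as part of a word. A run of marks such as "..." would count as several sentences here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical states: between words, after a consonant, inside a vowel
   group, and right after an 'e' that started its own vowel group. */
typedef enum { ST_GAP, ST_CONSONANT, ST_VOWELS, ST_LONE_E } State;

typedef struct {
    State state;
    int sentences, words, syllables;
    int word_syllables; /* vowel groups in the word being read */
} Counts;

void fsm_step(Counts *k, int c)
{
    int ender = c != '\0' && strchr("?:;.!", c) != NULL;
    int sep   = ender || isspace(c);
    int lower = tolower(c);
    int vowel = !sep && c != '\0' && strchr("aeiou", lower) != NULL;
    int is_e  = vowel && lower == 'e';
    State next = k->state;

    switch (k->state) {
    case ST_GAP:
        if (!sep) { /* first character of a new word */
            k->words++;
            k->word_syllables = vowel ? 1 : 0;
            next = is_e ? ST_LONE_E : vowel ? ST_VOWELS : ST_CONSONANT;
        }
        break;
    case ST_CONSONANT:
        if (sep) {
            next = ST_GAP;
        } else if (vowel) { /* a new vowel group starts */
            k->word_syllables++;
            next = is_e ? ST_LONE_E : ST_VOWELS;
        }
        break;
    case ST_LONE_E:
        /* Word ends on a lone 'e': drop it unless it is the only syllable. */
        if (sep && k->word_syllables > 1)
            k->word_syllables--;
        /* fall through */
    case ST_VOWELS:
        next = sep ? ST_GAP : vowel ? ST_VOWELS : ST_CONSONANT;
        break;
    }
    if (next == ST_GAP && k->state != ST_GAP) { /* a word just ended */
        k->syllables += k->word_syllables;
        k->word_syllables = 0;
    }
    if (ender)
        k->sentences++;
    k->state = next;
}

/* Convenience wrapper for experiments: runs the machine over a string. */
Counts fsm_run(const char *text)
{
    Counts k = { ST_GAP, 0, 0, 0, 0 };
    for (; *text; text++)
        fsm_step(&k, (unsigned char)*text);
    fsm_step(&k, '\n'); /* flush the last word */
    return k;
}
```

With a file, the same fsm_step would be called once per character returned by fgetc (stored in an int so it can be compared with EOF), followed by one final separator to flush the last word. The point of the layout is that each rule lives in exactly one case, so changing the final-e rule touches only ST_LONE_E.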

凉宸 2024-08-27 17:56:08

Looking at your code, I suspect some of the logic has gotten lost in the excessive size. Your main snippet appears equivalent to something like this:

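/* needs <ctype.h> for tolower and <string.h> for strchr */
chrctr = tolower(chrctr);

/* strchr takes the string first, then the character; the '\0' test keeps
   a NUL byte from matching the string's terminator */
if (chrctr != '\0' && strchr("aeiou", chrctr)) {
    isE = (chrctr == 'e');
    endSylb = !skipSylb;
    skipSylb = 1; // May not be what you want, but it's what you have.
}
else {
    skipSylb = endSylb = 0;
}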
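One way to check that the condensed form behaves like the original is to wrap both in functions and compare them over every ASCII character and every starting flag value. The sketch below does that. The Flags struct and the function names are invented for the comparison, the strchr call is written with the string as its first argument, and the '\0' guard is there because strchr would otherwise match the string's terminator.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical wrapper for the three flags used in the question. */
typedef struct { int isE, endSylb, skipSylb; } Flags;

/* The long form from the question, with the flags passed in and returned. */
Flags step_original(Flags f, int chrctr)
{
    if (chrctr == 'A' || chrctr == 'E' || chrctr == 'I' || chrctr == 'O' ||
        chrctr == 'U' || chrctr == 'a' || chrctr == 'e' || chrctr == 'i' ||
        chrctr == 'o' || chrctr == 'u') {
        if (chrctr == 'E' || chrctr == 'e')
            f.isE = 1;
        else
            f.isE = 0;
        if (f.skipSylb != 1) {
            f.endSylb = 1;
            f.skipSylb = 1;
        } else {
            f.endSylb = 0;
            f.skipSylb = 1;
        }
    } else {
        f.endSylb = 0;
        f.skipSylb = 0;
    }
    return f;
}

/* The condensed form. */
Flags step_condensed(Flags f, int chrctr)
{
    chrctr = tolower(chrctr);
    if (chrctr != '\0' && strchr("aeiou", chrctr)) {
        f.isE = (chrctr == 'e');
        f.endSylb = !f.skipSylb;
        f.skipSylb = 1;
    } else {
        f.skipSylb = f.endSylb = 0;
    }
    return f;
}

/* Returns 1 if both versions agree for every ASCII character and every
   combination of starting isE and skipSylb values (each 0 or 1). */
int versions_agree(void)
{
    for (int c = 0; c < 128; c++)
        for (int skip = 0; skip <= 1; skip++)
            for (int e = 0; e <= 1; e++) {
                Flags in = { e, 0, skip };
                Flags a = step_original(in, c);
                Flags b = step_condensed(in, c);
                if (a.isE != b.isE || a.endSylb != b.endSylb ||
                    a.skipSylb != b.skipSylb)
                    return 0;
            }
    return 1;
}
```

Written this way, one detail stands out: neither version touches isE in the consonant branch. Feeding 'e' and then 'd' leaves isE at 1, so unless isE is cleared somewhere else in the program, a word such as "red" would lose a syllable at the next space.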

Personally, I think trying to count syllables algorithmically is nearly hopeless, but if you really want to, I'd take a look at the steps in the Porter stemmer for some guidance about how to break up English words in a semi-meaningful way. It's intended to strip off suffixes, but I suspect the problems being solved are similar enough that it might provide at least a little inspiration.
