getchar() and counting sentences and words in C

Posted 2024-08-21 03:29:52

I'm creating a program that follows certain rules to count the words, syllables, and sentences in a given text file.

A sentence is a collection of words, separated by whitespace, that ends in a . or ! or ?
However, this is also a single sentence:

Greetings, earthlings..

My approach is to scan through the text file one character at a time using getchar(). I am prohibited from holding the entire text file in memory; it must be processed one character or one word at a time.

Here is my dilemma: using getchar() I can find out what the current character is. I just keep calling getchar() in a loop until it returns EOF. But if a sentence has multiple periods at the end, it is still a single sentence, which means I need to know both the character before the one I'm analyzing and the one after it. As far as I can tell, that would mean another getchar() call, but that creates a problem when I go to read the next character (it has now skipped a character).

Does anyone have a suggestion as to how I could determine that the example above is indeed a single sentence?

Thanks, and if you need clarification or anything else, let me know.

Comments (2)

小嗷兮 2024-08-28 03:29:53

You just need to implement a very simple state machine. Once you've found the end of a sentence, you remain in that state until you find the start of a new sentence (normally a non-whitespace character other than a terminator such as . ! or ?).
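
A minimal sketch of such a state machine in C, for illustration only. The names count_sentences, is_terminator, IN_SENTENCE and AFTER_TERMINATOR are hypothetical, and the rule that a sentence is counted at its first terminator is an assumption drawn from the question. It reads with getc() on a FILE * so it can be pointed at any stream; getchar() is equivalent to getc(stdin).

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical state names for this sketch. */
enum state { IN_SENTENCE, AFTER_TERMINATOR };

static int is_terminator(int c)
{
    return c == '.' || c == '!' || c == '?';
}

/* Counts sentences on a stream, reading one character at a time.
 * A run of terminators such as ".." or "?!" is counted only once,
 * because the machine stays in AFTER_TERMINATOR until a new
 * sentence starts. */
long count_sentences(FILE *in)
{
    enum state st = AFTER_TERMINATOR; /* no sentence has started yet */
    long sentences = 0;
    int c; /* int rather than char, so EOF stays distinguishable */

    while ((c = getc(in)) != EOF) {
        if (is_terminator(c)) {
            if (st == IN_SENTENCE) {
                sentences++;          /* first terminator ends the sentence */
                st = AFTER_TERMINATOR;
            }
            /* further terminators are ignored in AFTER_TERMINATOR */
        } else if (!isspace(c)) {
            st = IN_SENTENCE;         /* non-whitespace starts a sentence */
        }
    }
    return sentences;
}
```

Calling count_sentences(stdin) reads the same stream that getchar() would. Because the state remembers what came before, no lookahead or second getchar() call is needed, so no character is skipped. Under the assumptions above, text that trails off without a terminator is not counted as a sentence.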

亣腦蒛氧 2024-08-28 03:29:53

You need an extensible grammar. Look, for example, at regular expressions and try to build one.

Generally, human language is diverse and not easily parseable, especially if you have colloquial speech or different languages to analyze. In some languages it may not even be clear what the distinction between a word and a sentence is.
