Identifying spaces in text

Posted 2024-08-20 13:47:01

I'm writing a program that deciphers sentences, syllables, and words given in a basic text file.

The program cycles through the file character by character.
It first checks whether the character is an end-of-sentence marker, such as !, ?, :, ;, or a period.
Then, if the character is not a space or tab, it assumes it is a letter.
Finally, if the character is a space or tab and the character before it was a valid letter (i.e. not an end-of-sentence marker), it counts that as the end of a word.

I was a bit light on the details, but here is the problem I have.
My word count equals my sentence count. That suggests the program does recognize that a word ends at an end-of-sentence marker, but the real problem is that spaces are being treated as valid letters.

Here's my if statement that decides whether the character in question is a valid letter in a word:

else if(character != ' ' || character != '\t')

By that point in the program I've already ruled out end-of-sentence markers (in the original if, actually). According to an ASCII table, 32 should be the space character.
However, when I print every character that makes it into that block of code, spaces are among them.

So what am I doing wrong? How can I stop spaces from getting past this if?

Thanks in advance. I have a feeling the question may be a bit vague or poorly worded, so if you have any questions or need clarification, let me know.

墨洒年华 2024-08-27 13:47:01

You should not rely on actual numbers for characters: that depends upon the encoding your platform uses, and may not be ASCII. You can check for any particular character by simply testing against it. For example, to test if c is a space character:

if (c == ' ')

will work, is easier to read, and is portable.

If you want to skip all white-space, you should use #include <ctype.h> and then use isspace():

if (isspace((unsigned char)c))

Edit: As others said, your condition to check for "not a space" is wrong, but the above point still applies. So, your condition can be replaced by:

if (!isspace((unsigned char)c))
把回忆走一遍 2024-08-27 13:47:01

I note that

(character != 32 || character != 9)

is always true: if the character is 32, it is not 9, and true OR false is true.

You probably mean

(character != ' ' && character != '\t')
寂寞笑我太脆弱 2024-08-27 13:47:01

It would probably be better to just compare against the specific characters you consider whitespace, and to use && rather than ||:

if ((character != ' ') &&
    (character != '\t'))