How do I check for duplicate words in a string in C?
I am solving a problem in C where I have to find duplicate words in a string such as
char a[]="This is it This";
In the string above, "This" appears twice, so I would like to count it only once.
Can anybody suggest how to achieve this?
Here is a program that does what you're asking. It is hard-coded for 4 words of at most 99 characters each. That can be changed easily; I just fit it around your input. I also used strcmp and strcpy. Both of these functions can be implemented on your own (call them mystrcpy and mystrcmp and embed them). I'm not rewriting the string functions for you: I looked them up and they are not complex, but they would not add anything to the program and I didn't want to reinvent the wheel. I did show how to avoid strtok, based on the other answer. Last of all, I just used a simple linear search in the notInArray function. For a large data set this is not efficient (you would probably use some type of tree or hash). This was compiled under gcc version 4.3.4.
The output looks like:
Enjoy.
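A minimal sketch of the approach described above, not the answer's original program: words are split on spaces without strtok, compared with strcmp, and stored once each with strcpy. Only the name notInArray and the limits (4 words, 99 characters) come from the answer; collectUnique, MAX_WORDS, MAX_LEN, and the assumption that a single space is the only separator are illustrative choices.

```c
#include <string.h>

#define MAX_WORDS 4    /* capacity from the answer: 4 words */
#define MAX_LEN   100  /* 99 characters plus the terminating '\0' */

/* Linear search: return 1 if word is not in words[0..count-1], else 0. */
int notInArray(char words[][MAX_LEN], int count, const char *word)
{
    for (int i = 0; i < count; i++) {
        if (strcmp(words[i], word) == 0)
            return 0;
    }
    return 1;
}

/* Split text on spaces (no strtok) and store each distinct word once.
   Returns the number of distinct words stored, at most MAX_WORDS.
   Assumption: words longer than 99 characters are truncated. */
int collectUnique(const char *text, char words[][MAX_LEN])
{
    int count = 0;
    char current[MAX_LEN];
    size_t len = 0;

    for (const char *p = text; ; p++) {
        if (*p != ' ' && *p != '\0') {
            if (len < MAX_LEN - 1)
                current[len++] = *p;          /* still inside a word */
        } else {
            if (len > 0) {                    /* a word just ended */
                current[len] = '\0';
                if (count < MAX_WORDS && notInArray(words, count, current)) {
                    strcpy(words[count], current);
                    count++;
                }
                len = 0;
            }
            if (*p == '\0')
                break;
        }
    }
    return count;
}
```

Called with "This is it This" and a char words[MAX_WORDS][MAX_LEN] buffer, collectUnique returns 3 and leaves "This", "is", and "it" in the array, in that order.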
I'd probably read words one at a time (e.g., using sscanf) and put them into an array (or, if you have a lot more input than you've shown above, a binary search tree). [Edit: just saw your comment. It's still fairly easy without string functions: just scan through the string looking for space and non-space characters to find the words. Annoying, but not a major problem.]
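The scan for space and non-space characters might look roughly like the sketch below, which uses no library string functions at all. The helper names (skipSpaces, wordLength, sameWord, countDistinctWords) are hypothetical, and it assumes that only the space character separates words. Instead of storing words, it counts a word only at its first occurrence by rescanning the earlier part of the string, which is quadratic but needs no extra memory.

```c
#include <stddef.h>

/* Advance past spaces; returns the start of the next word or the '\0'. */
const char *skipSpaces(const char *p)
{
    while (*p == ' ')
        p++;
    return p;
}

/* Number of non-space characters in the word starting at p. */
size_t wordLength(const char *p)
{
    size_t n = 0;
    while (p[n] != '\0' && p[n] != ' ')
        n++;
    return n;
}

/* Hand-written comparison of two character spans; 1 if identical. */
int sameWord(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    for (size_t i = 0; i < alen; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}

/* Count distinct words: each word is compared against every word that
   precedes it in the text and counted only if none of them matches. */
int countDistinctWords(const char *text)
{
    int distinct = 0;
    const char *p = skipSpaces(text);

    while (*p != '\0') {
        size_t len = wordLength(p);
        int seen = 0;

        for (const char *q = skipSpaces(text); q < p; ) {
            size_t qlen = wordLength(q);
            if (sameWord(q, qlen, p, len)) {
                seen = 1;
                break;
            }
            q = skipSpaces(q + qlen);
        }
        if (!seen)
            distinct++;
        p = skipSpaces(p + len);
    }
    return distinct;
}
```

For the string in the question, countDistinctWords("This is it This") returns 3, because the second "This" matches the first and is not counted again.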
If you want a count of the number of times each word occurs, you can keep an int (or whatever) in each node. If you just want to know the unique words in the input, you don't need a count, just a collection of words.
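As a rough illustration of the tree variant with a count in each node, here is one possible sketch. The struct layout and the names Node, insertWord, countOf, treeSize, and freeTree are assumptions made for this example, and it uses strcmp and strcpy for brevity; hand-written equivalents could be substituted if library string functions are off limits. The tree is not balanced, so sorted input would degrade it to a linked list.

```c
#include <stdlib.h>
#include <string.h>

struct Node {
    char *word;               /* heap-allocated copy of the word */
    int count;                /* how many times the word was inserted */
    struct Node *left, *right;
};

/* Insert word into the tree, or bump its count if already present.
   Returns the root of the (sub)tree; if allocation fails, the tree is
   left unchanged. */
struct Node *insertWord(struct Node *root, const char *word)
{
    if (root == NULL) {
        struct Node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;
        n->word = malloc(strlen(word) + 1);
        if (n->word == NULL) {
            free(n);
            return NULL;
        }
        strcpy(n->word, word);
        n->count = 1;
        n->left = n->right = NULL;
        return n;
    }

    int cmp = strcmp(word, root->word);
    if (cmp == 0)
        root->count++;
    else if (cmp < 0)
        root->left = insertWord(root->left, word);
    else
        root->right = insertWord(root->right, word);
    return root;
}

/* Occurrence count for word, or 0 if it is not in the tree. */
int countOf(const struct Node *root, const char *word)
{
    while (root != NULL) {
        int cmp = strcmp(word, root->word);
        if (cmp == 0)
            return root->count;
        root = (cmp < 0) ? root->left : root->right;
    }
    return 0;
}

/* Number of nodes, i.e. the number of unique words. */
int treeSize(const struct Node *root)
{
    if (root == NULL)
        return 0;
    return 1 + treeSize(root->left) + treeSize(root->right);
}

/* Release every node and its word. */
void freeTree(struct Node *root)
{
    if (root == NULL)
        return;
    freeTree(root->left);
    freeTree(root->right);
    free(root->word);
    free(root);
}
```

After inserting "This", "is", "it", and "This" in turn (root = insertWord(root, word) starting from a NULL root), treeSize gives 3 unique words and countOf(root, "This") gives 2. If only the set of unique words matters, the count field can simply be dropped.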