How to check for duplicate words in a string in C?

Posted 2024-10-21 13:17:28

I am solving a problem in C where I have to find duplicate words in a string, like

 char a[]="This is it This";

In the above string, "This" appears twice, so I would like to count it only once.

Can anybody suggest how to achieve this?

Comments (2)

泡沫很甜 2024-10-28 13:17:28

Here is a program that does what you're asking. It is hard-coded for 4 words of at most 99 characters each; that is easy to change, I just sized it to fit your input. I used strcmp and strcpy, both of which you can implement yourself (call them mystrcpy and mystrcmp and embed them). I'm not rewriting the string functions for you: I looked them up and they are not complex, but they would add nothing to the program and I didn't want to reinvent the wheel. Following the other answer, I did show how to avoid strtok. Finally, notInArray uses a simple linear search, which is inefficient for a large data set (there you would probably use some kind of tree or hash).

This was compiled under gcc version 4.3.4:

#include <ctype.h>   /* isspace */
#include <stdio.h>
#include <string.h>

int notInArray(char arr[][100], char *word, int size);

int main() {
  char a[] = "This is a This";
  char *ptr;
  char strarr[4][100];
  char word[100];
  int pos = 0;
  int count = 0;
  int i;

  memset(&strarr,0,sizeof(strarr));
  printf("%s\n\n",a);

  ptr = a;
  while (*ptr) {

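    /* %99s keeps the read inside word[100]; pos < 4 keeps the store inside strarr */
    if (sscanf(ptr, "%99s", word) == 1 && pos < 4 && notInArray(strarr,word,4)) {
      strcpy(strarr[pos++],word);
      printf("%s\n", word);
    }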

    /* advance past the current word and the single space that follows it */
    while (!isspace((unsigned char)*ptr++) && *ptr) {}
  }

  for (i=0; i<4; i++) {
    if (*strarr[i]) {
      printf("strarr[%d]=%s\n",i, strarr[i]);
      count++;
    }
  }

  printf("\nUnique wordcount = %d\n", count);

  return(0);
}

int notInArray(char arr[][100], char *word, int size) {
  int i;

  for (i=0; i<size; i++) {
    if (*arr[i] && !strcmp(arr[i],word)) {
      return(0);
    }
  }

  return(1);
}

The output looks like:

~>a
This is a This

This
is
a
strarr[0]=This
strarr[1]=is
strarr[2]=a

Unique wordcount = 3

Enjoy.
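
For anyone who does need to avoid the library string functions: the answer above suggests writing your own mystrcmp and mystrcpy. A minimal sketch of what those could look like is below. The names come from the answer, but the bodies are only an illustration and not the answerer's code.

```c
/* Hand-written stand-in for strcmp.
   Returns a negative value, 0, or a positive value, like the library function. */
int mystrcmp(const char *a, const char *b) {
  while (*a && *a == *b) {
    a++;
    b++;
  }
  /* Compare as unsigned char, which is how the standard strcmp orders bytes. */
  return (unsigned char)*a - (unsigned char)*b;
}

/* Hand-written stand-in for strcpy.
   Copies src, including the terminating '\0', into dst and returns dst.
   The caller must make sure dst is large enough. */
char *mystrcpy(char *dst, const char *src) {
  char *out = dst;
  while ((*dst++ = *src++) != '\0') {
    /* the copy happens in the loop condition */
  }
  return out;
}
```

Swapping these in for strcmp and strcpy in the program above should not change its output, assuming the words still fit in the 100-byte rows.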

我们的影子 2024-10-28 13:17:28

I'd probably read words one at a time (e.g., using sscanf) and put them into an array (or, if you have a lot more input than you've shown above, a binary search tree). [Edit: just saw your comment. It's still fairly easy without string functions: just scan through the string for space/non-space characters to find the words. Annoying, but not major.]
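
The space/non-space scan can be sketched roughly as follows, with no library string functions at all. The names is_space, same_word, count_unique_words and the MAX_WORDS cap are made up for this illustration; words are remembered as (start, length) pairs pointing into the original string, so nothing is copied.

```c
#define MAX_WORDS 64  /* assumed cap on distinct words, for illustration only */

/* Is c one of the usual whitespace characters? (avoids <ctype.h>) */
static int is_space(char c) {
  return c == ' ' || c == '\t' || c == '\n' ||
         c == '\r' || c == '\f' || c == '\v';
}

/* Compare two words given as (start, length) pairs; returns 1 if equal. */
static int same_word(const char *a, int alen, const char *b, int blen) {
  int i;
  if (alen != blen) return 0;
  for (i = 0; i < alen; i++) {
    if (a[i] != b[i]) return 0;
  }
  return 1;
}

/* Count the distinct whitespace-separated words in s (case-sensitive).
   Distinct words beyond MAX_WORDS are ignored. */
int count_unique_words(const char *s) {
  const char *start[MAX_WORDS];  /* where each distinct word begins */
  int len[MAX_WORDS];            /* how long each distinct word is  */
  int n = 0;

  while (*s) {
    const char *w;
    int wlen, i, seen;

    while (*s && is_space(*s)) s++;   /* skip a run of spaces */
    if (!*s) break;
    w = s;
    while (*s && !is_space(*s)) s++;  /* skip a run of non-spaces: one word */
    wlen = (int)(s - w);

    seen = 0;
    for (i = 0; i < n; i++) {         /* linear search of the words so far */
      if (same_word(start[i], len[i], w, wlen)) {
        seen = 1;
        break;
      }
    }
    if (!seen && n < MAX_WORDS) {
      start[n] = w;
      len[n] = wlen;
      n++;
    }
  }
  return n;
}
```

Calling count_unique_words("This is it This") with the string from the question should give 3, since the second "This" matches the first.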

If you want a count of the number of times each word occurs, you can keep an int (or whatever) in each node. If you just want to know the unique words in the input, you don't need a count, just a collection of words.
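
A rough sketch of the binary search tree variant with a count in each node might look like this. The struct node layout and the bst_add, bst_size, bst_count and bst_free names are assumptions made for the example, and it uses the library strcmp/strcpy for brevity. The tree is unbalanced, so already-sorted input degrades it to a linked list.

```c
#include <stdlib.h>
#include <string.h>

struct node {
  char *word;                 /* heap copy of the word           */
  int count;                  /* how many times it has been seen */
  struct node *left, *right;
};

/* Insert word, or bump its count if it is already present.
   Returns the subtree root; if allocation fails the tree is left unchanged. */
struct node *bst_add(struct node *root, const char *word) {
  int cmp;
  if (root == NULL) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->word = malloc(strlen(word) + 1);
    if (n->word == NULL) {
      free(n);
      return NULL;
    }
    strcpy(n->word, word);
    n->count = 1;
    n->left = n->right = NULL;
    return n;
  }
  cmp = strcmp(word, root->word);
  if (cmp == 0) root->count++;
  else if (cmp < 0) root->left = bst_add(root->left, word);
  else root->right = bst_add(root->right, word);
  return root;
}

/* Number of nodes, i.e. the number of unique words. */
int bst_size(const struct node *root) {
  return root ? 1 + bst_size(root->left) + bst_size(root->right) : 0;
}

/* Occurrences of word, or 0 if it was never added. */
int bst_count(const struct node *root, const char *word) {
  while (root) {
    int cmp = strcmp(word, root->word);
    if (cmp == 0) return root->count;
    root = cmp < 0 ? root->left : root->right;
  }
  return 0;
}

/* Release every node and its word. */
void bst_free(struct node *root) {
  if (root == NULL) return;
  bst_free(root->left);
  bst_free(root->right);
  free(root->word);
  free(root);
}
```

Feeding it the words of "This is it This" one at a time with root = bst_add(root, word) leaves three nodes, with a count of 2 on "This". If only the set of unique words matters, the count field can simply be dropped.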
