Different string lengths when using strtok

Posted 2025-01-13 06:46:56 · 2549 characters · 0 views · 0 comments

void redact_words(const char *text_filename, const char *redact_words_filename){
    FILE *fp = fopen(text_filename,"r");
    FILE *f2p = fopen(redact_words_filename,"r");
    FILE *f3p = fopen("result.txt", "w"); ;
    char buffer1[1000];
    char buffer2[1000];
    char *word;
    char *redact;
    

    
    char **the_words;
    
    //if ((fgets(buffer1, 1000 ,fp) == NULL) || (fgets(buffer2,1000 ,f2p) == NULL))
    
    fgets(buffer1,1000,fp);
    fgets(buffer2,1000,f2p);
    
    rewind(fp);
    rewind(f2p);
    
    int word_count = 0; 
    while (!feof(f2p)){
        char c = fgetc(f2p);
        if (c == ' '){
            word_count += 1;
        }
    }
    word_count += 1;
    
    the_words = malloc(3 * sizeof(char*));
    redact = strtok(buffer2, ", ");
    
    for (int i = 0; i < word_count; i++){
        the_words[i] = malloc(100);
        the_words[i] = redact;
        redact = strtok(NULL, ", ");
    }
    
    char result[256] = "";
    word = strtok(buffer1, " ");
    while (word != NULL){
        for (int i = 0; i < word_count; i++){
            if (strcasecmp(the_words[i],word) == 0){
                for (int i = 0; i < strlen(word); i++){
                    strcat(result,"*");
                    
                }
                strcat(result, " ");
                break; 
            }
            else{
                if (i==(word_count-1)){
                    strcat(result, word);
                    strcat(result, " ");
                }  
            } 
        }
        word = strtok(NULL," "); 
    }
    
    fputs(result, f3p);
    
    fclose(fp);
    fclose(f2p);
    fclose(f3p);
    free(the_words);
}

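This is my C code for replacing words in the file named text_filename with asterisks when the same word also appears in the file named redact_words_filename. However, I noticed that during the comparison of the two strings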

if (strcasecmp(the_words[i],word) == 0){
                for (int i = 0; i < strlen(word); i++){
                    strcat(result,"*");

                }

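when the word quick, for example, appears in both text files, the_words[i] holds a string of length 6 while word holds a string of length 5. Both appear to contain quick, yet they do not compare as equal. Why is one string longer than the other?

(P.S. I apologise for the poor code quality.)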

Edit 1: OK, I found out it has to do with the \n at the end of each line. I'm trying to find a way to solve this.
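
A small sketch of what is probably happening: fgets keeps the trailing '\n' in the buffer, and the delimiter set ", " passed to strtok does not contain '\n', so the last token on the line comes back as "quick\n" (6 characters) rather than "quick" (5). The helper name last_token_length is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the last token strtok finds
 * in `line` when splitting on `delims`. The line is copied first because
 * strtok overwrites delimiters in the buffer it scans with '\0'. */
size_t last_token_length(const char *line, const char *delims)
{
    char copy[256];
    size_t len = 0;

    assert(line != NULL && delims != NULL);
    strncpy(copy, line, sizeof copy - 1);
    copy[sizeof copy - 1] = '\0';

    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        len = strlen(tok);   /* remember the most recent token's length */
    }
    return len;
}
```

For the line "lazy, quick\n", splitting on ", " leaves a last token of length 6, while splitting on ", \n" gives length 5. Adding '\n' to the delimiter string is therefore one possible fix.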

Edit 2: I managed to get rid of the \n with a simple for loop:

for (int i = 0; i < word_count; i++){
        the_words[i] = malloc(100);
        the_words[i] = redact;
        for (int j = 0; j < strlen(redact); j++){
            if (redact[j] == '\n'){
                redact[j] = '\0';
            }
        }
        redact = strtok(NULL, ", ");
    }
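
A shorter alternative to the loop above, offered as a sketch: strcspn returns the index of the first '\n' in the string (or the string's length if there is none), so a single assignment truncates the line. The name strip_newline is hypothetical, and the sketch assumes it is called on buffer2 right after fgets and before the first strtok call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cuts the string at its first '\n', if it has one.
 * When there is no newline, strcspn returns strlen(s), so the assignment
 * just rewrites the existing terminator. */
void strip_newline(char *s)
{
    assert(s != NULL);
    s[strcspn(s, "\n")] = '\0';
}
```

Doing this once on the whole buffer avoids scanning every token separately.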



Comments (1)

命硬 2025-01-20 06:46:56
    the_words = malloc(3 * sizeof(char*));
    redact = strtok(buffer2, ", ");
    
    for (int i = 0; i < word_count; i++){
        the_words[i] = malloc(100);
        the_words[i] = redact;
        redact = strtok(NULL, ", ");
    }

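Two obvious problems just here:

  • You allocate space for 3 pointers in the_words, but then you store word_count words in it. If word_count > 3, you overflow the allocation and get undefined behavior.
  • For each word, you allocate 100 bytes and then throw that allocation away, storing a pointer into buffer2 instead. The buffer currently contains the word, but that will change the next time you read into it. You should just use the_words[i] = strdup(redact); to both allocate the right amount of memory and copy the string into the allocated memory.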
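
The two points above could be combined roughly as follows. This is a sketch under assumptions, not the answerer's code: split_words and free_words are hypothetical names, the array grows on demand instead of relying on a separate counting pass, and strdup is assumed to be available (POSIX, which the question's use of strcasecmp already implies).

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees every copied word, then the array itself. */
void free_words(char **words, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        free(words[i]);
    }
    free(words);
}

/* Hypothetical helper: splits `line` on `delims` and returns an array of
 * independently allocated copies, with the number of words in *count.
 * `line` is modified by strtok. Returns NULL with *count == 0 if the line
 * has no tokens or an allocation fails. */
char **split_words(char *line, const char *delims, size_t *count)
{
    char **words = NULL;
    size_t n = 0, cap = 0;

    assert(line != NULL && delims != NULL && count != NULL);
    *count = 0;

    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        if (n == cap) {                      /* grow the pointer array */
            size_t new_cap = cap ? cap * 2 : 4;
            char **tmp = realloc(words, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free_words(words, n);
                return NULL;
            }
            words = tmp;
            cap = new_cap;
        }
        words[n] = strdup(tok);              /* right-sized copy of the token */
        if (words[n] == NULL) {
            free_words(words, n);
            return NULL;
        }
        n++;
    }
    *count = n;
    return words;
}
```

Because each entry is its own copy, later reads into the source buffer no longer change the stored words, and the caller releases everything with free_words(words, count) rather than a single free(the_words).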