strtok() prints only the first word; the rest is (null)
I am trying to parse a large text file and split it into single words using `strtok`. The delimiters cover all special characters, whitespace, and newlines. For some reason, when I `printf()` each word, it prints only the first word and then a bunch of `(null)` for the rest.
    ifstream textstream(textFile);
    string textLine;
    while (getline(textstream, textLine))
    {
        struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
        char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line
        strcpy(line_c, textLine.c_str()); // copies the line string into the character array
        char *word = strtok(line_c, delimiters); // removes all unwanted characters
        while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
        {
            MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
            word = strtok(NULL, delimiters); // move to next word
            printf("%s", word);
        }
    }
Rather than jumping through the hoops necessary to use `strtok`, I'd write a little replacement that works directly with strings and does not modify its input. At least to me, this seems to simplify the rest of the code quite a bit.

This also avoids (among other things) the massive memory leak you had: you allocate memory on every iteration of your loop but never free any of it.
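The answer's own listing is not preserved here. As a rough sketch of the idea, assuming the delimiters are available as a `std::string`, a replacement might look like the following. The name `split_words` is hypothetical and not taken from the answer.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the answer's original code): splits `text` into
// tokens separated by any character in `delims`. `text` is never modified.
std::vector<std::string> split_words(const std::string& text,
                                     const std::string& delims) {
    std::vector<std::string> words;
    // Skip leading delimiters to find the start of the first token.
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the string.
        std::size_t end = text.find_first_of(delims, start);
        // substr clamps the count, so end == npos takes the rest of the line.
        words.push_back(text.substr(start, end - start));
        if (end == std::string::npos) break;
        start = text.find_first_not_of(delims, end);
    }
    return words;
}
```

With something like this, the inner loop of the question could iterate over `split_words(textLine, delimiters)` directly, so the `new char[]` and `strcpy` calls (and the leak that comes with them) are no longer needed.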
Move `printf` two lines up, so that it prints the current word before `strtok` advances to the next one.
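In the question's loop, `printf` runs right after `word` has been advanced, so on the last token of each line it receives a null pointer. Passing a null pointer to `%s` is formally undefined behavior, and glibc happens to print `(null)`. A minimal sketch of the corrected order, assuming a hypothetical helper named `print_tokens` and a `std::vector<char>` buffer in place of `new char[]`:

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: prints every token of `line` on its own line and
// returns how many tokens were found.
std::size_t print_tokens(const std::string& line, const char* delimiters) {
    // strtok writes '\0' into its input, so work on a private copy that is
    // freed automatically when the vector goes out of scope.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::size_t count = 0;
    char* word = std::strtok(buffer.data(), delimiters);
    while (word != nullptr) {
        std::printf("%s\n", word);                // word is non-null here
        ++count;
        word = std::strtok(nullptr, delimiters);  // may be nullptr; the loop condition checks it
    }
    return count;
}
```

The sketch deliberately leaves out the `wordCount` threshold. In the question it sits in the loop condition, so the loop ends at the first word that fails the check rather than skipping that word. If skipping was the intent, the test would belong in an `if` inside the loop.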
As @j23 pointed out, your `printf` is in the wrong location.

As @Jerry-Coffin points out, there are more idiomatic, modern C++ ways to accomplish what you are trying to do. Besides avoiding mutation, you can also avoid copying the words out of the text string. (My code reads line by line, but if you know the whole text fits into memory, you could just as well read the entire content into a `std::string`.)

Using `std::string_view` avoids the extra copies, since it is essentially just a pointer into your string plus a length. It suits a use case where you do not need to store the words in another data structure, that is, some kind of one-pass processing of words.