Parsing a tab- and newline-delimited string in C++

Posted 2025-01-12 15:59:24

I'm writing a program for a personal project that takes a list of words from Google Books along with their occurrence counts and stores them in a vector, so that I can whittle the list down. Each entry in the file is the word, a \t character, the count, and a newline (\n), repeated for every word. I don't have much experience with this kind of programming, so I was wondering how one would parse a file formatted this way. Here's what I have so far:

#include <iostream>
#include <string>
#include <fstream>
#include <vector>

#define FILE_NAME  // must expand to the file's path as a string literal; left blank here

using namespace std;

// structure denoting a word occurrence
// contains the string of the word and an integer representing its frequency
struct word_occ {
    string word;
    int occurence;
};

vector<word_occ> words_vector;


int main() {
    /*
    File is a .txt file that has the following format:
    word1  #####
    word2  #####

    where word is the word from the English 1-grams from Google Books
    and ##### is the number of occurrences.
    The word is separated from its occurrence count by a tab (\t) and from the next word by a newline (\n).
    All words are entirely lowercase, and all numbers are integers lower than 20,000,000
    */
    ifstream all_words_list(FILE_NAME);
    
    string line;

    string line_word;
    int line_occurence;

    word_occ this_line;

    while (getline(all_words_list, line)) {

        // ... <-- what goes here?

        this_line.word = line_word;
        this_line.occurence = line_occurence;
        words_vector.push_back(this_line);
    }
}

Answer by 〗斷ホ乔殘χμё〖, 2025-01-19 15:59:24:

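A string stream should work. Formatted extraction with operator>> skips whitespace, so the tab between the word and the count is consumed automatically: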

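while (getline(all_words_list, line)) {
    std::istringstream ss(line);  // needs #include <sstream>
    // operator>> skips whitespace, so the tab between the fields is consumed
    if (!(ss >> line_word >> line_occurence)) {
        continue;  // skip blank or malformed lines
    }

    this_line.word = line_word;
    this_line.occurence = line_occurence;
    words_vector.push_back(this_line);
}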