Parsing a tab- and newline-delimited string in C++

Posted 2025-01-12 15:59:24

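I'm writing a program for a personal project that takes a list of words and their occurrence counts from Google Books and puts them into a vector, with the counts attached, so I can whittle the list down. Each entry in the list consists of the word, a \t character, the number, and a newline (\n), and this pattern repeats. I don't have much experience with this kind of programming, and I was wondering how one might parse a file formatted this way. Here's what I have so far: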

#include <iostream>
#include <string>
#include <fstream>
#include <vector>

#define FILE_NAME

using namespace std;

// Structure denoting a word occurrence:
// contains the word as a string and an integer representing its frequency.
struct word_occ {
    string word;
    int occurence;
};

vector<word_occ> words_vector;


int main() {
    /*
    File is a .txt file that has the following format:
    word1  #####
    word2  #####

    where word is the word from the English 1-grams from Google Books
    and ##### is the number of occurrences.
    The word is separated from its occurrence count by a tab (\t) and from other words by a newline (\n).
    All words are entirely lowercase, and all numbers are integers lower than 20,000,000.
    */
    ifstream all_words_list(FILE_NAME);
    
    string line;

    string line_word;
    int line_occurence;

    word_occ this_line;

    while (getline(all_words_list, line)) {

        // ... <-- what goes here?

        this_line.word = line_word;
        this_line.occurence = line_occurence;
        words_vector.push_back(this_line);
    }
}



〗斷ホ乔殘χμё〖 2025-01-19 15:59:24

A string stream would likely work:

while (getline(all_words_list, line)) {
    std::istringstream ss(line);
    ss >> line_word;
    ss >> line_occurence; 

    ...
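
To make the fragment above concrete, here is a minimal, self-contained sketch of the same string-stream approach. The helper name `parse_word_list` is hypothetical (it does not appear in the original post), and the sketch assumes, as the question states, that each word contains no whitespace and each count fits in an `int`. It also checks that both extractions succeeded, so blank or malformed lines are skipped rather than pushed with stale values. Note that `std::istringstream` requires `<sstream>`.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same shape as the struct in the question (identifier spelling kept as-is).
struct word_occ {
    std::string word;
    int occurence;
};

// Hypothetical helper: reads "word<TAB>count" lines from any input stream.
std::vector<word_occ> parse_word_list(std::istream& in) {
    std::vector<word_occ> words;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        word_occ entry;
        // operator>> skips whitespace, so the tab acts as the separator.
        // Lines that do not yield both a word and an integer are skipped.
        if (ss >> entry.word >> entry.occurence) {
            words.push_back(entry);
        }
    }
    return words;
}
```

With the question's file, this could be used by opening a `std::ifstream` on the word list and passing it to `parse_word_list`; taking a `std::istream&` rather than a file name also lets the same function be exercised with an in-memory `std::istringstream`.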



About the author

葬﹪忆之殇
