Need C++ help using maps to keep track of words in an INPUT file

Posted 2024-09-06 05:31:58

Let's say I have a text file containing:

today is today but
tomorrow is today tomorrow

Then, using maps, how can I keep track of the words that are repeated, and on which line they repeat?
So far I read each string in the file into temp and store it in the following way:

    map<string,int> storage;

    int count = 1; // for the first line of the file

    if(infile.is_open()){
        while( !infile.eof() ){
            getline(infile, line);
            istringstream my_string(line);
            while(my_string.good()){
                string temp;
                my_string >> temp;

                storage[temp] = count;
            }
            count++; // so that every string read in the next line will be recorded as that line.
        }
    }
    map<string,int>::iterator m;
    for(m = storage.begin(); m != storage.end(); m++){
        out << m->first << ": " << "line " << m->second << endl;
    }

Right now the output is just:

but: line 1
is: line 2
today: line 2
tomorrow: line 2

But instead, it should print out (with no repeated strings):

today: line 1 occurred 2 times, line 2 occurred 1 time.
is: line 1 occurred 1 time, line 2 occurred 1 time.
but: line 1 occurred 1 time.
tomorrow: line 2 occurred 2 times.

Note: the order of the strings does not matter.

Any help would be appreciated. Thanks.

岛歌少女 2024-09-13 05:31:58

A map stores (key, value) pairs with unique keys. This means that if you assign to the same key more than once, only the last value you assigned is stored.
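To see why the question's code prints only one line per word, here is a minimal sketch that repeats the storage[temp] = count assignment on the sample text. The function name last_line_per_word is hypothetical, and it reads from any std::istream rather than the asker's infile. Every later assignment to an existing key replaces the earlier line number, so only the last line on which each word appeared survives, which matches the output shown in the question.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper mimicking the question's storage[temp] = count.
// Assigning to an existing key replaces the previous line number,
// so the map ends up holding only the last line for each word.
std::map<std::string, int> last_line_per_word(std::istream& in) {
    std::map<std::string, int> storage;
    std::string line;
    int count = 1;  // line numbers start at 1
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string temp;
        while (words >> temp) {
            storage[temp] = count;  // overwrites any earlier value for temp
        }
        ++count;
    }
    return storage;
}
```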

It sounds like, instead of storing the line as the value, you want to store another map of lines -> occurrences.

So you could make your map like this:

    typedef int LineNumber;
    typedef int WordHits;
    typedef map< LineNumber, WordHits > LineHitsMap;
    typedef map< string, LineHitsMap > WordHitsMap;
    WordHitsMap storage;

Then to insert:

    WordHitsMap::iterator wordIt = storage.find(temp);
    if (wordIt != storage.end())
    {
        LineHitsMap::iterator lineIt = wordIt->second.find(count);
        if (lineIt != wordIt->second.end())
        {
            lineIt->second++;
        }
        else
        {
            wordIt->second[count] = 1;
        }
    }
    else
    {
        LineHitsMap lineHitsMap;
        lineHitsMap[count] = 1;
        storage[temp] = lineHitsMap;
    }
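A more compact version of the same idea, as a sketch. Because std::map::operator[] inserts a value-initialized (zero) entry for a missing key, the whole find/insert block above collapses to ++storage[temp][count]. The function names count_word_hits and describe are hypothetical, the input is any std::istream, and the read loop uses the getline / >> pattern rather than eof() or good(). describe produces the output format asked for in the question.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef int LineNumber;
typedef int WordHits;
typedef std::map<LineNumber, WordHits> LineHitsMap;
typedef std::map<std::string, LineHitsMap> WordHitsMap;

// Hypothetical helper: counts how often each word appears on each line
// (lines numbered from 1).
WordHitsMap count_word_hits(std::istream& in) {
    WordHitsMap storage;
    std::string line;
    LineNumber count = 1;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string temp;
        while (words >> temp) {
            // operator[] creates missing entries with a zero count,
            // so this does the same job as the find/insert branches.
            ++storage[temp][count];
        }
        ++count;
    }
    return storage;
}

// Hypothetical helper: formats one word like the question's desired output,
// e.g. "is: line 1 occurred 1 time, line 2 occurred 1 time."
std::string describe(const std::string& word, const LineHitsMap& hits) {
    std::ostringstream out;
    out << word << ":";
    bool first = true;
    for (LineHitsMap::const_iterator it = hits.begin(); it != hits.end(); ++it) {
        out << (first ? " " : ", ") << "line " << it->first << " occurred "
            << it->second << (it->second == 1 ? " time" : " times");
        first = false;
    }
    out << ".";
    return out.str();
}
```

To print the full report, loop over the returned map and write describe(it->first, it->second) for each entry. The words come out in alphabetical order because std::map keeps its keys sorted, which the question says is acceptable.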
世俗缘 2024-09-13 05:31:58

You're trying to get 2 items of information out of the collection, when you only store 1 item of information in there.

The easiest way to extend your current implementation is to store a struct instead of an int.

So instead of:

    storage[temp] = count;

you'd do:

    storage[temp].linenumber = count;
    storage[temp].wordcount++;

where the map is defined:

    struct worddata { int linenumber; int wordcount; };
    std::map<string, worddata> storage;

print the results using:

    out << m->first << ": " << "line " << m->second.linenumber << " count: " << m->second.wordcount << endl;

Edit: use a typedef for the definitions, e.g.:

    typedef std::map<std::string, worddata> MYMAP;
    MYMAP storage;

then declare the iterator as:

    MYMAP::iterator iter;
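A runnable sketch of this struct approach (count_with_struct is a hypothetical name, and the input is any std::istream). Note what it actually records: linenumber is overwritten on every hit, so it ends up holding the last line on which the word appeared, while wordcount is a total for the whole file. It does not keep separate counts per line, which the question's desired output needs.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct worddata { int linenumber; int wordcount; };
typedef std::map<std::string, worddata> MYMAP;

// Hypothetical helper applying this answer's struct idea.
MYMAP count_with_struct(std::istream& in) {
    MYMAP storage;
    std::string line;
    int count = 1;  // line numbers start at 1
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string temp;
        while (words >> temp) {
            // operator[] value-initializes a new worddata, so both fields start at 0.
            storage[temp].linenumber = count;  // last line seen wins
            storage[temp].wordcount++;         // running total over all lines
        }
        ++count;
    }
    return storage;
}
```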

苯莒 2024-09-13 05:31:58

Your storage data type is insufficient to store all the information you want to report. You could get there by using a vector for count storage but you'd have to do a lot of book-keeping to make sure you actually insert a 0 when a word is not encountered and create the vector with the right size when a new word is encountered. Not a trivial task.

You could switch your count part to a map of numbers, first being line and second being count... That would reduce the complexity of your code but wouldn't exactly be the most efficient method.

At any rate, you can't do what you need to do with just a std::map.

Edit: just thought of an alternative version that would be easier to generate but harder to report with: std::vector< std::map<std::string, unsigned int> >. For each new line in a file you'd generate a new map<string,int> and push it onto the vector. You could create a helper type set<string> to contain all the words that appear in a file to use in your reporting.

That's probably how I'd do it anyway, except I'd encapsulate all that crap in a class so that I could just do something like:

    my_counter.word_appearance(word, line_no);
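A minimal sketch of such a class, assuming the vector-of-maps layout described in the edit above (one map of word -> count per line, plus a set of every word seen). The class name word_counter and the count, words and line_count members are hypothetical; only word_appearance(word, line_no) comes from the answer. Lines are numbered from 1, and the vector grows with empty maps for any lines not seen yet.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical class wrapping a vector of per-line maps plus a set of all words.
class word_counter {
public:
    // Record one appearance of word on line_no (lines numbered from 1).
    void word_appearance(const std::string& word, std::size_t line_no) {
        if (line_no == 0) return;  // line numbers start at 1
        if (lines_.size() < line_no) {
            lines_.resize(line_no);  // add empty maps for new lines
        }
        ++lines_[line_no - 1][word];
        words_.insert(word);
    }

    // Number of times word appeared on line_no; 0 if it never did.
    unsigned int count(const std::string& word, std::size_t line_no) const {
        if (line_no == 0 || line_no > lines_.size()) return 0;
        const std::map<std::string, unsigned int>& line = lines_[line_no - 1];
        std::map<std::string, unsigned int>::const_iterator it = line.find(word);
        return it == line.end() ? 0 : it->second;
    }

    const std::set<std::string>& words() const { return words_; }
    std::size_t line_count() const { return lines_.size(); }

private:
    std::vector<std::map<std::string, unsigned int> > lines_;
    std::set<std::string> words_;
};
```

For the report, iterate over words() and, for each word, call count(word, n) for n from 1 to line_count(), printing only the non-zero counts. This is the "harder to report with" side of the trade-off the answer mentions.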
心在旅行 2024-09-13 05:31:58

Apart from anything else, your loops are all wrong. You should never loop on the eof or good flags, but on the success of the read operation. You want something like:

    while( getline(in, line) ){
        istringstream my_string(line);
        string temp;
        while( my_string >> temp ){
            // do something with temp
        }
    }
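A small sketch of the difference on the question's sample input. The four function names are hypothetical, and a std::istringstream stands in for the file. When the input ends with a newline, the eof()-controlled loop runs one extra time and yields an empty line. The good()-controlled word loop stores an empty string when a line ends in whitespace or is empty. Looping on the read itself avoids both problems.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Question's pattern: eof() is tested before reading, so after the last real
// line the body runs once more and getline fails, leaving line empty.
std::vector<std::string> lines_eof_loop(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (!in.eof()) {
        std::getline(in, line);
        lines.push_back(line);
    }
    return lines;
}

// This answer's pattern: the body only runs after a successful getline.
std::vector<std::string> lines_getline_loop(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// good() is still true while unread whitespace remains, so the final >>
// fails and an empty string is stored.
std::vector<std::string> words_good_loop(const std::string& line) {
    std::istringstream my_string(line);
    std::vector<std::string> words;
    while (my_string.good()) {
        std::string temp;
        my_string >> temp;
        words.push_back(temp);
    }
    return words;
}

// Only words that were actually extracted are stored.
std::vector<std::string> words_extract_loop(const std::string& line) {
    std::istringstream my_string(line);
    std::vector<std::string> words;
    std::string temp;
    while (my_string >> temp) {
        words.push_back(temp);
    }
    return words;
}
```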