Putting words into a map as keys, with sets as values

Posted on 2024-11-04 22:58:59

I know what a map is and the general basic functions of it, but I don't know why a set is being used here instead of just having declared int i = value or something similar.

What I'm really trying to do is: after putting the word into the vector, I want to use the same word as a key to a value also. But I really don't know the whole purpose of using the map to do that. Not sure if I'm giving enough information but just ask what you need more and I'll reply.

I've only supplied the readWords function, but if anyone needs the full code, including the header, class, and main file, I can post those as well.

I partially have the code written down there, with help, but I honestly don't know what it's doing after the push_back() function.

/* Read word-by-word from filename and store words in text vector.
* Also use normalized version of word as key in concordance map
* The value associated with each key in the map is a set whose
* keys are the associated indices into the vector.
*/
void Concordance::readWords(char * filename){
    ifstream fin(filename, ifstream::in);
    if (fin.is_open()){
        while(!fin.eof()){
            string word;
            fin >> word;
            normalize(word);
            text.push_back(word); //puts word into vector

            set<int> seat;
            seat.insert(text.size()-1);
            pair<string, set<int> > pear;
            concordance.insert(pear);

        }
    }
    else{
        cerr << "Unable to open file datafile.txt";
        exit(1);   // call system to stop
    }
    fin.close(); //closes the filename
}

Comments (3)

妳是的陽光 2024-11-11 22:58:59

I don't think you completely understand the requirement for this algorithm. (Is this homework, by the way?)

The goal here is to produce a concordance -- a list of all of the occurrences of each word. The point of the set is to hold all of the occurrences. (Ex: The word "apple" might appear on pages 1, 73, and 100. So the map entry for "apple" must hold all of those values.)

The point of normalization is to save the reader of the concordance time: "apple", "Apple", and "apples" should all probably be in one entry in the map.

Understanding that, we can update your program.

First, never check for eof before you read the data. It only makes sense to check for it after you read the data. In fact, there is a much simpler idiom for this check:

string word;
while (fin >> word) {
    ...

It appears to me that we are required to store the original word in the vector, and then use the normalized word as the map key:

text.push_back(word);
normalize(word);

Now, updating the map is easy-peasy. You don't need a pair, just use the [] operator. Realize that merely referencing a map entry causes it to spring into existence!

concordance[word].insert(text.size()-1);

EDIT Breaking that last bit apart:

concordance[word] looks up the entry in the map that is indexed by word. If the entry exists, it is returned. If the entry does not exist, it is created, and the newly formed entry is returned. .insert is the insert operation on the set stored in that map entry. text.size()-1 is the value inserted into that set.

Putting it back together, concordance[word].insert(text.size()-1) looks into the map, retrieves (or creates) the indicated set, and then inserts the number text.size()-1 into that set.
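Pulling the pieces above into one place, a minimal sketch might look like the following. Several things here are assumptions rather than the asker's actual code: the struct name ConcordanceSketch is hypothetical, readWords takes a std::istream& instead of a filename so it can be tried on a string, the body of normalize (plain lower-casing) is a guess because the real one was not posted, and positions are stored as std::size_t rather than int.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the Concordance class from the question.
struct ConcordanceSketch {
    std::vector<std::string> text;  // every word, in reading order
    std::map<std::string, std::set<std::size_t> > concordance;  // word -> positions in text

    // Assumed normalization: lower-case only. The real normalize() was not shown.
    static void normalize(std::string& word) {
        for (std::size_t i = 0; i < word.size(); ++i) {
            word[i] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(word[i])));
        }
    }

    void readWords(std::istream& in) {
        std::string word;
        while (in >> word) {       // the loop ends as soon as a read fails
            text.push_back(word);  // keep the original spelling
            normalize(word);
            // operator[] creates an empty set the first time a key is seen,
            // then insert() adds the position of the word just pushed.
            concordance[word].insert(text.size() - 1);
        }
    }
};
```

Feeding it a std::istringstream holding "Apple pie apple" leaves three words in text and two keys in the map: "apple" with positions 0 and 2, and "pie" with position 1.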

There you go!

只为一人 2024-11-11 22:58:59

I am not sure whether you have made a mistake in copying the code, or whether the code was intentionally like that, but the seat set is not used (for other than inserting an element, but since it is not read/stored it will be lost), and all elements added to concordance will be pairs ("",[empty set])
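To make that concrete, here is a small sketch that reproduces the posted loop body over a plain vector of words. The function names and the Index typedef are made up for illustration. The second function shows that filling in the pair would still not be enough, because map::insert leaves an existing key untouched.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::map<std::string, std::set<int> > Index;

// Mirrors the loop body from the question: the pair is never filled in.
Index buildAsPosted(const std::vector<std::string>& words) {
    Index index;
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::set<int> seat;
        seat.insert(static_cast<int>(i));  // seat is discarded when the iteration ends
        std::pair<std::string, std::set<int> > pear;  // default-constructed: ("", {})
        index.insert(pear);  // adds the key "" once; later calls change nothing
    }
    return index;
}

// Filling in the pair is still not enough: insert() ignores keys that already exist.
Index buildWithFilledPair(const std::vector<std::string>& words) {
    Index index;
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::set<int> seat;
        seat.insert(static_cast<int>(i));
        index.insert(std::make_pair(words[i], seat));  // a repeated word loses its new position
    }
    return index;
}
```

For the words "a", "b", "a", buildAsPosted returns a single entry with an empty key and an empty set, while buildWithFilledPair returns two entries but records only position 0 for "a".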

It looks like it is trying to build an index, i.e. a mapping from each word to the positions in the vector where that word appears. If that is the case, it would probably be better done as:

std::map<std::string, std::set<int> > concordance;
//...
concordance[word].insert(text.size()-1); // if the entry does not exist, it is created;
                                         // if it exists, it is retrieved and
                                         // the new position is added

This pattern is common when indexing words into pages (for a book, for example). There a set has an advantage over, say, a vector: it guarantees uniqueness. If a word appears 100 times on a single page, the set makes sure the page number is stored only once (with a vector you would have to test for that yourself). That does not apply to this code, because the indices are positions in a vector of words, and each position is unique by itself.
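As a small illustration of the uniqueness point, the sketch below collapses a list of page sightings into a set. The function name uniquePages and the idea of feeding it a vector of page numbers are assumptions made for the example, not something from the question.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: the pages on which one word was seen, in reading order,
// collapsed into a set so that each page number is kept only once.
std::set<int> uniquePages(const std::vector<int>& sightings) {
    return std::set<int>(sightings.begin(), sightings.end());
}
```

Passing the sightings 1, 1, 73, 1, 100, 73 gives back a set of three pages (1, 73, 100), with no duplicate check needed in the calling code.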

Also note, as Nawaz points out, that the loop needs some corrections.

剑心龙吟 2024-11-11 22:58:59

First of all, your while loop is wrong, because the eof flag (or any other failure flag) is set only after an attempt to read from the stream fails. That means that when the final read attempt fails, the rest of the loop body still executes when it should not: a word that was never read is normalized and pushed into the vector (an empty string here, since word is declared inside the loop).

A more idiomatic while loop would be this:

string word;
while( fin >> word ){
   normalize(word);
   text.push_back(word); //puts word into vector

   set<int> seat;
   seat.insert(text.size()-1);
   pair<string, set<int> > pear;
   concordance.insert(pear);
}

If the attempt to read (i.e. fin >> word) fails, the returned std::istream& converts to false in the loop condition, and the loop exits.
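The difference between the two loop shapes is easy to see on an in-memory stream. The sketch below is only an illustration: both function names are made up, and std::istringstream stands in for the file so the input can be given as a string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// The loop shape from the question: eof() is tested before the read.
std::vector<std::string> readWithEofCheck(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> words;
    while (!in.eof()) {
        std::string word;
        in >> word;             // may fail; the result is never checked
        words.push_back(word);  // pushed even when nothing was read
    }
    return words;
}

// The idiomatic shape: the read itself is the loop condition.
std::vector<std::string> readWithStreamCheck(const std::string& input) {
    std::istringstream in(input);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

With the input "one two\n" (note the trailing newline, which most text files have), readWithEofCheck returns three elements, the last one an empty string, while readWithStreamCheck returns exactly the two words.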

And I didn't clearly understand the rest of your post, the question and what exactly you're trying to do, so I can't comment on that.
