可以安全地修改成对向量中的 std::pair::first 吗？

发布于 2024-07-09 18:15:29 字数 478 浏览 6 评论 0原文

我目前正在研究 DNA 数据库类，并且目前将数据库中的每一行与匹配分数（基于编辑距离）和实际 DNA 序列本身相关联，在迭代循环中首先以这种方式修改是否安全？

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT>          DnaDatabaseT;

// ....

for(DnaDatabaseT::iterator it = database.begin();
    it != database.end(); it++)
{
    int score = it->second.query(query);
    it->first = score;
}

我这样做的原因是为了以后可以按分数对它们进行排序。我尝试过地图并收到有关首先修改的编译错误，但是是否有比这更好的方法来存储所有信息以供稍后排序？

原文

I'm currently working on a DNA database class and I currently associate each row in the database with both a match score (based on edit distance) and the actual DNA sequence itself, is it safe to modify first this way within an iteration loop?

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT>          DnaDatabaseT;

// ....

for(DnaDatabaseT::iterator it = database.begin();
    it != database.end(); it++)
{
    int score = it->second.query(query);
    it->first = score;
}

The reason I am doing this is so that I can sort them by score later. I have tried maps and received a compilation error about modifying first, but is there perhaps a better way than this to store all the information for sorting later?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

后知后觉 2024-07-16 18:15:29

回答你的第一个问题，是的。修改对的成员是完全安全的，因为对中的实际数据不会影响向量本身。

编辑：我有一种感觉，您在使用地图时遇到错误，因为您尝试修改地图内部对的first值。这是不允许的，因为该值是地图内部工作的一部分。

正如dribeas所述：

在地图中，您不能首先进行更改，因为这会破坏地图作为排序平衡树的不变量

编辑：要回答你的第二个问题，我认为你构建数据的方式没有任何问题，但我会让数据库保存指向DnaPairT的指针 对象，而不是对象本身。这将大大减少排序过程中复制的内存量。

#include <vector>
#include <utility>
#include <algorithm> 

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT *>       DnaDatabaseT;

// ...

// your scoring code, modified to use pointers
void calculateScoresForQuery(DnaDatabaseT& database, queryT& query)
{
    for(DnaDatabaseT::iterator it = database.begin(); it != database.end(); it++)
    {
        int score = (*it)->second.query(query);
        (*it)->first = score;
    }
}

// custom sorting function to handle DnaPairT pointers
bool sortByScore(DnaPairT * A, DnaPairT * B) { return (A->first < B->first); }

// function to sort the database
void sortDatabaseByScore(DnaDatabaseT& database)
{
    sort(database.begin(), database.end(), sortByScore);
}

// main
int main()
{
    DnaDatabaseT database;

    // code to load the database with DnaPairT pointers ...

    calculateScoresForQuery(database, query);
    sortDatabaseByScore(database);

    // code that uses the sorted database ...
}

您可能需要研究更有效的方法的唯一原因是您的数据库太大以至于排序循环需要很长时间才能完成。但如果是这种情况，我想您的 query 函数将占用大部分处理时间。

To answer your first question, yes. It is perfectly safe to modify the members of your pair, since the actual data in the pair does not affect the vector itself.

edit: I have a feeling that you were getting an error when using a map because you tried to modify the first value of the map's internal pair. That would not be allowed because that value is part of the map's inner workings.

As stated by dribeas:

In maps you cannot change first as it would break the invariant of the map being a sorted balanced tree

edit: To answer your second question, I see nothing at all wrong with the way you are structuring the data, but I would have the database hold pointers to DnaPairT objects, instead of the objects themselves. This would dramatically reduce the amount of memory that gets copied around during the sort procedure.

#include <vector>
#include <utility>
#include <algorithm> 

typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT *>       DnaDatabaseT;

// ...

// your scoring code, modified to use pointers
void calculateScoresForQuery(DnaDatabaseT& database, queryT& query)
{
    for(DnaDatabaseT::iterator it = database.begin(); it != database.end(); it++)
    {
        int score = (*it)->second.query(query);
        (*it)->first = score;
    }
}

// custom sorting function to handle DnaPairT pointers
bool sortByScore(DnaPairT * A, DnaPairT * B) { return (A->first < B->first); }

// function to sort the database
void sortDatabaseByScore(DnaDatabaseT& database)
{
    sort(database.begin(), database.end(), sortByScore);
}

// main
int main()
{
    DnaDatabaseT database;

    // code to load the database with DnaPairT pointers ...

    calculateScoresForQuery(database, query);
    sortDatabaseByScore(database);

    // code that uses the sorted database ...
}

The only reason you might need to look into more efficient methods is if your database is so enormous that the sorting loop takes too long to complete. If that is the case, though, I would imagine that your query function would be the one taking up most of the processing time.

回复收藏 0 原文