如何有效地存储等价项（来自连通分量标记算法）？

发布于 2024-12-26 19:52:16 字数 753 浏览 0 评论 0原文

我想存储连接组件标记算法中的等价项。它基本上是从一个值（一个标签的 ID）到多个值（来自与前者等效的标签的 ID）的一种映射。

我已经做了类似的事情，但它并不能很好地工作：

std::map<unsigned short, std::list<unsigned int>> equivalences;
for(int i = 0; i < MAX_NUMBER_OF_LABELS; ++i )
{
    std::list<unsigned int> temp;
    temp.push_back(i);
    // note that a label is equivalent to itself
    equivalences.insert( std::pair< int, std::list<unsigned int>>(i, temp) );
}

然后我添加适当的等价by：

equivalences.at( i ).push_back( equivalent_labels_int );

这种方法的主要缺点是我必须预先声明map的大小（它必须足够大），然后对于大尺寸（例如9999），初始化时间约为2.5 s。

有人有更好的主意吗？

原文

I would like to store equivalences from Connected-component labeling algorithm. It's basically making a kind of map from one value (one label's ID) to multiple values (IDs from labels that are equivalent to the former.)

I have already done something like this but it does not work really well:

std::map<unsigned short, std::list<unsigned int>> equivalences;
for(int i = 0; i < MAX_NUMBER_OF_LABELS; ++i )
{
    std::list<unsigned int> temp;
    temp.push_back(i);
    // note that a label is equivalent to itself
    equivalences.insert( std::pair< int, std::list<unsigned int>>(i, temp) );
}

Then I add proper equivalence by:

equivalences.at( i ).push_back( equivalent_labels_int );

The main drawback of this method is that I have to declare map's size up front (it has to be big enough) and then for large sizes (e.g. 9999) the initialization time is approximately 2.5s.

Anyone have a better idea?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

始终不够 2025-01-02 19:52:16

您不需要在 C++（或大多数语言）中预先调整 map 的大小。地图可以通过添加新元素来动态增长，因此如果您找到新的键，您可以随时将其添加到地图中。例如：

equivalences[i].push_back(equivalent_labels_int);

这是有效的，因为 map 的方括号运算符 (operator[]) 会自动向 map 添加一个新的键/值对> 使用给定的键和默认值（如果尚不存在）。

此外，我建议不要使用 list 作为存储连接的 blob 序列的容器。当您不需要随机访问并且经常删除序列中间的元素时，list 非常有用，我认为您实际上并没有这样做。相反，我建议使用 vector 或 deque，因为这些结构的空间效率更高，并且具有更好的局部性。

最后，根据您的特定需求，您可能希望完全切换数据结构。如果您的算法通过从某个起点运行深度优先搜索然后存储遇到的所有结果来工作，那么您现在拥有的方法可能非常好。但是，如果您的算法通过查找相似的点对然后将它们包含的斑点合并在一起来工作，您可能会对不相交集森林数据结构，实现简单，但性能极佳。也就是说，使用这种结构会让您无法检查哪些点连接到给定点，但效率的提升是相当显着的。

希望这有帮助！

You do not need to size the map up-front in C++ (or most languages, for that matter). maps can dynamically grow by having new elements added into them, so if you find a new key, you can always add it to the map. For example:

equivalences[i].push_back(equivalent_labels_int);

This works because the map's square brackets operator (operator[]) will automatically add a new key/value pair to the map with the given key and a default value if one doesn't already exist.

Additionally, I would advise not using list as the container for storing the sequence of connected blobs. list is good when you don't need random access and are frequently removing elements in the middle of the sequence, which I don't think you're actually doing here. Instead, I would suggest using vector or deque, since those structures are more space efficient and have better locality.

Finally, depending on your particular needs, you may want to switch data structures entirely. If your algorithm works by running a depth-first search out from some starting point and then storing all of the results it encounters, the approach you have now may be quite good. However, if instead your algorithm works by finding pairs of points that are similar and then merging together the blobs they contain, you may be interested in the disjoint-set forest data structure, which has a simple implementation but extremely good performance. That said, using this structure loses you the ability to check what points are connected to a given point, but the boost in efficiency is pretty remarkable.

Hope this helps!

回复收藏 0 原文