I have a set of strings. Out of these, groups of 2 or more may represent the same thing. These groups should be stored in a way that given any member of the group, you can fetch other members of the group with high efficiency.
So given this initial set: ["a","b1","b2","c1","c2","c3"]
the result structure should be something like ["a",["b1","b2"],["c1","c2","c3"]]
and Fetch("b") should return ["b1","b2"].
Is there a specific data structure and/or algorithm for this purpose?
EDIT: "b1" and "b2" are not actual strings, they're indicating that the 2 belong to the same group. Otherwise a Trie would be a perfect fit.
Comments (2)
I may be misinterpreting the initial problem setup, but I believe that there is a simple and elegant solution to this problem using off-the-shelf data structures. The idea is, at a high level, to create a map from strings to sets of strings. Each key in the map will be associated with the set of strings that it's equal to. Assuming that each string in a group is mapped to the same set of strings, this can be done time- and space-efficiently.
The algorithm would probably look like this:
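As a minimal sketch of the map-to-shared-set idea, assuming the groups are already known and are handed in one at a time: the class name GroupIndex and the methods addGroup and fetch are hypothetical, chosen only for illustration, and a plain HashMap stands in for the map here.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the name and API are illustrative only.
class GroupIndex {
    // Every member of a group maps to the same shared Set instance,
    // so a group is stored once no matter how many members it has.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Registers one group whose membership is already known.
    void addGroup(List<String> members) {
        Set<String> group = new LinkedHashSet<>(members);
        for (String member : group) {
            index.put(member, group);
        }
    }

    // Returns the group containing the key, or an empty set if the key is unknown.
    Set<String> fetch(String key) {
        return index.getOrDefault(key, Set.of());
    }
}
```

With the groups from the question registered through addGroup, fetch("b1") and fetch("b2") would both return the same set object holding "b1" and "b2".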
This algorithm and the resulting data structure are quite efficient. Assuming that you already know the clusters in advance, this process (using a trie as the implementation of the map and a simple list as the data structure for the sets) requires you to visit each character of each input string exactly twice: once when inserting it into the trie and once when adding it to the set of strings equal to it, assuming that you're making a deep copy. This is therefore an O(n) algorithm, where n is the total number of characters across all input strings.
Moreover, lookup is quite fast - to find the set of strings equal to some string, just walk the trie to find the string, look up the associated set of strings, then iterate over it. This takes O(L + k) time, where L is the length of the string and k is the number of matches.
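To make the O(L + k) lookup concrete, here is a rough sketch of a trie used as the map, with a simple list per group. The names GroupTrie, Node, addGroup and fetch are assumptions made for this example, and the per-node HashMap of children is just one possible representation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trie-backed map from strings to their shared group list.
class GroupTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        List<String> group; // non-null only where a stored string ends
    }

    private final Node root = new Node();

    // Inserts every member and points its terminal node at one shared list.
    void addGroup(List<String> members) {
        List<String> group = new ArrayList<>(members);
        for (String member : group) {
            Node node = root;
            for (char c : member.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.group = group;
        }
    }

    // Walks the L characters of the key to its terminal node, then returns
    // the associated list; iterating that list costs k more steps.
    List<String> fetch(String key) {
        Node node = root;
        for (char c : key.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return List.<String>of();
            }
        }
        return node.group == null ? List.<String>of() : node.group;
    }
}
```

In this sketch a key that is only a prefix of stored strings (for example "b" when "b1" and "b2" are stored) yields an empty list, because only exact matches carry a group.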
Hope this helps, and let me know if I've misinterpreted the problem statement!
Since this is Java, I would use a HashMap<String, Set<String>>. This would map each string to its equivalence set (which would contain that string and all others that belong to the same group). How you would construct the equivalence sets from the input depends on how you define "equivalent". If the inputs are in order by group (but not actually grouped), and if you had a predicate implemented to test equivalence, you could build the map in a single pass over the input.

Note that this only works if the groups are contiguous elements in the input. If they aren't, you'll need more complex bookkeeping to build the group structure.
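For input whose groups are contiguous, the construction might be sketched as follows. This is an illustration under assumptions, not the original code: the class ContiguousGrouper, the method build, and the use of a BiPredicate as the equivalence test are all hypothetical choices.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical helper; assumes members of a group are adjacent in the input.
class ContiguousGrouper {
    static Map<String, Set<String>> build(List<String> input,
                                          BiPredicate<String, String> sameGroup) {
        Map<String, Set<String>> map = new HashMap<>();
        String representative = null; // first element of the current group
        Set<String> group = null;     // set shared by the current group
        for (String next : input) {
            // Start a new group at the first element or when equivalence fails.
            if (group == null || !sameGroup.test(representative, next)) {
                representative = next;
                group = new LinkedHashSet<>();
            }
            group.add(next);
            map.put(next, group);
        }
        return map;
    }
}
```

As a toy usage, calling build on the question's list with the stand-in predicate (x, y) -> x.charAt(0) == y.charAt(0) would map "b1" and "b2" to one shared set. A real equivalence test would replace that predicate, since the question notes that the names themselves carry no meaning.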