Data structure for holding sets of interchangeable strings

Posted 2024-12-24 04:07:44 · word count 329 · 1 view · 0 comments

I have a set of strings. Out of these, groups of 2 or more may represent the same thing. These groups should be stored in a way that given any member of the group, you can fetch other members of the group with high efficiency.

So given this initial set: ["a","b1","b2","c1","c2","c3"] the result structure should be something like ["a",["b1","b2"],["c1","c2","c3"]] and Fetch("b") should return ["b1","b2"].

Is there a specific data structure and/or algorithm for this purpose?

EDIT: "b1" and "b2" are not actual strings, they're indicating that the 2 belong to the same group. Otherwise a Trie would be a perfect fit.

Comments (2)

无悔心 2024-12-31 04:07:44

I may be misinterpreting the initial problem setup, but I believe that there is a simple and elegant solution to this problem using off-the-shelf data structures. The idea is, at a high level, to create a map from strings to sets of strings. Each key in the map will be associated with the set of strings that it's equal to. Assuming that each string in a group is mapped to the same set of strings, this can be done time- and space-efficiently.

The algorithm would probably look like this:

  1. Construct a map M from strings to sets of strings.
  2. Group all strings together that are equal to one another (this step depends on how the strings and groups are specified).
  3. For each cluster:
    1. Create a canonical set of the strings in that cluster.
    2. Add each string to the map as a key whose value is the canonical set.
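The steps above can be sketched in Java roughly as follows. This is a minimal illustration, assuming the clusters are already known and passed in as lists; the class name `GroupIndex` and the method `build` are hypothetical, and a `HashMap` stands in for whichever map implementation is preferred.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the name and method are not from the original answer.
final class GroupIndex {
    // Builds a map in which every member of a cluster points to the same shared set.
    static Map<String, Set<String>> build(List<List<String>> clusters) {
        Map<String, Set<String>> index = new HashMap<>();
        for (List<String> cluster : clusters) {
            // One canonical, read-only set per cluster, shared by all of its members.
            Set<String> canonical =
                    Collections.unmodifiableSet(new LinkedHashSet<>(cluster));
            for (String member : canonical) {
                index.put(member, canonical);
            }
        }
        return index;
    }
}
```

Because every member of a cluster maps to the same set instance, the extra space per string is a single map entry rather than a copy of the group.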

This algorithm and the resulting data structure are quite efficient. Assuming that you already know the clusters in advance, this process (using a trie as the implementation of the map and a simple list as the data structure for the sets) requires you to visit each character of each input string exactly twice: once when inserting it into the trie and once when adding it to the set of strings equal to it, assuming that you're making a deep copy. This is therefore an O(n) algorithm, where n is the total number of characters across all input strings.

Moreover, lookup is quite fast - to find the set of strings equal to some string, just walk the trie to find the string, look up the associated set of strings, then iterate over it. This takes O(L + k) time, where L is the length of the string and k is the number of matches.
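A trie-backed version of that lookup might look like the sketch below. It is only one possible shape, assuming each trie node keeps its children in a `HashMap` and that terminal nodes hold a reference to the shared group list; `GroupTrie`, `addGroup`, and `fetch` are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical trie-backed index; class and method names are illustrative only.
final class GroupTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        List<String> group; // non-null only where a stored string ends
    }

    private final Node root = new Node();

    // Registers one cluster: every member's terminal node references the same list.
    void addGroup(List<String> members) {
        List<String> shared = Collections.unmodifiableList(new ArrayList<>(members));
        for (String member : shared) {
            Node node = root;
            for (int i = 0; i < member.length(); i++) {
                node = node.children.computeIfAbsent(member.charAt(i), c -> new Node());
            }
            node.group = shared;
        }
    }

    // Walks the key one character at a time (O(L)); returns an empty list if absent.
    List<String> fetch(String key) {
        Node node = root;
        for (int i = 0; i < key.length() && node != null; i++) {
            node = node.children.get(key.charAt(i));
        }
        return (node == null || node.group == null) ? Collections.emptyList() : node.group;
    }
}
```

Walking the key costs O(L), and iterating the returned list costs O(k), which matches the bound stated above. A prefix such as "b" that was never inserted as a full string yields an empty list.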

Hope this helps, and let me know if I've misinterpreted the problem statement!

无言温柔 2024-12-31 04:07:44

Since this is Java, I would use a HashMap<String, Set<String>>. This would map each string to its equivalence set (which would contain that string and all others that belong to the same group). How you would construct the equivalence sets from the input depends on how you define "equivalent". If the inputs are in order by group (but not actually grouped), and if you had a predicate implemented to test equivalence, you could do something like this:

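import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Placeholder for the equivalence test: returns true when a and b belong to
// different groups. It must return true when a == null so that the first
// element of the input starts a new group.
boolean differentGroups(String a, String b) {
    throw new UnsupportedOperationException("supply an equivalence test");
}

Map<String, Set<String>> makeMap(List<String> input) {
    Map<String, Set<String>> map = new HashMap<>();
    String representative = null;
    Set<String> group = null; // assigned on the first iteration (representative is null)
    for (String next : input) {
        if (differentGroups(representative, next)) {
            // next starts a new group and becomes its representative
            representative = next;
            group = new HashSet<>();
        }
        group.add(next);
        map.put(next, group);
    }
    return map;
}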

Note that this only works if the members of each group are contiguous in the input. If they aren't, you'll need more complex bookkeeping to build the group structure.
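One possible form of that bookkeeping is sketched below. It assumes, purely for illustration, that equivalences arrive as pairs of strings in arbitrary order; the class `EquivalenceIndex` and its methods `link` and `fetch` are hypothetical names. When two existing groups turn out to be the same, the smaller set is merged into the larger one and its members are re-pointed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical index for groups whose members are not contiguous in the input.
final class EquivalenceIndex {
    private final Map<String, Set<String>> groups = new HashMap<>();

    // Records that a and b are interchangeable, merging their groups if needed.
    void link(String a, String b) {
        Set<String> ga = groups.computeIfAbsent(a, EquivalenceIndex::singleton);
        Set<String> gb = groups.computeIfAbsent(b, EquivalenceIndex::singleton);
        if (ga == gb) {
            return; // already in the same group
        }
        // Merge the smaller set into the larger one to limit re-pointing work.
        Set<String> small = ga.size() < gb.size() ? ga : gb;
        Set<String> large = (small == ga) ? gb : ga;
        large.addAll(small);
        for (String member : small) {
            groups.put(member, large);
        }
    }

    // Returns a read-only view of the group containing s, or an empty set if s is unknown.
    Set<String> fetch(String s) {
        Set<String> group = groups.get(s);
        return group == null ? Collections.emptySet() : Collections.unmodifiableSet(group);
    }

    private static Set<String> singleton(String s) {
        Set<String> group = new HashSet<>();
        group.add(s);
        return group;
    }
}
```

For example, linking ("c1", "c2"), then ("b1", "b2"), then ("c3", "c1") leaves "c1", "c2", and "c3" sharing one set regardless of the order in which the pairs were seen.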
