
Find the number of occurrences of the longest common prefixes/suffixes in a list of strings?

Posted 2025-01-13 22:45:58, 3886 words, 4 views, 0 comments

Given a list of Strings:

ArrayList<String> strList = new ArrayList<String>();
strList.add("Mary had a little lamb named Willy");
strList.add("Mary had a little ham");
strList.add("Old McDonald had a farm named Willy");
strList.add("Willy had a little dog named ham");
strList.add("(abc)");
strList.add("(xyz)");
strList.add("Visit Target Store");
strList.add("Visit Walmart Store");

This should produce the output in the form of a HashMap<String, Integer> prefixMap and suffixMap:

PREFIX:

Mary had a -> 2
Mary had a little -> 2
( -> 2
Visit -> 2

SUFFIX:

named Willy -> 2
ham -> 2
) -> 2
Store -> 2

So far, I'm able to generate a prefix that is present in all items in the list using the following code:

public static final int INDEX_NOT_FOUND = -1;
// Assumed definition; the constant is used below but not shown in the original snippet.
public static final String EMPTY_STRING = "";

// Returns the prefix shared by ALL given strings, or an empty string if there is none.
public static String getAllCommonPrefixesInList(final String... strs) {
    if (strs == null || strs.length == 0) {
        return EMPTY_STRING;
    }

    final int smallestIndexOfDiff = getIndexOfDifference(strs);
    if (smallestIndexOfDiff == INDEX_NOT_FOUND) {
        // All Strings are identical
        if (strs[0] == null) {
            return EMPTY_STRING;
        }
        return strs[0];
    } else if (smallestIndexOfDiff == 0) {
        // No common initial characters found, return empty String
        return EMPTY_STRING;
    } else {
        // Common initial character sequence found, return sequence
        return strs[0].substring(0, smallestIndexOfDiff);
    }
}

// Returns the index of the first character at which the sequences differ,
// or INDEX_NOT_FOUND if they are all equal (or fewer than two are given).
public static int getIndexOfDifference(final CharSequence... charSequence) {
    if (charSequence == null || charSequence.length <= 1) {
        return INDEX_NOT_FOUND;
    }
    boolean isAnyStringNull = false;
    boolean areAllStringsNull = true;

    final int arrayLen = charSequence.length;
    int shortestStrLen = Integer.MAX_VALUE;
    int longestStrLen = 0;

    // Find the min and max string lengths; this avoids checking on every pass
    // of the loop below that we are not exceeding the length of a string.
    for (int i = 0; i < arrayLen; i++) {
        if (charSequence[i] == null) {
            isAnyStringNull = true;
            shortestStrLen = 0;
        } else {
            areAllStringsNull = false;
            shortestStrLen = Math.min(charSequence[i].length(), shortestStrLen);
            longestStrLen = Math.max(charSequence[i].length(), longestStrLen);
        }
    }

    // Handle lists containing all nulls or all empty strings
    if (areAllStringsNull || (longestStrLen == 0 && !isAnyStringNull)) {
        return INDEX_NOT_FOUND;
    }

    // Handle lists containing some nulls or some empty strings
    if (shortestStrLen == 0) {
        return 0;
    }

    // Find the position of the first difference across all strings
    int firstDiff = -1;
    for (int stringPos = 0; stringPos < shortestStrLen; stringPos++) {
        final char comparisonChar = charSequence[0].charAt(stringPos);
        for (int arrayPos = 1; arrayPos < arrayLen; arrayPos++) {
            if (charSequence[arrayPos].charAt(stringPos) != comparisonChar) {
                firstDiff = stringPos;
                break;
            }
        }
        if (firstDiff != -1) {
            break;
        }
    }

    if (firstDiff == -1 && shortestStrLen != longestStrLen) {
        // We compared all of the characters up to the length of the
        // shortest string and didn't find a difference, but the string
        // lengths vary, so return the length of the shortest string.
        return shortestStrLen;
    }
    return firstDiff;
}

However, my goal is to include every prefix/suffix that occurs at least twice in the resulting map.

How can this be achieved with Java?

茶底世界 2025-01-20 22:45:58

In my understanding of this problem, the most suitable data structure for solving it is an acyclic, disconnected graph.

In the general case, the graph will consist of several unconnected clusters. Each cluster has a tree-like structure; in the edge case it degenerates into a linked list.

The simplest naive approach to this problem would be to create a linked list for each line and iterate over them. The drawbacks are duplicated nodes (greater memory consumption), higher time complexity (more operations required), and more room for error, because more manual bookkeeping is needed.

The description of the Graph

So I'll stick with the graph as the data structure for this problem and try to keep things as simple as possible.

Let's consider the following input:

"Mary had a little lamb named Willy"
"Mary had a little ham"
"A B C"

The graphical representation of the graph looks like this:

[Figure: diagram of the graph built from the three lines above]

The first two lines form a cluster made up of a linked list (the head part) and a tree (the tail part). The second cluster is a linked list whose vertices aren't connected to the vertices formed from the other strings.

That's not the only way the vertices can be arranged: the head could spawn an N-ary tree, and a linked list could appear somewhere in the middle.

The main takeaway is that, to solve the problem, we need to follow the chain of vertices through all the branches for as long as the vertices overlap. In those parts of the graph, every prefix string and suffix string that is common to two or more lines is represented by a single vertex (node).

To track the number of strings mapped to a particular vertex, each vertex has a counter (int groupCount in the code below), which is set to 1 when the vertex is created and incremented each time another string is mapped to that vertex.

Each vertex contains a map that holds references to its neighbours. When a neighbour is added, either a new Vertex is created from the given string or the count of the existing vertex is incremented.

To suit this task, the graph has to maintain references to all head vertices and tail vertices. For simplicity, instead of keeping two groups of references to adjacent nodes and two separate counters in each vertex (the suffix count and the prefix count will differ), the graph in this solution actually consists of two graphs: a prefix graph and a suffix graph. For that reason, the implementing class is named MultiGraph.

To populate both graphs with vertices, the method addCluster() iterates over the strings of the given line by means of an Iterator, in normal or reversed order depending on which graph is being populated.
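As a small illustration of the two iteration orders mentioned above, here is a minimal sketch using only java.util. The class name DequeOrderDemo and the helper drain() are hypothetical names introduced for this example; they are not part of the answer's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
class DequeOrderDemo {

    // Drains an iterator into a list so the visiting order is easy to inspect.
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Deque<String> words = new ArrayDeque<>(Arrays.asList("Mary", "had", "a", "little", "ham"));

        // Head-to-tail order: the order in which the prefix graph would be populated.
        System.out.println(drain(words.iterator()));

        // Tail-to-head order: the order in which the suffix graph would be populated.
        System.out.println(drain(words.descendingIterator()));
    }
}
```

Because the suffix graph is walked tail-first, a string rebuilt by appending vertex names during traversal comes out in reversed word order, which would explain the "Willy named" entry in the output further down.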

Depth first search

The next step after the graphs are populated is to generate the maps of strings with a frequency of 2 or greater.

For that, the classic depth-first search algorithm is used.

Implementing DFS requires a mutable container that serves as a stack (an ArrayDeque is used for that purpose). The first element taken from the map of heads/tails is pushed onto the stack, and a StringBuilder holding the name of this element is placed in a map keyed by vertex.

Then, to restore a string with a particular count, vertices are popped from the top of the stack and their neighbours with a count > 1 are in turn pushed onto the stack. A copy of the current prefix, with the delimiter and the neighbour's name appended, is mapped to the neighbour vertex.

If the count changes, the current prefix is the longest string common to at least two lines. In that case, the prefix and its count are added to the resulting map.

Implementation

The following implementation consists of two classes that are narrowly focused and self-contained. The MultiGraph class acts exclusively as a data structure, maintaining the two graphs. The plumbing code, such as splitting the lines into strings, is extracted into a separate class, GraphManager.

Graph

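import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class MultiGraph {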
    private final Map<String, Vertex> heads = new HashMap<>();
    private final Map<String, Vertex> tails = new HashMap<>();

    public void addCluster(Deque<String> names) {
        addCluster(heads, names.iterator());
        addCluster(tails, names.descendingIterator());
    }

    private void addCluster(Map<String, Vertex> clusters, Iterator<String> names) {
        String rootName = names.next();
        if (clusters.containsKey(rootName)) {
            clusters.get(rootName).incrementGroupCount();
        } else {
            clusters.put(rootName, new Vertex(rootName));
        }

        Vertex current = clusters.get(rootName);
        while (names.hasNext()) {
            current = current.addNext(names.next());
        }
    }

    public Map<String, Integer> generatePrefixMap(String delimiter) {
        Map<String, Integer> countByPrefix = new HashMap<>();

        for (Vertex next: heads.values()) {
            if (next.getGroupCount() == 1) {
                continue;
            }
            performDFS(heads, countByPrefix, delimiter, next);
        }
        return countByPrefix;
    }

    public Map<String, Integer> generateSuffixMap(String delimiter) {
        Map<String, Integer> countBySuffix = new HashMap<>();

        for (Vertex next: tails.values()) {
            if (next.getGroupCount() == 1) {
                continue;
            }
            performDFS(tails, countBySuffix, delimiter, next);
        }
        return countBySuffix;
    }
    // implementation of the Depth First Search algorithm
    public void performDFS(Map<String, Vertex> clusters,
                           Map<String, Integer> countByPrefix,
                           String delimiter, Vertex next) {

        StringBuilder prefix = null;
        Vertex current = next;
        int count = next.getGroupCount();

        Deque<Vertex> stack = new ArrayDeque<>(); // create as stack
        Map<Vertex, StringBuilder> prefixByVert = new HashMap<>();
        stack.push(next); // place the first element on the stack
        prefixByVert.put(current, new StringBuilder(current.getName()));

        while (!stack.isEmpty()) {
            current = stack.pop();
            if (current.getGroupCount() < count) { // the number of strings mapped to the current Vertex has been changed
                countByPrefix.put(prefix.toString(), count); // saving the result
                count = current.getGroupCount();
            }
            prefix = prefixByVert.get(current);

            for (Vertex neighbour: current.getNextVertByVal().values()) {
                if (neighbour.getGroupCount() == 1) { // skip neighbours reached by only one line
                    continue;
                }
                stack.push(neighbour);
                prefixByVert.put(neighbour, new StringBuilder(prefix)
                                    .append(delimiter)
                                    .append(neighbour.getName()));
            }
        }

        if (prefix != null && count > 1) {
            countByPrefix.putIfAbsent(prefix.toString(), count);
        }
    }

    private static class Vertex {
        private final String name;
        private int groupCount = 1;
        private final Map<String, Vertex> nextVertByVal = new HashMap<>();

        public Vertex(String name) {
            this.name = name;
        }

        public Vertex addNext(String value) {
            if (nextVertByVal.containsKey(value)) {
                nextVertByVal.get(value).incrementGroupCount();
            } else {
                nextVertByVal.put(value, new Vertex(value));
            }
            return nextVertByVal.get(value);
        }

        public void incrementGroupCount() {
            this.groupCount++;
        }

        public String getName() {
            return name;
        }

        public int getGroupCount() {
            return groupCount;
        }

        public Map<String, Vertex> getNextVertByVal() {
            return nextVertByVal;
        }
    }
}

The following class handles processing of the input data: it splits the lines, discards any empty strings that may occur, and packs the tokens into a Deque so that iteration in both directions is convenient.

It also instantiates the graph and governs its work. GraphManager provides the delimiter to the graph so that the initial shape of the strings can be restored while the resulting maps are being built. With that, you can split the given lines on white space, on the empty string (to process lines character by character), or on punctuation marks without changing a single line in these two classes.
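To make the delimiter choices concrete, here is a minimal sketch of how String.split() behaves for the three cases mentioned. SplitDemo and tokens() are hypothetical names used only for this illustration. One caveat worth keeping in mind: String.split() treats its argument as a regular expression, so punctuation such as a dot has to be quoted.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class SplitDemo {

    // Splits a line with String.split(), which interprets the delimiter as a regex.
    static String[] tokens(String line, String delimiterRegex) {
        return line.split(delimiterRegex);
    }

    public static void main(String[] args) {
        // Word by word.
        System.out.println(Arrays.toString(tokens("Visit Target Store", " ")));

        // Character by character (on Java 8+ no leading empty string is produced).
        System.out.println(Arrays.toString(tokens("(abc)", "")));

        // A literal dot must be quoted; an unquoted "." matches every character.
        System.out.println(Arrays.toString(tokens("a.b.c", Pattern.quote("."))));
    }
}
```

Note that GraphManager reuses the same delimiter string to rejoin the tokens, so a quoted pattern would appear verbatim in the resulting keys. Handling regex metacharacters cleanly would presumably need a separate join string, which the classes here don't have.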

GraphManager

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

public class GraphManager {
    private MultiGraph graph = new MultiGraph();
    private String delimiter;

    private GraphManager(String delimiter) {
        this.delimiter = delimiter;
    }

    public static GraphManager getInstance(Iterable<String> lines, String delimiter) {
        GraphManager gm = new GraphManager(delimiter);
        gm.init(lines);
        return gm;
    }

    private void init(Iterable<String> lines) {
        for (String line: lines) {
            Deque<String> names = new ArrayDeque<>();
            for (String name: line.split(delimiter)) {
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
            addCluster(names);
        }
    }

    private void addCluster(Deque<String> names) {
        graph.addCluster(names);
    }

    public Map<String, Integer> getPrefixMap() {
        return graph.generatePrefixMap(delimiter);
    }

    public Map<String, Integer> getSuffixMap() {
        return graph.generateSuffixMap(delimiter);
    }
}

main()

public static void main(String[] args) {
    List<String> lines = List.of(
            "Mary had a little lamb named Willy", "Mary had a little ham",
            "Old McDonald had a farm named Willy", "Willy had a little dog named ham",
            "( abc )", "( xyz )", "Visit Target Store", "Visit Walmart Store");

    GraphManager gm = GraphManager.getInstance(lines, " ");
    
    System.out.println("Prefixes:");
    for (Map.Entry<String, Integer> entry: gm.getPrefixMap().entrySet()) {
        System.out.println(entry.getValue() + " " + entry.getKey());
    }

    System.out.println("\nSuffixes:");
    for (Map.Entry<String, Integer> entry: gm.getSuffixMap().entrySet()) {
        System.out.println(entry.getValue() + " " + entry.getKey());
    }
}

Output

Prefixes:
2 Mary had a little
2 Visit
2 (

Suffixes:
2 ham
2 )
2 Store
2 Willy named


云雾 2025-01-20 22:45:58

This problem can be solved easily with a trie.

A trie node basically needs to keep track of two things:

Child nodes
Count of prefixes ending at current node

Insert all strings into the trie, which takes O(string length * number of strings). After that, by simply traversing the trie, you can collect the prefixes into a hash map according to their count, as your use case requires. For suffixes, you can use the same approach; just traverse the strings in reverse order.

Edit:
On second thought, a trie might be the most efficient way, but a simple HashMap implementation should also work here. Here's an example that generates all prefixes with count > 1.

import java.util.*;
import java.util.stream.*;

class Main {
  public static void main(String[] args) {
    
    System.out.println("Hello world!");

    ArrayList<String> strList = new ArrayList<String>();
    
    strList.add("Mary had a little lamb named Willy");
    strList.add("Mary had a little ham");
    strList.add("Old McDonald had a farm named Willy");
    strList.add("Willy had a little dog named ham");
    strList.add("(abc)");
    strList.add("(xyz)");
    strList.add("Visit Target Store");
    strList.add("Visit Walmart Store");

    Map<String, Integer> prefixMap = new HashMap<String, Integer>();
    ArrayList<String> stringsWithHighestOccurrence = new ArrayList<String>();

    for (String word : strList) {
      // Count every character-level prefix of every string
      for (int i = 1; i <= word.length(); i++) {
        String prefix = word.substring(0, i);
        prefixMap.merge(prefix, 1, Integer::sum);
      }
    }

    Integer maxval = Collections.max(prefixMap.values());

    for (String key : prefixMap.keySet()) {
      Integer value = prefixMap.get(key);
      if (value > 1) System.out.println(key + " : " + value);
      // equals() rather than ==, since these are boxed Integers
      if (value.equals(maxval)) stringsWithHighestOccurrence.add(key);
    }

    int maxLength = stringsWithHighestOccurrence.stream().map(String::length).max(Integer::compareTo).get();

    System.out.println(maxLength);

    ArrayList<String> prefixesWithMaxLength = stringsWithHighestOccurrence.stream()
        .filter(c -> c.length() == maxLength)
        .collect(Collectors.toCollection(ArrayList::new));
    System.out.println(prefixesWithMaxLength);
  }
}
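The same counting idea should carry over to suffixes by taking substring(i) instead of substring(0, i). The following is a minimal sketch under that assumption; SuffixCounter and repeatedSuffixes() are hypothetical names, not part of the answer above. Like the prefix version, it works character by character, so it reports every shared suffix, including ones that end in the middle of a word.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
class SuffixCounter {

    // Counts every suffix of every string, then keeps those seen at least twice.
    static Map<String, Integer> repeatedSuffixes(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            // substring(i) is the suffix starting at index i; i == 0 is the whole line.
            for (int i = 0; i < line.length(); i++) {
                counts.merge(line.substring(i), 1, Integer::sum);
            }
        }
        // Drop suffixes that occur only once.
        counts.values().removeIf(c -> c < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Visit Target Store", "Visit Walmart Store", "(abc)", "(xyz)");
        repeatedSuffixes(lines)
                .forEach((suffix, count) -> System.out.println(suffix + " : " + count));
    }
}
```

Since each string of length n contributes n substrings, this trades memory for simplicity; for long inputs the trie approach is likely the better fit.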

Nonetheless, for the sake of completeness, I'll also add a basic TrieNode implementation, since my answer proposed that approach in the first place.

TrieNode:

import java.util.HashMap;
import java.util.Map;

class TrieNode {
    private final Map<Character, TrieNode> children = new HashMap<>();
    private int count; // number of inserted strings passing through this node

    Map<Character, TrieNode> getChildren() {
        return children;
    }

    int getCount() {
        return count;
    }

    void increaseCount() {
        this.count += 1;
    }
}

Trie:

class Trie {
    private TrieNode root;

    Trie() {
        root = new TrieNode();
    }

    void insert(String word) {
        TrieNode current = root;

        for (char l : word.toCharArray()) {
            current = current.getChildren().computeIfAbsent(l, c -> new TrieNode());
            current.increaseCount();
        }
    }
}

Traversing the trie is analogous to a simple DFS in which the "path" up to the current node is also maintained (here the path is the prefix accumulated up to this point).
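The traversal described above might look roughly like the following. This is a self-contained sketch rather than the answer's own code: CountingTrie, its nested Node, and prefixesWithCountAtLeast() are hypothetical names, and the trie is re-declared here so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical self-contained variant of the trie, for illustration only.
class CountingTrie {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // number of inserted strings that pass through this node
    }

    private final Node root = new Node();

    void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
            current.count++;
        }
    }

    // Returns every prefix shared by at least minCount inserted strings.
    Map<String, Integer> prefixesWithCountAtLeast(int minCount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        collect(root, new StringBuilder(), minCount, result);
        return result;
    }

    // Recursive DFS; 'path' holds the prefix accumulated so far.
    private void collect(Node node, StringBuilder path, int minCount, Map<String, Integer> result) {
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            Node child = e.getValue();
            if (child.count < minCount) {
                continue; // counts never grow deeper in the trie, so prune here
            }
            path.append(e.getKey().charValue());
            result.put(path.toString(), child.count);
            collect(child, path, minCount, result);
            path.setLength(path.length() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        CountingTrie prefixes = new CountingTrie();
        CountingTrie suffixes = new CountingTrie();
        for (String s : new String[] {"Visit Target Store", "Visit Walmart Store"}) {
            prefixes.insert(s);
            // Inserting the reversed string turns suffixes into prefixes.
            suffixes.insert(new StringBuilder(s).reverse().toString());
        }
        System.out.println(prefixes.prefixesWithCountAtLeast(2));
        System.out.println(suffixes.prefixesWithCountAtLeast(2)); // keys are reversed suffixes
    }
}
```

The keys of the suffix trie come back reversed, so they would need to be reversed again before being reported. To keep only the longest prefix per branch, one option is to record a path only when none of the node's children reaches minCount.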


审判长 2025-01-20 22:45:58

To implement a trie, you first need a node.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class Node<T> {

    private final T value;
    private final Node<T> parent;
    private final Map<T, Node<T>> children;
    private boolean isEnd;

    Node(T value, Node<T> parent) {
        this.value = value;
        this.parent = parent;
        this.children = new HashMap<>();
        this.isEnd = false;
    }

    Node<T> addChild(T childValue, Node<T> parent) {
        // return the child node if it exists, otherwise create and return it
        return this.children.computeIfAbsent(childValue, value -> new Node<>(value, parent));
    }

    T getValue() {
        return this.value;
    }

    Node<T> getParent() {
        return this.parent;
    }

    boolean isEnd() {
        return this.isEnd;
    }

    void setEnd(boolean isEnd) {
        this.isEnd = isEnd;
    }

    Collection<Node<T>> children() {
        return this.children.values();
    }

    @Override
    public String toString() {
        //for easier debugging
        return "Node{" +
                "value=" + this.value +
                ", children=" + this.children.keySet() +
                ", isEnd=" + this.isEnd +
                '}';
    }
}

在节点中，我们有实际值、对父节点的引用，以便更容易构建前缀、将值映射到相应节点的子节点，以及节点是否为结束节点的标志。

实际的 Trie 实现

public class Trie<T> {

    private final Node<T> root;

    public Trie() {
        this.root = new Node<>(null, null);
    }

    public void insert(T[] elements) {
        if (elements.length == 0) {
            //don't want to set root as end node
            return;
        }
        Node<T> currentNode = this.root;
        for (T element : elements) {
            currentNode = currentNode.addChild(element, currentNode);
        }
        currentNode.setEnd(true);
    }

    public Map<Collection<T>, Integer> countPrefixes(BiConsumer<Deque<T>, T> operation) {
        Map<Collection<T>, Integer> map = new LinkedHashMap<>();
        this.countPrefixes(this.root, map, operation);
        return map;
    }

    private void countPrefixes(Node<T> current, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (current != this.root) {
            //check java doc for AbstractSet hashCode and equals
            ArrayList<T> prefix = this.buildKey(current, operation);
            int childrenCount = current.children().size();
            if (childrenCount == 0 && current.isEnd()) {
                //this sets entire collection(entire sentence 'Mary had a little lamb named Willy' for example)
                //as prefix which is met once
                childrenCount = 1;
            }
            map.merge(prefix, childrenCount, Integer::sum);
            if (childrenCount > 1) {
                //each parent node is already marked as prefix met once,
                //but any node having more than one child means entire chain of nodes
                //is a prefix met the number of children,
                //so we go backwards to update parent counts with the difference
                this.updateParentPrefixes(current.getParent(), childrenCount - 1, map, operation);
            }
        }
        for (Node<T> child : current.children()) {
            //visit each child recursively to count them
            //depth first
            this.countPrefixes(child, map, operation);
        }
    }

    //operation is abstraction for the order
    //in which we want to add elements
    //when building key/prefix
    private ArrayList<T> buildKey(Node<T> node, BiConsumer<Deque<T>, T> operation) {
        Deque<T> prefix = new LinkedList<>();
        while (node.getValue() != null) {
            operation.accept(prefix, node.getValue());
            node = node.getParent();
        }
        return new ArrayList<>(prefix);
    }

    private void updateParentPrefixes(Node<T> parent, int updateCount, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (parent == this.root) {
            //we don't want to update root, ever!!!
            return;
        }
        ArrayList<T> prefix = this.buildKey(parent, operation);
        map.merge(prefix, updateCount, Integer::sum);
        //visit parents recursively until root
        this.updateParentPrefixes(parent.getParent(), updateCount, map, operation);
    }
}

从某种意义上说，它有所简化，仅实现了 insert ，但我们现在不需要更新和删除。需要注意的几点：

Trie 必须有一个根节点，该节点为空。我们用 null 值和 null 父级来表示。我们绝不能永远更新此节点。
参数BiConsumer, T> countPrefixes() 的操作。这是我们在创建前缀时添加元素的顺序的抽象。这是必需的，因为在反转集合/句子时，计数后缀可以表示并实现为计数前缀。
countPrefixes() 的返回类型 Map, Integer>。这种实现更加通用，因此每个前缀都表示为节点值的集合。
为什么前缀是ArrayList？首先我们需要保持插入顺序。其次，它的 hashCode() 和 equals() 实现使其成为映射键的良好候选者。引用 javadoc：

This确保 list1.equals(list2) 意味着任意两个列表 list1 和 list2 的 list1.hashCode()==list2.hashCode()，正如 Object.hashCode 的一般契约所要求的。 - for哈希码。

换句话说，如果两个列表包含相同顺序的相同元素，则它们被定义为相等。 - 对于 equals。

代码中的注释应该解释实现中的其余细节。

主要要测试。

public class TrieMain {

    public static void main(String[] args) {
        String spacePattern = "\\s+";
        String emptyStringPattern = "";

        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String pattern = string.startsWith("(") ? emptyStringPattern : spacePattern;
            String[] words = string.split(pattern);
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

这里需要注意的是：

在列表中添加了一些句子以进行额外的测试。
我将使用 ( 的输入拆分为字符而不是单词作为其余部分来实现输出，但示例有点令人困惑。对于句子前缀是单词的集合，而对于这两个单词，它们是字符。
需要第二个 trie 用于后缀 - 否则我们可能将前缀算作后缀，反之亦然。
Arrays.asList() 返回的列表由输入数组支持。反转输入数组本身编辑

：在使用字符进行测试时AbstractSet，或者任何精确的集合，对于前缀来说都是一个坏主意，因为它不允许重复的元素。更新了 trie 实现以使用 ArrayList。我也添加还有 2 个主要方法，用于演示 trie 处理字符串和字符的方法：

字符串 - 每个节点都是一个单词，通过按空格分割字符串获得

public class TrieWords {

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String[] words = string.split("\\s+");
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

字符 - 每个节点都是一个字符串中的字符

public class TrieChars {

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abcd)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<Character> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<Character> reversedTrie = new Trie<>();
        for (String string : strList) {
            int length = string.length();
            Character[] words = new Character[length];
            Character[] reversed = new Character[length];
            for (int i = 0; i < length; i++) {
                int reversedIndex = length - 1 - i;
                words[i] = string.charAt(i);
                reversed[reversedIndex] = string.charAt(i);
            }
            trie.insert(words);
            reversedTrie.insert(reversed);
        }
        Map<Collection<Character>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<Character>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<Character>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = e.getKey().stream().map(Object::toString).collect(Collectors.joining(""));
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

To implement a trie, first you need a node.

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

class Node<T> {

    private final T value;
    private final Node<T> parent;
    private final Map<T, Node<T>> children;
    private boolean isEnd;

    Node(T value, Node<T> parent) {
        this.value = value;
        this.parent = parent;
        this.children = new HashMap<>();
        this.isEnd = false;
    }

    Node<T> addChild(T childValue, Node<T> parent) {
        //return child node if existing, otherwise create and return
        return this.children.computeIfAbsent(childValue, value -> new Node<>(value, parent));
    }

    T getValue() {
        return this.value;
    }

    Node<T> getParent() {
        return this.parent;
    }

    boolean isEnd() {
        return this.isEnd;
    }

    void setEnd(boolean isEnd) {
        this.isEnd = isEnd;
    }

    Collection<Node<T>> children() {
        return this.children.values();
    }

    @Override
    public String toString() {
        //for easier debugging
        return "Node{" +
                "value=" + this.value +
                ", children=" + this.children.keySet() +
                ", isEnd=" + this.isEnd +
                '}';
    }
}

Each node holds its value, a reference to its parent (which makes it easier to build prefixes), a map from child values to the corresponding child nodes, and a flag marking whether the node ends an inserted sequence.

The actual Trie implementation:

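import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.function.BiConsumer;

public class Trie<T> {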

    private final Node<T> root;

    public Trie() {
        this.root = new Node<>(null, null);
    }

    public void insert(T[] elements) {
        if (elements.length == 0) {
            //don't want to set root as end node
            return;
        }
        Node<T> currentNode = this.root;
        for (T element : elements) {
            currentNode = currentNode.addChild(element, currentNode);
        }
        currentNode.setEnd(true);
    }

    public Map<Collection<T>, Integer> countPrefixes(BiConsumer<Deque<T>, T> operation) {
        Map<Collection<T>, Integer> map = new LinkedHashMap<>();
        this.countPrefixes(this.root, map, operation);
        return map;
    }

    private void countPrefixes(Node<T> current, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (current != this.root) {
            //the key is an ArrayList; see the Javadoc of List for hashCode and equals
            ArrayList<T> prefix = this.buildKey(current, operation);
            int childrenCount = current.children().size();
            if (childrenCount == 0 && current.isEnd()) {
                //this sets entire collection(entire sentence 'Mary had a little lamb named Willy' for example)
                //as prefix which is met once
                childrenCount = 1;
            }
            map.merge(prefix, childrenCount, Integer::sum);
            if (childrenCount > 1) {
                //each parent node is already marked as prefix met once,
                //but any node having more than one child means entire chain of nodes
                //is a prefix met the number of children,
                //so we go backwards to update parent counts with the difference
                this.updateParentPrefixes(current.getParent(), childrenCount - 1, map, operation);
            }
        }
        for (Node<T> child : current.children()) {
            //visit each child recursively to count them
            //depth first
            this.countPrefixes(child, map, operation);
        }
    }

    //operation is abstraction for the order
    //in which we want to add elements
    //when building key/prefix
    private ArrayList<T> buildKey(Node<T> node, BiConsumer<Deque<T>, T> operation) {
        Deque<T> prefix = new LinkedList<>();
        while (node.getValue() != null) {
            operation.accept(prefix, node.getValue());
            node = node.getParent();
        }
        return new ArrayList<>(prefix);
    }

    private void updateParentPrefixes(Node<T> parent, int updateCount, Map<Collection<T>, Integer> map, BiConsumer<Deque<T>, T> operation) {
        if (parent == this.root) {
            //we don't want to update root, ever!!!
            return;
        }
        ArrayList<T> prefix = this.buildKey(parent, operation);
        map.merge(prefix, updateCount, Integer::sum);
        //visit parents recursively until root
        this.updateParentPrefixes(parent.getParent(), updateCount, map, operation);
    }
}

It is somewhat simplified in that only insert is implemented, since update and delete are not needed here. A few things to note:

The trie must have an empty root node. It is represented here with a null value and a null parent. This node must never be updated.
The parameter BiConsumer<Deque<T>, T> operation of countPrefixes() abstracts the order in which elements are added when building a prefix. It is needed because counting suffixes can be expressed, and is implemented here, as counting prefixes of the reversed collection/sentence.
The return type of countPrefixes() is Map<Collection<T>, Integer>. The implementation is generic, so each prefix is represented as a collection of node values.
Why is the prefix an ArrayList? First, it keeps insertion order. Second, its implementations of hashCode() and equals() make it a good candidate for a map key. To quote the Javadoc:

This ensures that list1.equals(list2) implies that list1.hashCode()==list2.hashCode() for any two lists, list1 and list2, as required by the general contract of Object.hashCode. - for hashCode.

In other words, two lists are defined to be equal if they contain the same elements in the same order. - for equals.
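To make the point about list keys concrete, here is a small sketch. The class ListKeyDemo and the method countKeys are hypothetical names used only for this illustration. It shows two separately created lists with the same elements landing on the same map entry, and why a set would not work as a key when elements repeat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ListKeyDemo {
    // Counts how often each key occurs; lists that are equal share one entry.
    static Map<List<Character>, Integer> countKeys(List<List<Character>> keys) {
        Map<List<Character>, Integer> counts = new HashMap<>();
        for (List<Character> key : keys) {
            counts.merge(new ArrayList<>(key), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Character> first = Arrays.asList('W', 'i', 'l', 'l', 'y');
        List<Character> second = new ArrayList<>(first); // different object, same elements in the same order
        System.out.println(countKeys(Arrays.asList(first, second)).size()); // 1

        // A set keeps only one 'l', so "Willy" and "Wily" could no longer be told apart.
        Set<Character> asSet = new HashSet<>(first);
        System.out.println(asSet.size()); // 4
    }
}
```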

The comments in code should explain the rest of the specifics in implementation.
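The role of the operation parameter may be easier to see in isolation. The sketch below is an assumption-laden illustration, not part of the answer's code: KeyOrderDemo and the leafToRoot parameter are hypothetical, and the list simply stands in for the node values that buildKey() visits while walking from the current node up to the root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.function.BiConsumer;

class KeyOrderDemo {
    // leafToRoot lists values in visiting order: current node first, node nearest the root last.
    static <T> List<T> buildKey(List<T> leafToRoot, BiConsumer<Deque<T>, T> operation) {
        Deque<T> key = new LinkedList<>();
        for (T value : leafToRoot) {
            operation.accept(key, value);
        }
        return new ArrayList<>(key);
    }

    public static void main(String[] args) {
        // Forward trie: path root -> Mary -> had -> a, walked upward as a, had, Mary.
        List<String> prefixWalk = Arrays.asList("a", "had", "Mary");
        System.out.println(buildKey(prefixWalk, Deque::addFirst)); // [Mary, had, a]

        // Reversed trie: path root -> Willy -> named, walked upward as named, Willy.
        List<String> suffixWalk = Arrays.asList("named", "Willy");
        System.out.println(buildKey(suffixWalk, Deque::addLast)); // [named, Willy]
    }
}
```

With addFirst the upward walk is flipped back into reading order for prefixes; with addLast the upward walk through the reversed trie is already in reading order for suffixes.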

A main method to test it:

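import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;

public class TrieMain {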

    public static void main(String[] args) {
        String spacePattern = "\\s+";
        String emptyStringPattern = "";

        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String pattern = string.startsWith("(") ? emptyStringPattern : spacePattern;
            String[] words = string.split(pattern);
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

Things to note here:

A few more sentences were added to the list for extra testing.
The inputs starting with ( are split into characters rather than words, unlike the rest, in order to reproduce your expected output. The example is somewhat confusing in this respect: for the sentences the prefixes are collections of words, while for those two inputs they are collections of characters.
A second trie is needed for suffixes; otherwise a prefix might be counted as a suffix and vice versa.
The list returned by Arrays.asList() is backed by the input array. I take advantage of that to reverse the input array itself for the suffixes.
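The last note, about Arrays.asList() being a view over the array, can be checked with a short sketch. The class ReverseInPlaceDemo and the method reverseWords are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;

class ReverseInPlaceDemo {
    // Reverses the array itself by reversing the list view that Arrays.asList() lays over it.
    static String[] reverseWords(String sentence) {
        String[] words = sentence.split("\\s+");
        Collections.reverse(Arrays.asList(words)); // writes through to 'words'
        return words;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(reverseWords("Visit Target Store"))); // [Store, Target, Visit]
    }
}
```

This only works for arrays of objects. For a primitive array such as int[], Arrays.asList() produces a single-element list holding the array, so reversing that list leaves the array untouched.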

EDIT: While testing with characters I realized that AbstractSet, or any set to be precise, is a bad choice for the prefix because it does not allow duplicate elements. I updated the trie implementation to use ArrayList. I'm also adding two more main methods to demonstrate the trie working with strings and with characters:

Strings - each node is a word, acquired by splitting strings by space

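import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;

public class TrieWords {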

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abc)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<String> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<String> reversedTrie = new Trie<>();
        for (String string : strList) {
            String[] words = string.split("\\s+");
            trie.insert(words);

            //reverse collection to count suffixes
            Collections.reverse(Arrays.asList(words));
            reversedTrie.insert(words);
        }
        Map<Collection<String>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<String>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<String>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = String.join(" ", e.getKey());
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}

Characters - each node is a character from the string

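import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.Map;
import java.util.stream.Collectors;

public class TrieChars {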

    public static void main(String[] args) {
        ArrayList<String> strList = new ArrayList<>();
        strList.add("Mary had a little lamb named Willy");
        strList.add("Mary had a little ham");
        strList.add("Mary had a sandwich");
        strList.add("Old McDonald had a farm named Willy");
        strList.add("Willy had a little dog named ham");
        strList.add("Willy had a big dog named Willy");
        strList.add("Willy had a huge dog named Willy");
        strList.add("(abcd)");
        strList.add("(xyz)");
        strList.add("Visit Target Store");
        strList.add("Visit Walmart Store");

        Trie<Character> trie = new Trie<>();
        //using another trie for suffix to avoid counting suffix as prefix and vice versa
        Trie<Character> reversedTrie = new Trie<>();
        for (String string : strList) {
            int length = string.length();
            Character[] words = new Character[length];
            Character[] reversed = new Character[length];
            for (int i = 0; i < length; i++) {
                int reversedIndex = length - 1 - i;
                words[i] = string.charAt(i);
                reversed[reversedIndex] = string.charAt(i);
            }
            trie.insert(words);
            reversedTrie.insert(reversed);
        }
        Map<Collection<Character>, Integer> prefixCount = trie.countPrefixes(Deque::addFirst);
        System.out.println("Prefixes:");
        printMap(prefixCount);
        System.out.println();
        Map<Collection<Character>, Integer> suffixCount = reversedTrie.countPrefixes(Deque::addLast);
        System.out.println("Suffixes:");
        printMap(suffixCount);
    }

    private static void printMap(Map<Collection<Character>, Integer> map) {
        map.entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1)
                .forEach(e -> {
                    String key = e.getKey().stream().map(Object::toString).collect(Collectors.joining(""));
                    String line = String.format("%s -> %d", key, e.getValue());
                    System.out.println(line);
                });
    }
}


聽兲甴掵 2025-01-20 22:45:58

I think the solution provided by @Abhinav should work using a HashMap. Here I will post a solution using a simple Trie implementation in Java (with some customization, such as adding a freq field to the Trie node).

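    // Method body; the enclosing method returns Map<String, Integer>
    // (needs java.util.ArrayList, HashMap, LinkedList, Map and Queue).
    ArrayList<String> strList = new ArrayList<String>();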
    strList.add("Mary had a little lamb named Willy");
    strList.add("Mary had a little ham");
    strList.add("Old McDonald had a farm named Willy");
    strList.add("Willy had a little dog named ham");
    strList.add("( abc )");
    strList.add("( xyz )");
    strList.add("Visit Target Store");
    strList.add("Visit Walmart Store");

    TNode root = new TNode("");
    int maxFreq = 1;
    for(String sentence : strList) {
        TNode currentNode = root;
        String[] words = sentence.split(" "); // Assuming space character is the delimiter 
        for(String word: words) {
            if(currentNode.children.containsKey(word)) {
                currentNode.children.get(word).freq += 1;
                maxFreq = Math.max(maxFreq, currentNode.children.get(word).freq);
            } else {
                TNode c = new TNode(word);
                c.freq = 1;
                currentNode.children.put(word, c);
            }
            currentNode = currentNode.children.get(word);
        }
    }

    Map<String, Integer> result = new HashMap<String, Integer>();
    Queue<NodeWithPrefix> queue = new LinkedList<NodeWithPrefix>();
    for(TNode node : root.children.values()){
        NodeWithPrefix nwp = new NodeWithPrefix(node);
        nwp.prefix = "";
        queue.add(nwp);
    }
    while(!queue.isEmpty()) {
        NodeWithPrefix item = queue.poll();
        if(item.node.freq == maxFreq) {
            result.put(item.prefix + " " + item.node.value, item.node.freq);
        }
        for(TNode child : item.node.children.values()) {
            NodeWithPrefix nwp = new NodeWithPrefix(child);
            nwp.prefix = item.prefix + " " + item.node.value;
            queue.add(nwp);
        }
    }
    return result;

Here are the two other classes required for this algorithm:

class NodeWithPrefix {
    String prefix;
    TNode node;
    public NodeWithPrefix(TNode node){
        this.node = node;
    }
}

class TNode {
    String value;
    int freq = 0;
    Map<String, TNode> children;
    public TNode(String value){
        this.value = value;
        children = new HashMap<String, TNode>();
    }
}

The output for prefixes is shown below (suffixes should work similarly; the Trie just needs to be built backward):

{ Mary had=2,  Mary had a=2,  Visit=2,  (=2,  Mary had a little=2,  Mary=2}

Here I am using BFS to retrieve all substrings in the Trie whose frequency equals maxFreq. The filter condition can be adjusted as needed, and DFS would work here as well.
Another consideration is that the prefix could be stored in the TNode class itself; I prefer to keep it in a separate class.
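As a rough sketch of the two variations mentioned here (building the Trie backward for suffixes, and using DFS with a different filter), something like the following might work. It is an assumption-based illustration rather than the code above: SuffixFreqDemo, FreqNode, countSuffixes and the minFreq parameter are hypothetical names, and the filter is "frequency at least minFreq" instead of "frequency equal to maxFreq".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SuffixFreqDemo {
    static final class FreqNode {
        int freq;
        final Map<String, FreqNode> children = new HashMap<>();
    }

    // Builds the trie from the last word to the first and returns suffixes seen at least minFreq times.
    static Map<String, Integer> countSuffixes(List<String> sentences, int minFreq) {
        FreqNode root = new FreqNode();
        for (String sentence : sentences) {
            String[] words = sentence.split(" "); // assuming a single space is the delimiter
            FreqNode current = root;
            for (int i = words.length - 1; i >= 0; i--) {
                current = current.children.computeIfAbsent(words[i], w -> new FreqNode());
                current.freq++;
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        collect(root, "", minFreq, result);
        return result;
    }

    // Depth-first walk; each step prepends the word because the trie runs backward.
    private static void collect(FreqNode node, String suffix, int minFreq, Map<String, Integer> result) {
        for (Map.Entry<String, FreqNode> entry : node.children.entrySet()) {
            FreqNode child = entry.getValue();
            if (child.freq < minFreq) {
                continue; // deeper nodes cannot have a higher frequency
            }
            String extended = suffix.isEmpty() ? entry.getKey() : entry.getKey() + " " + suffix;
            result.put(extended, child.freq);
            collect(child, extended, minFreq, result);
        }
    }

    public static void main(String[] args) {
        List<String> strList = Arrays.asList(
                "Mary had a little lamb named Willy",
                "Mary had a little ham",
                "Old McDonald had a farm named Willy",
                "Willy had a little dog named ham",
                "( abc )",
                "( xyz )",
                "Visit Target Store",
                "Visit Walmart Store");
        System.out.println(countSuffixes(strList, 2));
    }
}
```

For the list in main, the entries found are Willy, named Willy, ham, ) and Store, each with a count of 2; the printing order depends on HashMap iteration and is not guaranteed.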
