Java字符串模式识别

发布于 2024-09-18 08:22:42 字数 357 浏览 7 评论 0原文

我有一个大约一千个字符长的字符串,由 L、T 和 A 组成。我很确定其中有一个简单的模式,我想知道是否有任何快速简便的方法可以找到它。该字符串会发生变化,因此这不仅仅是一次性的。

我正在寻找的模式是,例如,如果字符串是

LLLLLLLLAATAALLLLALLLLLLAATAALLLATLLLLLAATAALLAALLLLLAATAALL

子字符串LLLLLAATAALL在此字符串中重复4次。我想搜索这样的子字符串,但我不知道它们在哪里开始、结束、有多少个以及它们在主字符串中的长度。

如果 Java 中有任何工具可以查找此类内容,我们将不胜感激。

恩特

I have a string that is about a thousand characters long composed of L's, T's, and A's. I'm pretty sure there is a simple pattern in it and I'm wondering if there is any quick and easy way of going about finding it. This string changes so that this is not just a one off.

The pattern I'm looking for is for example if the string was

LLLLLLLLAATAALLLLALLLLLLAATAALLLATLLLLLAATAALLAALLLLLAATAALL

The substring LLLLLAATAALL repeats 4 times in this string. I want to search for substrings like this but I don't know where they start, end, how many there are, and how long they are in the main string.

If there are any tools in Java for looking for this kind of thing any advice would be much appreciated.

nt

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

淡水深流 2024-09-25 08:22:42

好的,所以我从 此处 获取代码并对其进行调整以跟踪并打印最长重复子串。只需使用 JUnit 运行它即可。

/* Copyright (c) 2010 the authors listed at the following URL, and/or
 the authors of referenced articles or incorporated external code:
 http://en.literateprograms.org/Suffix_tree_(Java)?action=history&offset=20100123220137

 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
 "Software"), to deal in the Software without restriction, including
 without limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of the Software, and to
 permit persons to whom the Software is furnished to do so, subject to
 the following conditions:

 The above copyright notice and this permission notice shall be
 included in all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

 Retrieved from: http://en.literateprograms.org/Suffix_tree_(Java)?oldid=16641
 */

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.junit.Test;

public class SuffixTree {
    @Test
    public void sampleUsage() {

        AbstractSuffixTree tree = new SimpleSuffixTree(
                "LLLLLLLLAATAALLLLALLLLLLAATAALLLATLLLLLAATAALLAALLLLLAATAALL");
        System.out.println("Longest repeating substring "
                + tree.best.printResult() + " repetitions=" + tree.best.visits
                + " length=" + tree.best.stringDepth);
    }
}

abstract class AbstractSuffixTree {

    SuffixTreeNode best;
    String text = null;
    SuffixTreeNode root = null;
    int inputAlphabetSize = -1;

    AbstractSuffixTree(String text) {
        if (text.length() > 0 && text.charAt(text.length() - 1) == '

输出:

最长重复子串
LLLLLLAATAALLL 重复次数=2 长度=14

编辑:回复您的评论。

目前,我只是跟踪最长的重复 SuffixTreeNode(它是 AbstractSuffixTree 中的一个字段)。您可以修改它,以便它跟踪节点的 SortedQueue,按其 stringDepth 排序。

) { this.text = text; } else { this.text = text + "$"; } } } class SimpleSuffixTree extends AbstractSuffixTree { public SimpleSuffixTree(String text) { super(text); constructTree(); } private void constructTree() { super.root = new SuffixTreeNode(this); best = root; char[] s = super.text.toCharArray(); for (int i = 0; i < s.length; i++) { List<String> suffixList = new ArrayList<String>(); for (int k = i; k < s.length; k++) { suffixList.add(s[k] + ""); } super.root.addSuffix(suffixList, i + 1); } } } class CompactSuffixTree extends AbstractSuffixTree { public CompactSuffixTree(SimpleSuffixTree simpleSuffixTree) { super(simpleSuffixTree.text); super.root = compactNodes(simpleSuffixTree.root, 0); super.best = simpleSuffixTree.best; } private SuffixTreeNode compactNodes(SuffixTreeNode node, int nodeDepth) { node.nodeDepth = nodeDepth; for (SuffixTreeNode child : node.children) { while (child.children.size() == 1) { SuffixTreeNode grandchild = child.children.iterator().next(); child.incomingEdge.label += ", " + grandchild.incomingEdge.label; child.stringDepth += grandchild.incomingEdge.label.length(); child.children = grandchild.children; // for (SuffixTreeNode grandchild : child.children) grandchild.parent = node; } child = compactNodes(child, nodeDepth + 1); } return node; } } class SuffixTreeNode { AbstractSuffixTree tree; SuffixTreeEdge incomingEdge = null; int nodeDepth = -1; int label = -1; Collection<SuffixTreeNode> children = null; SuffixTreeNode parent = null; int stringDepth; int id = 0; public static int c; public int visits = 1; public SuffixTreeNode(AbstractSuffixTree tree, SuffixTreeNode parent, String incomingLabel, int depth, int label, int id) { children = new ArrayList<SuffixTreeNode>(); incomingEdge = new SuffixTreeEdge(incomingLabel, label); nodeDepth = depth; this.label = label; this.parent = parent; stringDepth = parent.stringDepth + incomingLabel.length(); this.id = id; this.tree = tree; } public SuffixTreeNode(AbstractSuffixTree tree) { children = new ArrayList<SuffixTreeNode>(); nodeDepth = 0; label = 0; this.tree = tree; } public void addSuffix(List<String> suffix, int pathIndex) { SuffixTreeNode insertAt = this; insertAt = search(this, suffix); insert(insertAt, suffix, pathIndex); } private SuffixTreeNode search(SuffixTreeNode startNode, List<String> suffix) { if (suffix.isEmpty()) { throw new IllegalArgumentException( "Empty suffix. Probably no valid simple suffix tree exists for the input."); } Collection<SuffixTreeNode> children = startNode.children; for (SuffixTreeNode child : children) { if (child.incomingEdge.label.equals(suffix.get(0))) { suffix.remove(0); child.visits++; if (child.visits > 1 && child.stringDepth > tree.best.stringDepth) { tree.best = child; } if (suffix.isEmpty()) { return child; } return search(child, suffix); } } return startNode; } private void insert(SuffixTreeNode insertAt, List<String> suffix, int pathIndex) { for (String x : suffix) { SuffixTreeNode child = new SuffixTreeNode(tree, insertAt, x, insertAt.nodeDepth + 1, pathIndex, id); insertAt.children.add(child); insertAt = child; } } public String toString() { StringBuilder result = new StringBuilder(); String incomingLabel = this.isRoot() ? "" : this.incomingEdge.label; for (int i = 1; i <= this.nodeDepth; i++) result.append("\t"); if (this.isRoot()) { c = 1; this.id = 1; } else { this.id = c; result.append(this.parent.id + " -> "); result.append(this.id + "[label=\"" + incomingLabel + "\"]" + "(" + visits + "," + (stringDepth) + ")" + ";\n"); } for (SuffixTreeNode child : children) { c++; child.id = c; result.append(child.toString()); } return result.toString(); } public String printResult() { if (parent == null) { return ""; } else { return this.parent.printResult() + this.incomingEdge.label; } } public boolean isRoot() { return this.parent == null; } public boolean isLeaf() { return children.size() == 0; } } class SuffixTreeEdge { String label = null; @SuppressWarnings("unused") private int branchIndex = -1; public SuffixTreeEdge(String label, int branchIndex) { this.label = label; this.branchIndex = branchIndex; } }

输出:

最长重复子串
LLLLLLAATAALLL 重复次数=2 长度=14

编辑:回复您的评论。

目前,我只是跟踪最长的重复 SuffixTreeNode(它是 AbstractSuffixTree 中的一个字段)。您可以修改它,以便它跟踪节点的 SortedQueue,按其 stringDepth 排序。

Ok, so I took the code from here and adapted it to keep track of and print the longest repeated substring. Just run it using JUnit.

/* Copyright (c) 2010 the authors listed at the following URL, and/or
 the authors of referenced articles or incorporated external code:
 http://en.literateprograms.org/Suffix_tree_(Java)?action=history&offset=20100123220137

 Permission is hereby granted, free of charge, to any person obtaining
 a copy of this software and associated documentation files (the
 "Software"), to deal in the Software without restriction, including
 without limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of the Software, and to
 permit persons to whom the Software is furnished to do so, subject to
 the following conditions:

 The above copyright notice and this permission notice shall be
 included in all copies or substantial portions of the Software.

 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
 IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
 CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
 TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

 Retrieved from: http://en.literateprograms.org/Suffix_tree_(Java)?oldid=16641
 */

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.junit.Test;

public class SuffixTree {
    @Test
    public void sampleUsage() {

        AbstractSuffixTree tree = new SimpleSuffixTree(
                "LLLLLLLLAATAALLLLALLLLLLAATAALLLATLLLLLAATAALLAALLLLLAATAALL");
        System.out.println("Longest repeating substring "
                + tree.best.printResult() + " repetitions=" + tree.best.visits
                + " length=" + tree.best.stringDepth);
    }
}

abstract class AbstractSuffixTree {

    SuffixTreeNode best;
    String text = null;
    SuffixTreeNode root = null;
    int inputAlphabetSize = -1;

    AbstractSuffixTree(String text) {
        if (text.length() > 0 && text.charAt(text.length() - 1) == '

Output:

Longest repeating substring
LLLLLLAATAALLL repetitions=2 length=14

EDIT: response to your comments.

Currently I simply keep track of the longest repeating SuffixTreeNode (it's a field in AbstractSuffixTree). You could modify this so it keeps track of a SortedQueue of nodes, ordered by their stringDepth.

) { this.text = text; } else { this.text = text + "$"; } } } class SimpleSuffixTree extends AbstractSuffixTree { public SimpleSuffixTree(String text) { super(text); constructTree(); } private void constructTree() { super.root = new SuffixTreeNode(this); best = root; char[] s = super.text.toCharArray(); for (int i = 0; i < s.length; i++) { List<String> suffixList = new ArrayList<String>(); for (int k = i; k < s.length; k++) { suffixList.add(s[k] + ""); } super.root.addSuffix(suffixList, i + 1); } } } class CompactSuffixTree extends AbstractSuffixTree { public CompactSuffixTree(SimpleSuffixTree simpleSuffixTree) { super(simpleSuffixTree.text); super.root = compactNodes(simpleSuffixTree.root, 0); super.best = simpleSuffixTree.best; } private SuffixTreeNode compactNodes(SuffixTreeNode node, int nodeDepth) { node.nodeDepth = nodeDepth; for (SuffixTreeNode child : node.children) { while (child.children.size() == 1) { SuffixTreeNode grandchild = child.children.iterator().next(); child.incomingEdge.label += ", " + grandchild.incomingEdge.label; child.stringDepth += grandchild.incomingEdge.label.length(); child.children = grandchild.children; // for (SuffixTreeNode grandchild : child.children) grandchild.parent = node; } child = compactNodes(child, nodeDepth + 1); } return node; } } class SuffixTreeNode { AbstractSuffixTree tree; SuffixTreeEdge incomingEdge = null; int nodeDepth = -1; int label = -1; Collection<SuffixTreeNode> children = null; SuffixTreeNode parent = null; int stringDepth; int id = 0; public static int c; public int visits = 1; public SuffixTreeNode(AbstractSuffixTree tree, SuffixTreeNode parent, String incomingLabel, int depth, int label, int id) { children = new ArrayList<SuffixTreeNode>(); incomingEdge = new SuffixTreeEdge(incomingLabel, label); nodeDepth = depth; this.label = label; this.parent = parent; stringDepth = parent.stringDepth + incomingLabel.length(); this.id = id; this.tree = tree; } public SuffixTreeNode(AbstractSuffixTree tree) { children = new ArrayList<SuffixTreeNode>(); nodeDepth = 0; label = 0; this.tree = tree; } public void addSuffix(List<String> suffix, int pathIndex) { SuffixTreeNode insertAt = this; insertAt = search(this, suffix); insert(insertAt, suffix, pathIndex); } private SuffixTreeNode search(SuffixTreeNode startNode, List<String> suffix) { if (suffix.isEmpty()) { throw new IllegalArgumentException( "Empty suffix. Probably no valid simple suffix tree exists for the input."); } Collection<SuffixTreeNode> children = startNode.children; for (SuffixTreeNode child : children) { if (child.incomingEdge.label.equals(suffix.get(0))) { suffix.remove(0); child.visits++; if (child.visits > 1 && child.stringDepth > tree.best.stringDepth) { tree.best = child; } if (suffix.isEmpty()) { return child; } return search(child, suffix); } } return startNode; } private void insert(SuffixTreeNode insertAt, List<String> suffix, int pathIndex) { for (String x : suffix) { SuffixTreeNode child = new SuffixTreeNode(tree, insertAt, x, insertAt.nodeDepth + 1, pathIndex, id); insertAt.children.add(child); insertAt = child; } } public String toString() { StringBuilder result = new StringBuilder(); String incomingLabel = this.isRoot() ? "" : this.incomingEdge.label; for (int i = 1; i <= this.nodeDepth; i++) result.append("\t"); if (this.isRoot()) { c = 1; this.id = 1; } else { this.id = c; result.append(this.parent.id + " -> "); result.append(this.id + "[label=\"" + incomingLabel + "\"]" + "(" + visits + "," + (stringDepth) + ")" + ";\n"); } for (SuffixTreeNode child : children) { c++; child.id = c; result.append(child.toString()); } return result.toString(); } public String printResult() { if (parent == null) { return ""; } else { return this.parent.printResult() + this.incomingEdge.label; } } public boolean isRoot() { return this.parent == null; } public boolean isLeaf() { return children.size() == 0; } } class SuffixTreeEdge { String label = null; @SuppressWarnings("unused") private int branchIndex = -1; public SuffixTreeEdge(String label, int branchIndex) { this.label = label; this.branchIndex = branchIndex; } }

Output:

Longest repeating substring
LLLLLLAATAALLL repetitions=2 length=14

EDIT: response to your comments.

Currently I simply keep track of the longest repeating SuffixTreeNode (it's a field in AbstractSuffixTree). You could modify this so it keeps track of a SortedQueue of nodes, ordered by their stringDepth.

伪心 2024-09-25 08:22:42

您将从以下代码中了解如何对字符串“hi”进行计数;

公共 int countHi(String str) {

int count = 0,i = str.length() - 1;
字符串目标=“嗨”;

for(int j = 0;j < i;j++)
{
if(str.substring(j,j+2).equals(目标))
计数++;}
返回计数;
}

You will get idea about how to do it from the following code to count the String "hi";

public int countHi(String str) {

int count = 0,i = str.length() - 1;
String goal = "hi";

for(int j = 0;j < i;j++)
{
if(str.substring(j,j+2).equals(goal))
count++;}
return count;
}

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文