比较多个子字符串
我正在尝试编写一个基本的 DNA 测序仪。其中,给定两个相同长度的序列,它将输出相同的字符串,最小长度为3。 所以输入
abcdef dfeabc
将返回
1 abc
我不知道如何去解决这个问题。 我可以比较两个字符串,看看它们是否完全相等。从那里,我可以比较 length-1 字符串大小,即如果 dfeabc
包含 abcde
。但是,我怎样才能让程序处理所有可能的字符串,最小大小为 3 个字符? 一个问题是对于上面的 length-1 示例,我还必须执行字符串 bcdef 并进行比较。
I'm attempting to write a basic dna sequencer. In that, given two sequences of the same length, it will output the strings which are the same, with a minimal length of 3.
So input of
abcdef dfeabc
will return
1 abc
I am not sure how to go about solving the problem.
I can compare the two strings, and see if they are completely equal. From there, i can compare length-1 string size, i.e. if dfeabc
contains abcde
. However, how can i get the program to do all possible strings, down to a minimal size of 3 characters?
One issue is for the above example of length-1, I'd also have to do the string bcdef and compare that.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
简单的方法是获取字符串 A 的每个子字符串,然后查看它是否在字符串 B 中。
以下是简单的方法:
稍微更优化的方法是首先查看较长的子字符串,以便匹配的第一个子字符串必然是最长的。
The naive way would be to get every single substring of string A and see if it's in string B.
Here's the naive way of doing that:
The slightly more optimal way would be to look at longer substrings first, so that the first substring that matches is necessarily the longest.
如果您想以简单的方式解决这个问题,而不使用任何 Java API 进行搜索,您可以这样想:对于第一个字符串中的每对可能的起始索引 i 和第二个字符串中的 j,在两个字符串中都向前推进而第一个和第二个字符串中的对应字符相等,并且如果您执行了至少 3 个步骤,则返回结果。
If you wanted to solve this in a simple way, without using any Java API for searching, you could think of it like this: for every pair of possible starting indices i in the first string and j in the second string, step forward in both while the corresponding characters in the first and the second string are equal, and return a result if you did at least 3 steps.
您需要使用最长公共子串算法,这是一个动态规划问题。查看 Wikipedia 的条目以获取算法说明和此处获取示例实现。
You need to use the Longest Common Substring algorithm, a dynamic programming problem. Check Wikipedia's entry for an explanation of the algorithm and here for a sample implementation.