Comparing multiple substrings

Posted on 2024-09-25 14:48:06

I'm attempting to write a basic DNA sequencer. Given two sequences of the same length, it should output the substrings they have in common, with a minimum length of 3.
So an input of

abcdef dfeabc

will return

1 abc

I am not sure how to go about solving the problem.
I can compare the two strings and see if they are completely equal. From there, I can compare substrings one character shorter, i.e. check whether dfeabc contains abcde. However, how can I get the program to try all possible substrings, down to a minimum size of 3 characters?
One issue is that for the length-1 case above, I'd also have to take the substring bcdef and compare that.


Comments (3)

芸娘子的小脾气 2024-10-02 14:48:07

The naive way would be to get every single substring of string A and see if it's in string B.

Here's the naive way of doing that:

for (int i = 0; i < a.length(); i++) {
    // start at i + 3 so every candidate is at least 3 characters long
    for (int j = i + 3; j <= a.length(); j++) {
        if (b.contains(a.substring(i, j))) {
            // if longer than the last match found, record this match
        }
    }
}

A slightly better way is to look at longer substrings first, so that the first substring that matches is necessarily the longest.

for (int length = a.length(); length >= 3; length--) {
    // i + length <= a.length() so the substring ending at the last character is included
    for (int i = 0; i + length <= a.length(); i++) {
        if (b.contains(a.substring(i, i + length))) {
            // this is the longest match; stop searching here
        }
    }
}
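As a rough illustration, the longest-first loop can be wrapped in a complete method. This is only a sketch: the class name `LongestFirstSearch`, the method `longestCommon`, and the `minLength` parameter are made up for the example, and it returns only the first longest match it finds rather than every common substring.

```java
import java.util.Optional;

class LongestFirstSearch {
    /**
     * Returns the longest substring of a (at least minLength characters)
     * that also occurs somewhere in b, or empty if there is none.
     * Hypothetical helper, not part of any library.
     */
    static Optional<String> longestCommon(String a, String b, int minLength) {
        // Try the longest candidates first, so the first hit is a longest match.
        for (int length = Math.min(a.length(), b.length()); length >= minLength; length--) {
            for (int i = 0; i + length <= a.length(); i++) {
                String candidate = a.substring(i, i + length);
                if (b.contains(candidate)) {
                    return Optional.of(candidate);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Example input from the question
        System.out.println(longestCommon("abcdef", "dfeabc", 3).orElse("(none)")); // prints "abc"
    }
}
```

Returning as soon as a match is found is what makes the longest-first order pay off; the nested loops plus `contains` still cost roughly cubic time in the worst case, which is fine for short inputs.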

童话 2024-10-02 14:48:07

If you want to solve this in a simple way, without using any Java API for searching, think of it like this: for every pair of starting indices, i in the first string and j in the second, step forward in both strings while the corresponding characters are equal, and report a match if you took at least 3 steps.
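One possible reading of this idea in Java is sketched below. The names `PairwiseMatcher`, `commonRuns`, and `minLength` are invented for the example. The check that skips a pair when the preceding characters also match is an addition of this sketch, not part of the answer; it keeps only runs that cannot be extended to the left, so that "abc" is reported once rather than together with its tails.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class PairwiseMatcher {
    /** Collects every maximal common run of at least minLength characters (hypothetical helper). */
    static Set<String> commonRuns(String a, String b, int minLength) {
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < a.length(); i++) {
            for (int j = 0; j < b.length(); j++) {
                // Assumption of this sketch: skip pairs that only continue a run
                // which already started one position earlier in both strings.
                if (i > 0 && j > 0 && a.charAt(i - 1) == b.charAt(j - 1)) {
                    continue;
                }
                // Step forward in both strings while the characters agree.
                int steps = 0;
                while (i + steps < a.length() && j + steps < b.length()
                        && a.charAt(i + steps) == b.charAt(j + steps)) {
                    steps++;
                }
                if (steps >= minLength) {
                    found.add(a.substring(i, i + steps));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> runs = commonRuns("abcdef", "dfeabc", 3);
        // Count followed by the matches, in the format shown in the question
        System.out.println(runs.size() + " " + String.join(" ", runs)); // prints "1 abc"
    }
}
```

Here `substring` is used only to copy out a run that has already been located by the character comparisons, so no searching API is involved.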

子栖 2024-10-02 14:48:07

You need to use the Longest Common Substring algorithm, a dynamic programming problem. Check Wikipedia's entry for an explanation of the algorithm and here for a sample implementation.
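For reference, a minimal sketch of the usual dynamic-programming formulation follows. It is not the sample implementation the answer links to, and the names `LcsDp` and `longestCommonSubstring` are made up here. Entry `curr[j]` holds the length of the longest common run ending at `a[i-1]` and `b[j-1]`; only two rows of the table are kept.

```java
class LcsDp {
    /** Returns one longest common substring of a and b ("" if they share no character). */
    static String longestCommonSubstring(String a, String b) {
        int[] prev = new int[b.length() + 1]; // table row for i - 1
        int best = 0;    // length of the longest run seen so far
        int endInA = 0;  // exclusive end index of that run in a
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the run that ended at the previous character pair.
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > best) {
                        best = curr[j];
                        endInA = i;
                    }
                }
            }
            prev = curr;
        }
        return a.substring(endInA - best, endInA);
    }

    public static void main(String[] args) {
        String result = longestCommonSubstring("abcdef", "dfeabc");
        // The minimum length of 3 from the question is applied by the caller.
        System.out.println(result.length() >= 3 ? result : "(none)"); // prints "abc"
    }
}
```

This runs in time proportional to the product of the two lengths, compared with the roughly cubic cost of checking every substring with `contains`.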
