Calculating the Levenshtein distance between DNA sequences

Posted on 2024-08-10 22:00:09

Could you tell me how to calculate the Levenshtein distance between DNA sequences in Java?

Comments (6)

请止步禁区 2024-08-17 22:00:09

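Since you did not tag this as homework, I see no need to write it yourself. Apache Commons Lang's StringUtils already has it: StringUtils.getLevenshteinDistance (deprecated since Commons Lang 3.6 in favor of LevenshteinDistance in Apache Commons Text).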

猫性小仙女 2024-08-17 22:00:09

Here is the algorithm from the Wikipedia page on Levenshtein distance:

 int LevenshteinDistance(char s[1..m], char t[1..n])
 {
   // d is a table with m+1 rows and n+1 columns
   declare int d[0..m, 0..n]

   for i from 0 to m
     d[i, 0] := i // deletion
   for j from 0 to n
     d[0, j] := j // insertion

   for j from 1 to n
   {
     for i from 1 to m
     {
       if s[i] = t[j] then 
         d[i, j] := d[i-1, j-1]
       else
         d[i, j] := minimum
                    (
                      d[i-1, j] + 1,  // deletion
                      d[i, j-1] + 1,  // insertion
                      d[i-1, j-1] + 1 // substitution
                    )
     }
   }

   return d[m, n]
 }

(I'm sure you can turn that into Java with a little work.)

Pass in your two DNA sequences as s and t, and it will return the distance as an int.
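
One possible Java translation of the pseudocode, offered as a sketch rather than a tested library routine. DNA sequences can be long, and because each row of the table d depends only on the row before it, this version keeps just two rows instead of the full (m+1) × (n+1) table. The class name LevenshteinTwoRows and the method name distance are illustrative. The inputs are swapped so the rows have the length of the shorter sequence, which is safe because Levenshtein distance is symmetric.

```java
public class LevenshteinTwoRows {

    /** Levenshtein distance that stores only two rows of the DP table. */
    public static int distance(String s, String t) {
        // Make t the shorter string so each row is as small as possible.
        if (t.length() > s.length()) {
            String tmp = s;
            s = t;
            t = tmp;
        }
        int n = t.length();
        int[] prev = new int[n + 1]; // row i-1 of the table
        int[] curr = new int[n + 1]; // row i of the table

        for (int j = 0; j <= n; j++) {
            prev[j] = j; // turning "" into the first j chars of t takes j insertions
        }

        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning the first i chars of s into "" takes i deletions
            char si = s.charAt(i - 1);
            for (int j = 1; j <= n; j++) {
                int cost = (si == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1,       // deletion
                                            curr[j - 1] + 1),  // insertion
                                   prev[j - 1] + cost);        // match or substitution
            }
            // The current row becomes the previous row for the next iteration.
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[n]; // after the final swap, prev holds the last row
    }

    public static void main(String[] args) {
        System.out.println(distance("ACGT", "AGT"));
    }
}
```

For example, distance("ACGT", "AGT") is 1 (delete the C). The running time is still proportional to m × n; only the memory use shrinks.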

只有影子陪我不离不弃 2024-08-17 22:00:09

I believe this is what you're after. You can remove the System.out.println statements if you like. Note that if you leave them in, the first row and first column of the table are not printed.

Verified against the results on the Wikipedia page.

public static int getLevenshteinDistance(String a, String b)
{
    char[] s = a.toCharArray();
    char[] t = b.toCharArray();
    System.out.println(a + " - " + b); // debug output, safe to remove
    int m = s.length;
    int n = t.length;

    // d is a table with m+1 rows and n+1 columns: d[i][j] is the distance
    // between the first i characters of a and the first j characters of b
    int[][] d = new int[m + 1][n + 1];

    for (int i = 0; i <= m; i++)
    {
        d[i][0] = i; // deletion
    }

    for (int j = 0; j <= n; j++)
    {
        d[0][j] = j; // insertion
    }

    for (int j = 1; j <= n; j++)
    {
        for (int i = 1; i <= m; i++)
        {
            if (s[i - 1] == t[j - 1])
            {
                d[i][j] = d[i - 1][j - 1]; // characters match: no cost
            }
            else
            {
                d[i][j] = Math.min(d[i - 1][j] + 1,         // deletion
                          Math.min(d[i][j - 1] + 1,         // insertion
                                   d[i - 1][j - 1] + 1));   // substitution
            }
            System.out.print(" [" + d[i][j] + "]"); // debug output
        }
        System.out.println(); // debug output: one line per column j of d
    }

    return d[m][n];
}

To test:

    String a = "Saturday";
    String b = "Sunday";
    int d = getLevenshteinDistance(a, b);
    System.out.println(d); // 3
    a = "kitten";
    b = "sitting";
    d = getLevenshteinDistance(a, b);
    System.out.println(d); // 3

坠似风落 2024-08-17 22:00:09

The Wikipedia article on Levenshtein distance contains the algorithm and an explanation of the resulting matrix. Simply implement the algorithm as a method and return the bottom-right element of the matrix, d[m][n].
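
To see the matrix that the Wikipedia article explains, it can help to print all of it. The following is a minimal sketch; the class LevenshteinMatrix and its methods matrix and print are hypothetical names. It builds the full table with the standard recurrence and prints it including row 0 and column 0, with the first string down the side and the second across the top.

```java
public class LevenshteinMatrix {

    /** Full (m+1) x (n+1) table: d[i][j] is the distance between
     *  the first i characters of a and the first j characters of b. */
    public static int[][] matrix(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i; // delete all i characters
        for (int j = 0; j <= n; j++) d[0][j] = j; // insert all j characters
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                            d[i][j - 1] + 1),     // insertion
                                   d[i - 1][j - 1] + cost);       // match or substitution
            }
        }
        return d;
    }

    /** Prints every cell, with a's characters as row labels and b's as column labels. */
    public static void print(String a, String b, int[][] d) {
        StringBuilder sb = new StringBuilder("    "); // row-label slot plus the column-0 field
        for (char c : b.toCharArray()) sb.append(String.format("%3c", c));
        sb.append('\n');
        for (int i = 0; i < d.length; i++) {
            sb.append(i == 0 ? ' ' : a.charAt(i - 1)); // row label: i-th character of a
            for (int v : d[i]) sb.append(String.format("%3d", v));
            sb.append('\n');
        }
        System.out.print(sb);
    }

    public static void main(String[] args) {
        String a = "ACGT";
        String b = "AGT";
        int[][] d = matrix(a, b);
        print(a, b, d);
        System.out.println("distance = " + d[a.length()][b.length()]);
    }
}
```

For ACGT vs AGT, the bottom-right entry d[4][3], and therefore the distance, is 1.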

花开半夏魅人心 2024-08-17 22:00:09

Copy and paste the getLevenshteinDistance function from the Levenshtein Distance Algorithm and use it like this:

 String a = "AAAAAAAAAAAAAAAAAA";
 String b = "AAAAAAAAACTAAAAAAA";

 int d = getLevenshteinDistance(a,b);
 System.out.println(d);

御守 2024-08-17 22:00:09

If you are just interested in calculating the variation between two DNA sequences, you should use the Damerau–Levenshtein distance, not the regular Levenshtein distance.

The Wikipedia entry contains some sample code that you should be able to translate into Java.
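
As a sketch of what such a translation might look like, here is the simpler variant described there, the optimal string alignment (OSA) distance, sometimes called the restricted Damerau–Levenshtein distance. It adds transposition of two adjacent characters to the Levenshtein recurrence but never edits the same substring more than once, so it is not the unrestricted Damerau–Levenshtein distance (for CA vs ABC it gives 3, while the unrestricted distance is 2). The class name OsaDistance is illustrative.

```java
public class OsaDistance {

    /** Optimal string alignment distance: insertion, deletion, substitution,
     *  and transposition of two adjacent characters (restricted Damerau-Levenshtein). */
    public static int distance(String a, String b) {
        int m = a.length();
        int n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i; // deletions
        for (int j = 0; j <= n; j++) d[0][j] = j; // insertions

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // deletion
                                            d[i][j - 1] + 1),     // insertion
                                   d[i - 1][j - 1] + cost);       // match or substitution
                // Transposition of two adjacent characters, e.g. "AC" -> "CA".
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[m][n];
    }

    public static void main(String[] args) {
        System.out.println(distance("ACGT", "CAGT"));
    }
}
```

For ACGT vs CAGT this returns 1 (one transposition), whereas the plain Levenshtein distance is 2 (two substitutions).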
