C++ - Output all possible DNA kmers of a given length
A "kmer" is a DNA sequence of length K. A valid DNA sequence (for my purposes) can only contain the following 4 bases: A, C, T, G. I am looking for a C++ algorithm that simply outputs all possible combinations of these bases, in alphabetical order, into a string array. For example, if K = 2, the program should generate the following array:
kmers[0] = AA
kmers[1] = AC
kmers[2] = AG
kmers[3] = AT
kmers[4] = CA
kmers[5] = CC
kmers[6] = CG
kmers[7] = CT
kmers[8] = GA
kmers[9] = GC
kmers[10] = GG
kmers[11] = GT
kmers[12] = TA
kmers[13] = TC
kmers[14] = TG
kmers[15] = TT
If I'm thinking about this correctly, the problem really breaks down to converting a decimal integer to base 4 and then substituting the appropriate bases. I thought I could use itoa for this, but itoa is not part of the C standard, and my compiler does not support it. I welcome any clever ideas. Here is my sample code:
#include <iostream>
#include <string>
#define K 3
using namespace std;
int main() {
    /* 4^K, computed exactly in integer arithmetic (pow returns a double) */
    const int num_kmers = 1 << (2 * K);
    /* Allocate memory for kmers array */
    string* kmers = new string[num_kmers];
    /* Populate kmers array */
    for (int i = 0; i < num_kmers; i++) {
        // POPULATE THE kmers ARRAY HERE
    }
    /* Display all possible kmers */
    for (int i = 0; i < num_kmers; i++)
        cout << kmers[i] << "\n";
    delete [] kmers;
}
Comments (5)
You would need to use recursion to be flexible (i.e. so that you could change K easily).
And then call it like this:
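As a hedged illustration of the recursive approach (not necessarily the answerer's own code), the sketch below builds each kmer one base at a time. The names `extend_kmers` and `all_kmers` are made up for this example, and it assumes the alphabet is exactly A, C, G, T.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends to `out` every kmer that starts with
// `prefix` and needs `remaining` more bases, in alphabetical order.
void extend_kmers(const std::string& prefix, int remaining,
                  std::vector<std::string>& out) {
    static const char bases[] = {'A', 'C', 'G', 'T'};  // already sorted
    if (remaining == 0) {  // prefix is now a complete kmer
        out.push_back(prefix);
        return;
    }
    for (char base : bases) {
        extend_kmers(prefix + base, remaining - 1, out);
    }
}

// Hypothetical entry point: returns all 4^k kmers of length k.
std::vector<std::string> all_kmers(int k) {
    assert(k >= 0);
    std::vector<std::string> out;
    extend_kmers("", k, out);
    return out;
}
```

Calling `all_kmers(2)` returns the 16 strings listed in the question, from AA to TT. Because K is an ordinary argument here rather than a macro, it can be changed at run time.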
Cheers.
You don't need to convert decimal to anything once the user's input text has been accepted.
It's also probably a mistake to create an array of strings, which grows exponentially with K. Just print the output.
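One way to print the kmers without storing them, sketched here as an assumption about what this advice could look like in practice, is to keep a single string and advance it like an odometer. The names `next_kmer` and `print_kmers` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper: advances `kmer` in place to the next kmer in
// alphabetical order. Returns false once it wraps around past TT...T
// (at which point `kmer` is back to AA...A).
// Assumes `kmer` contains only the characters A, C, G and T.
bool next_kmer(std::string& kmer) {
    const std::string bases = "ACGT";
    for (std::size_t pos = kmer.size(); pos-- > 0;) {
        if (kmer[pos] != 'T') {
            // Bump the rightmost base that is not 'T' and stop.
            kmer[pos] = bases[bases.find(kmer[pos]) + 1];
            return true;
        }
        kmer[pos] = 'A';  // 'T' rolls over to 'A'; carry to the left
    }
    return false;
}

// Hypothetical helper: prints every kmer of length k, one per line,
// holding only the current kmer in memory.
void print_kmers(int k, std::ostream& out = std::cout) {
    assert(k >= 1);
    std::string kmer(static_cast<std::size_t>(k), 'A');
    do {
        out << kmer << '\n';
    } while (next_kmer(kmer));
}
```

With this sketch, `print_kmers(3)` writes all 64 kmers to standard output while memory use stays proportional to K rather than to 4^K.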
This seems to me to be a perfect fit for a custom iterator. That way your main program can be simple:
But since we have implemented the kmer concept as an iterator, we get to use all of the other generic algorithms, too. And since we implemented the kmer iterator as a Random Access Iterator, finding the i-th kmer is trivial:
Here is my complete program:
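The following is only a rough sketch of what such an iterator could look like, not the answerer's program. The class name `KmerIterator` and its `begin`/`end` helpers are hypothetical. It stores just an index and builds the string on dereference by writing that index in base 4, so it returns strings by value and therefore only approximates the standard's random access iterator requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical iterator over all kmers of length k, in alphabetical order.
class KmerIterator {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::string*;
    using reference = std::string;  // built on demand, returned by value

    KmerIterator(int k, difference_type index) : k_(k), index_(index) {
        assert(k >= 0 && k <= 15);  // keeps 4^k within a 32-bit ptrdiff_t
    }

    // begin is index 0 (AA...A); end is index 4^k (one past TT...T).
    static KmerIterator begin(int k) { return KmerIterator(k, 0); }
    static KmerIterator end(int k) {
        return KmerIterator(k, difference_type(1) << (2 * k));
    }

    // Writes the index in base 4, mapping digits 0..3 to A, C, G, T.
    // The least significant digit becomes the rightmost base.
    std::string operator*() const {
        assert(index_ >= 0);
        std::string kmer(static_cast<std::size_t>(k_), 'A');
        difference_type rest = index_;
        for (int pos = k_ - 1; pos >= 0; --pos) {
            kmer[pos] = "ACGT"[rest % 4];
            rest /= 4;
        }
        return kmer;
    }
    std::string operator[](difference_type n) const {
        return *KmerIterator(k_, index_ + n);
    }

    KmerIterator& operator++() { ++index_; return *this; }
    KmerIterator operator++(int) { KmerIterator old = *this; ++index_; return old; }
    KmerIterator& operator--() { --index_; return *this; }
    KmerIterator operator--(int) { KmerIterator old = *this; --index_; return old; }
    KmerIterator& operator+=(difference_type n) { index_ += n; return *this; }
    KmerIterator& operator-=(difference_type n) { index_ -= n; return *this; }

    friend KmerIterator operator+(KmerIterator it, difference_type n) { return it += n; }
    friend KmerIterator operator+(difference_type n, KmerIterator it) { return it += n; }
    friend KmerIterator operator-(KmerIterator it, difference_type n) { return it -= n; }
    friend difference_type operator-(const KmerIterator& a, const KmerIterator& b) {
        return a.index_ - b.index_;
    }

    // Comparisons look only at the index; both sides should share the same k.
    friend bool operator==(const KmerIterator& a, const KmerIterator& b) { return a.index_ == b.index_; }
    friend bool operator!=(const KmerIterator& a, const KmerIterator& b) { return a.index_ != b.index_; }
    friend bool operator<(const KmerIterator& a, const KmerIterator& b) { return a.index_ < b.index_; }
    friend bool operator>(const KmerIterator& a, const KmerIterator& b) { return a.index_ > b.index_; }
    friend bool operator<=(const KmerIterator& a, const KmerIterator& b) { return a.index_ <= b.index_; }
    friend bool operator>=(const KmerIterator& a, const KmerIterator& b) { return a.index_ >= b.index_; }

private:
    int k_;                  // kmer length
    difference_type index_;  // position of the current kmer, 0 .. 4^k
};
```

Under these assumptions, a main program could print every kmer with `std::copy(KmerIterator::begin(3), KmerIterator::end(3), std::ostream_iterator<std::string>(std::cout, "\n"))` (which additionally needs `<algorithm>` and `<iostream>`), and the i-th kmer is `KmerIterator::begin(3)[i]`. This is the same base-4 idea the question describes, without needing itoa.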
Whoa, this shouldn't be so hard.