Validating DNA in C/C++

Posted 2024-10-29 05:33:40

I am iterating over DNA sequences, pulling out chunks of 5-15 bases at a time into C++ std::string objects. Occasionally, my string will contain a non-ATCG base, and I want to take an action when this happens. For example, I might see:

CTACGGTACGRCTA

Because there is an 'R', I want to recognize this case. I am familiar with regex, but people seem to recommend several different libraries. I've seen Boost, TR1, and others. Can someone please suggest either a different way to catch my cases or tell me which library I should use and why?

Thanks

荒芜了季节 2024-11-05 05:33:40

Regular expressions are overkill for this. You can use std::string::find_first_not_of().
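A minimal sketch of that suggestion, assuming only uppercase A, T, C, and G count as valid. The helper names `first_invalid_base` and `is_plain_dna` are hypothetical, chosen here for illustration; only `find_first_not_of` and `std::string::npos` come from the standard library.

```cpp
#include <string>

// Hypothetical helper (name is an assumption): returns the index of the first
// character that is not A, T, C, or G, or std::string::npos if there is none.
std::string::size_type first_invalid_base(const std::string& chunk) {
    return chunk.find_first_not_of("ATCG");
}

// Hypothetical convenience wrapper: true when the chunk holds only A, T, C, G.
bool is_plain_dna(const std::string& chunk) {
    return first_invalid_base(chunk) == std::string::npos;
}
```

Because the call returns a position rather than a yes/no answer, it can also tell you where the unexpected base sits, which may help if the action you take depends on the offending character.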

简单爱 2024-11-05 05:33:40

Using C strspn() comes to mind.

if (strspn(dnasequence, "ATCG") < strlen(dnasequence)) {
    /* bad character found */
}
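Since the question stores chunks in std::string objects, the same check could be applied through `c_str()`. This is a sketch under that assumption; the function name `has_bad_base` is hypothetical.

```cpp
#include <cstring>
#include <string>

// Hypothetical wrapper (name is an assumption): applies the strspn() check
// to a std::string chunk instead of a raw C string.
bool has_bad_base(const std::string& chunk) {
    // strspn() returns the length of the leading run consisting only of
    // characters from "ATCG". If that run is shorter than the whole string,
    // some other character was found.
    return std::strspn(chunk.c_str(), "ATCG") < chunk.size();
}
```

Comparing against `chunk.size()` avoids a separate `strlen()` pass over the string.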

轻许诺言 2024-11-05 05:33:40

You can of course use regular expressions. But why not keep it simple?

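#include <cctype>
#include <string>

// True if base is A, C, G, or T (case-insensitive).
bool is_valid_base(char base) {
    // Cast to unsigned char: std::toupper is undefined for negative char values.
    switch (std::toupper(static_cast<unsigned char>(base))) {
        case 'A': case 'C': case 'G': case 'T': return true;
        default: return false;
    }
}

// True if every character in sequence is a valid base.
bool is_valid_dna(const std::string& sequence) {
    for (std::string::const_iterator i = sequence.begin(), end = sequence.end();
            i != end; ++i)
        if (not is_valid_base(*i))
            return false;
    return true;
}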
