How to do multiple sequence alignment of text strings (UTF-8) in R
Given five strings:
seq <- c("abcd", "bcde", "cdef", "af", "cdghi")
I would like to do multiple sequence alignment so that I get the following result:
abcd
 bcde
  cdef
a    f
  cd  ghi
Using the msa() function from the msa package I tried:
msa(seq, type = "protein", order = "input", method = "Muscle")
and got the following result:
aln names
[1] ABCD--- Seq1
[2] -BCDE-- Seq2
[3] --CD-EF Seq3
[4] -----AF Seq4
[5] --CDGHI Seq5
Con --CD-?? Consensus
I would like to use this function for sequences that can contain any Unicode characters, but even this example already triggers the warning "invalid letters found". Any ideas?
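msa() is built for biological sequences (its type argument takes "dna", "rna" or "protein"), so arbitrary text is outside its intended input. As a hedged alternative, a pairwise global alignment is short to write in base R if each string is split into a vector of characters, which lets any Unicode character act as a symbol. The sketch below is a plain Needleman-Wunsch alignment; the helper name nw_align and the scores (match +1, mismatch -1, gap -1) are illustrative assumptions, not anything taken from msa. It aligns only two strings; aligning all five would need a multiple-alignment strategy (for example progressive alignment) built on top of it.

```r
# Needleman-Wunsch global alignment of two strings. Each string is split into
# a vector of characters, so any Unicode character counts as one symbol.
# nw_align is a hypothetical helper; the default scores are assumptions.
nw_align <- function(a, b, match_score = 1, mismatch_score = -1,
                     gap_score = -1, gap_char = "-") {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  sub_score <- function(i, j) if (x[i] == y[j]) match_score else mismatch_score

  # s[i + 1, j + 1] holds the best score for aligning x[1..i] with y[1..j]
  s <- matrix(0, n + 1, m + 1)
  s[, 1] <- (0:n) * gap_score
  s[1, ] <- (0:m) * gap_score
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      s[i + 1, j + 1] <- max(s[i, j] + sub_score(i, j),   # match / mismatch
                             s[i, j + 1] + gap_score,     # gap in b
                             s[i + 1, j] + gap_score)     # gap in a
    }
  }

  # Trace back from the bottom-right cell to rebuild both aligned strings
  out_a <- character(0)
  out_b <- character(0)
  i <- n
  j <- m
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && s[i + 1, j + 1] == s[i, j] + sub_score(i, j)) {
      out_a <- c(x[i], out_a); out_b <- c(y[j], out_b)
      i <- i - 1; j <- j - 1
    } else if (i > 0 && s[i + 1, j + 1] == s[i, j + 1] + gap_score) {
      out_a <- c(x[i], out_a); out_b <- c(gap_char, out_b)
      i <- i - 1
    } else {
      out_a <- c(gap_char, out_a); out_b <- c(y[j], out_b)
      j <- j - 1
    }
  }
  c(paste(out_a, collapse = ""), paste(out_b, collapse = ""))
}

nw_align("abcd", "bcde")
#> [1] "abcd-" "-bcde"

# Non-ASCII input: returns "a\u00f1b" and "a-b" (the gap sits opposite the n-tilde)
nw_align("a\u00f1b", "ab")
```

Ties in the traceback are broken in favour of a diagonal step, then a gap in the second string, so other alignments with the same score are possible.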
Here's a solution in base R that outputs a table:
We can print the output without quotes to see it more clearly:
Created on 2022-05-25 by the reprex package (v2.0.1)
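A minimal base-R sketch of one way to build such a table (not necessarily the answerer's code): every distinct character gets its own column, in order of first appearance, and each string fills only the columns of the characters it contains. This assumes that all strings list their shared characters in the same relative order and that no string repeats a character; under those assumptions the columns line up like an alignment, otherwise they do not.

```r
seq <- c("abcd", "bcde", "cdef", "af", "cdghi")

# Split every string into single characters (works for UTF-8 text too)
chars <- strsplit(seq, "")

# Columns: distinct characters in order of first appearance.
# ASSUMPTION: shared characters appear in the same relative order in every
# string and no string repeats a character; otherwise this is not an alignment.
cols <- unique(unlist(chars))

# One row per string: the character if the string contains it, else ""
tab <- t(vapply(chars,
                function(x) ifelse(cols %in% x, cols, ""),
                character(length(cols))))
colnames(tab) <- cols
noquote(tab)

# Collapse each row into one line, using spaces for the empty cells
aligned <- trimws(apply(ifelse(tab == "", " ", tab), 1, paste, collapse = ""),
                  which = "right")
writeLines(aligned)
```

Here writeLines(aligned) prints "af" as "a    f" and "cdghi" as "  cd  ghi", with every character in its own column.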
A solution is to use LingPy, a Python library for historical linguistics. First install it following the instructions at http://lingpy.org/tutorial/installation.html, then run:
Output: