Is there a library like PyCogent, but for Java (or Scala)?
I'm writing a biological evolution simulator. Currently, all of my code is written in Python. For the most part, this is great and everything works sufficiently well. However, there are two steps in the process which take a long time and which I'd like to rewrite in Scala.
The first problem area is sequence evolution. Imagine you're given a phylogenetic tree which relates a large set of proteins. The length of each branch represents the evolutionary distance between the parent and child. The root of the tree is seeded with a single sequence, and then an evolutionary model (e.g. http://en.wikipedia.org/wiki/Models_of_DNA_evolution) is used to evolve the sequence along the tree structure, taking the branch lengths into account. PyCogent takes a long time to perform this step, and I believe that a reasonable Java/Scala implementation would be significantly faster. Do you know of any libraries that implement this type of functionality? I want to write the application in Scala, so, thanks to interoperability, any Java library will suffice.
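To make the step above concrete, here is a minimal Java sketch of evolving a sequence down a tree. It is not PyCogent's method or any library's API. It assumes DNA sequences and the Jukes-Cantor (JC69) model from the linked Wikipedia page, with branch lengths measured in expected substitutions per site; a protein simulation would need an amino-acid substitution matrix instead. The names `Jc69Simulator`, `Node`, `changeProbability`, `evolve`, and `simulate` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class Jc69Simulator {
    static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Hypothetical minimal tree node: a name, the length of the branch leading to it, and its children. */
    static final class Node {
        final String name;
        final double branchLength;
        final List<Node> children = new ArrayList<>();

        Node(String name, double branchLength) {
            this.name = name;
            this.branchLength = branchLength;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** JC69: probability that a site differs from its ancestor after t expected substitutions per site. */
    static double changeProbability(double t) {
        return 0.75 * (1.0 - Math.exp(-4.0 * t / 3.0));
    }

    /** Evolves one sequence along one branch; assumes every character is A, C, G or T. */
    static String evolve(String parent, double t, Random rng) {
        double p = changeProbability(t);
        StringBuilder child = new StringBuilder(parent.length());
        for (int i = 0; i < parent.length(); i++) {
            char base = parent.charAt(i);
            if (rng.nextDouble() < p) {
                // Under JC69 the three other bases are equally likely replacements.
                char replacement;
                do {
                    replacement = BASES[rng.nextInt(BASES.length)];
                } while (replacement == base);
                base = replacement;
            }
            child.append(base);
        }
        return child.toString();
    }

    /** Recursively evolves the sequence down the tree and records the sequences at the leaves. */
    static void simulate(Node node, String sequence, Random rng, Map<String, String> leaves) {
        if (node.children.isEmpty()) {
            leaves.put(node.name, sequence);
            return;
        }
        for (Node child : node.children) {
            simulate(child, evolve(sequence, child.branchLength, rng), rng, leaves);
        }
    }

    public static void main(String[] args) {
        // Toy tree; the root's own branch length is never used.
        Node root = new Node("root", 0.0)
            .add(new Node("inner", 0.1)
                .add(new Node("speciesA", 0.05))
                .add(new Node("speciesB", 0.2)))
            .add(new Node("speciesC", 0.3));
        Map<String, String> leaves = new LinkedHashMap<>();
        simulate(root, "ACGTACGTACGTACGTACGT", new Random(42), leaves);
        leaves.forEach((name, seq) -> System.out.println(name + "\t" + seq));
    }
}
```

Each branch costs one pass over the sequence, so the whole simulation is linear in (number of branches × sequence length). The same structure carries over to Scala almost line for line.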
The second problem area is the comparison of the generated sequences. The problem is, given a set of sequences for the proteins in a number of different extant species, to use those sequences to reconstruct the phylogenetic tree which relates the species. This problem is inherently computationally demanding, because one must basically do a pairwise comparison between all sequences in the extant species. Here again, however, I feel that a Java/Scala implementation would perform significantly faster than a Python one, if for no other reason than the slow speed of looping in Python. This part I could write from scratch more easily than the sequence evolution part, but I'd be willing to use a library for it as well if a good one exists.
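For illustration, the all-against-all comparison could look roughly like the Java sketch below. It assumes the sequences are already aligned to equal length and, as a simplification, uses the Jukes-Cantor correction for DNA rather than a protein distance; the names `PairwiseDistances`, `pDistance`, `jc69Distance`, and `distanceMatrix` are hypothetical. The resulting matrix is the kind of input a distance-based tree method would take.

```java
import java.util.List;

final class PairwiseDistances {

    /** Fraction of aligned sites at which the two sequences differ (the p-distance). */
    static double pDistance(String a, String b) {
        if (a.length() != b.length() || a.isEmpty()) {
            throw new IllegalArgumentException("sequences must be non-empty and aligned to equal length");
        }
        int differences = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                differences++;
            }
        }
        return (double) differences / a.length();
    }

    /** Jukes-Cantor correction: estimated substitutions per site for an observed p-distance. */
    static double jc69Distance(double p) {
        if (p >= 0.75) {
            return Double.POSITIVE_INFINITY; // saturated: the correction is undefined
        }
        return -0.75 * Math.log(1.0 - 4.0 * p / 3.0);
    }

    /** Symmetric matrix of corrected distances between every pair of sequences. */
    static double[][] distanceMatrix(List<String> sequences) {
        int n = sequences.size();
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dist = jc69Distance(pDistance(sequences.get(i), sequences.get(j)));
                d[i][j] = dist;
                d[j][i] = dist;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] d = distanceMatrix(List.of("ACGTACGT", "ACGTACGA", "ACCTACGA"));
        for (double[] row : d) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

The work is quadratic in the number of sequences and linear in their length, which is why tight loops like these tend to benefit from the JVM.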
Thanks,
Rob
Comments (1)
For the second problem, why not use an existing program for comparing sequences and inferring phylogenetic trees, like RAxML or MrBayes, and call that? Maximum likelihood and Bayesian inference are very sophisticated methods for these problems, and using them seems a far better idea than implementing one yourself. Something like a maximum parsimony or neighbour-joining tree, which could probably be written from scratch for such a project, is not sufficient for evolutionary analysis, unless you just want a very quick and dirty topology (and trees inferred via MP or NJ are often quite wrong), in which case you can probably use something like this