Are there any existing solutions for creating a generic DNA sequence database with a web front end?

Posted on 2024-08-13 22:51:14

I'd like to create an rRNA sequence database with a web front end for the lab I work in. It seems common in biology to want to search a large number of sequences using alignment algorithms such as BLAST and HMMER, so I wondered whether there are any existing PHP/Python/Rails projects that allow easy creation of a generic sequence database with a website search form.

UPDATE: GMOD is the type of server I was looking for. It was also suggested that I look at BioMart, which appears to offer similar functionality.

Comments (10)

傲性难收 2024-08-20 22:51:14

Something a little less barebones is http://gmod.org/ - the simplest installation should give you a BLAST form and a "sequence browser" interface.
I don't know if there's a HMMER form yet...

(It scales pretty well, from a simple SQLite file to a real database.)

Alternatively, you may want to look into the Galaxy server: http://main.g2.bx.psu.edu/
Its primary aim is to make complex genomic queries easy for non-computational people, but I don't know if it has BLAST out of the box.

cheers,
yannick


UPDATE - Inspired in part by this post, we are developing a simple local BLAST server as an easy-to-deploy alternative to wwwblast. Work in progress at http://www.sequenceserver.com. A demo server lets you BLAST ant genomes.

恬淡成诗 2024-08-20 22:51:14

This will probably be overkill, but NCBI has a lot of software available. Link.

In particular, this.

挥剑断情 2024-08-20 22:51:14

There's a simple CGI front-end distributed with the NCBI BLAST package as well. You can download it from their FTP site, which is here:

ftp://ftp.ncbi.nih.gov/

好菇凉咱不稀罕他 2024-08-20 22:51:14

It's not in any of the languages you mentioned, but there is BioPerl, a collection of Perl modules made specifically for working with DNA, RNA, and protein sequences.

Look for it on CPAN.org.

掀纱窥君容 2024-08-20 22:51:14

I'd strongly suggest contacting the bioinformatics community. The most important thing is to design the database and decide its purpose. You mention DNA in the title but rRNA in the text - these are completely different things. If it's only a typo, fine - but if you don't understand the difference then talk with people in the community.

Since I'm involved in the community you might like to contact the MyExperiment community (http://en.wikipedia.org/wiki/MyExperiment) and mention my name if you need to. You'll find lots of friendly people and help.

UPDATE: I've just noticed you are from Manchester, which is the hub of MyExperiment, so it really is the obvious place to start!

德意的啸 2024-08-20 22:51:14

Concerning GMOD: I am fairly sure that GMOD is complete overkill for your application. GMOD is not a server; it's a collection of tools, the database schema (Chado) being one of them, and Chado is not really meant for someone who will mostly have sequences and IDs. BioMart is not a server either; it's a tool that de-normalizes model databases so that whole-genome queries run fast enough. One of the BioMart clients (MartView) comes as a web interface. You definitely don't want to use BioMart at the moment, but I can explain that in detail by email.
My impression is that what you need first is a web-based BLAST client to get started.

ㄖ落Θ余辉 2024-08-20 22:51:14

Galaxy: Galaxy is not a database; it's a website with tools for working with (mostly DNA) sequences from various genomes. Galaxy is tightly linked to the UCSC genome browser's sequences, tools and file formats. So if you want to create a database of entirely new sequences, Galaxy is not for you. It doesn't include any BLAST servers either. If you want to create a database of sequences, Chado (part of GMOD) comes close, but I'd rather use a text file to get started; see my other post.

甲如呢乙后呢 2024-08-20 22:51:14

Maybe you can look at Plone4Bio.

Plone is an extensible content management system written in Python, with a lot of features and easy-to-use applications, so you can build your website from a collection of modules such as forums, news products, etc. (I know you know this already, but it is just to give a bit of background.)

Plone4Bio aims to provide some Plone applications for bioinformatics. I don't know how advanced the project is and I haven't used it yet, but it seems that you at least get a sequence object and some apps for visualizing it, and probably some applications for searching sequences. (P.S. They use it at UniProt: look at the 'Third party data' section for any membrane protein.)

I don't know of any other CMS apps aimed at bioinformatics, but maybe you could also implement something with Django without too much effort.

清眉祭 2024-08-20 22:51:14

Having no idea about what format the information will be stored in, or how DNA sequences are displayed (is it just a long string?), you may be able to get away with simply inserting each DNA sequence into a MySQL database and then executing a simple query like:

SELECT * FROM `dna_table` WHERE `sequence` = $sequence;

Make sure you use an escape string or a parameterized query (to prevent SQL injection), but other than that, this sounds like a REALLY simple DB program that shouldn't be more than about 100 lines of code.
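
To illustrate the parameterized-query advice above, here is a minimal sketch in Python. It uses the standard-library sqlite3 module instead of MySQL purely so that it runs without a server; the `dna_table` name comes from the query above, while the `id` column, the sample rows and the `find_exact` helper are made up for illustration.

```python
import sqlite3


def find_exact(conn, sequence):
    """Return (id, sequence) rows whose sequence matches exactly."""
    # The "?" placeholder lets the driver bind the value safely,
    # so user input is never spliced into the SQL text.
    cur = conn.execute(
        "SELECT id, sequence FROM dna_table WHERE sequence = ?",
        (sequence,),
    )
    return cur.fetchall()


# Hypothetical in-memory database with two made-up records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dna_table (id TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO dna_table VALUES (?, ?)",
    [("seq1", "ACGTACGT"), ("seq2", "GGGTTTAA")],
)

hits = find_exact(conn, "ACGTACGT")
print(hits)  # [('seq1', 'ACGTACGT')]
```

Note that a query like this only finds exact matches; similarity searches would still need an alignment tool such as BLAST.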

甩你一脸翔 2024-08-20 22:51:14

I agree: You should post your question to [email protected] or the bioperl mailing list.

The question "easy creation of a generic sequence database with a website search form" seems too general. A sequence database is a list of (id, sequence) and by itself doesn't need any tool support. At least I don't see any reason why you would need a tool for that.
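
As a rough illustration of the "(id, sequence)" view, a FASTA file can be read into exactly such a list with a few lines of Python. This is only a sketch: `parse_fasta` is a hypothetical helper, and it assumes the ID is the first whitespace-delimited token of each header line.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (id, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            chunks = []
        elif seq_id is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


example = ">seq1 made-up record\nACGT\nTTAA\n\n>seq2\nGGCC\n"
print(parse_fasta(example))
```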

I think your question is: is there a BLAST client with a web form that can be installed locally? There are some: PLAN might be worth a try, though I never got it running. BioPerl has objects for standalone BLAST execution (http://doc.bioperl.org/releases/bioperl-1.0/Bio/Tools/Run/StandAloneBlast.html) and can display the results graphically. Debian/Ubuntu Med have ncbi-tools-bin and ncbi-rrna-data, which install the necessary tools and databases in a couple of seconds.

Instead of pondering tool support, I would rather hack together a 10-line CGI script that runs BLAST with an input sequence against the FASTA files you already have, and then see whether the users aren't already happy with that.
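
For what it's worth, the core of such a script might look roughly like the Python sketch below. It is not a complete CGI program; it only shows the part that validates the pasted sequence and calls BLAST. It assumes the BLAST+ `blastn` program is installed and that a nucleotide database was built beforehand with `makeblastdb`; the `BLAST_DB` path and all function names are hypothetical.

```python
import re
import subprocess
import tempfile

# Hypothetical path to a database built beforehand with makeblastdb.
BLAST_DB = "/data/rrna/rrna_db"


def clean_sequence(raw):
    """Drop FASTA header lines and whitespace; reject non-nucleotide input."""
    lines = [ln.strip() for ln in raw.splitlines()]
    seq = "".join(ln for ln in lines if not ln.startswith(">"))
    seq = re.sub(r"\s+", "", seq).upper()
    if not re.fullmatch(r"[ACGTUN]+", seq):
        raise ValueError("query must contain only A, C, G, T, U or N")
    return seq


def build_command(query_path, db=BLAST_DB):
    """Build the blastn argument list (no shell, so no shell injection)."""
    # -outfmt 6 asks for tab-separated output, which is easy to render.
    return ["blastn", "-query", query_path, "-db", db, "-outfmt", "6"]


def run_blast(raw_sequence, db=BLAST_DB):
    """Write the query to a temporary FASTA file and return blastn's output."""
    seq = clean_sequence(raw_sequence)
    with tempfile.NamedTemporaryFile("w", suffix=".fa") as handle:
        handle.write(f">query\n{seq}\n")
        handle.flush()
        result = subprocess.run(
            build_command(handle.name, db),
            capture_output=True, text=True, check=True,
        )
    return result.stdout
```

A web form handler would then only need to pass the submitted text to `run_blast` and print the result.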

Concerning the programming language: if you like, you can do this with a shell script (*). That might even take you less time than posting on Stack Overflow... ;-)

(*) Note to paranoid computer science colleagues: it's going to be an internal application for biologists who don't know the difference between an operating system and operator overloading, so SQL injections are very, very unlikely...

I think this is a case where premature optimization really is evil, in the sense that you can lose tons of time designing a system that is too complex for a simple task. In the spirit of agile programming, if you like software engineering buzzwords, you might simply hack something together and try it on your users before thinking about the architecture.
