What is the best way to organize a bioinformatics project?

Posted 2024-09-24 00:20:12 · 1431 characters · 7 views · 0 comments

Comments (3)

刘备忘录 2024-10-01 00:20:12

I'm a software specialist embedded in a team of research scientists, though in the earth sciences, not the life sciences. A lot of what you write is familiar to me.

One thing to bear in mind is that much of what you have learned in your studies is about engineering software for continued use. As you have observed, a lot of what research scientists do is for one-off use, and the engineered approach is not suitable for it. If you want to implement some aspects of good software engineering, you are going to have to pick your battles carefully.

Before you start fighting any battles, you are going to have to critically examine your own ideas to ensure that what you learned in school about general-purpose software engineering is valid for your current situation. Don't assume that it is.

In my case the first battle I picked was the implementation of source code control. It wasn't hard to find examples of all the things that go wrong when you don't have version control in place:

  • some users had dozens of directories, each with a different version of the 'same' code, and only the haziest idea of what was unique about most of them or why they were there;
  • some users had lost useful modifications by overwriting them and could not remember what they had done;
  • it was easy to find situations where people were working on what should have been the same program but were in fact developing it in different, incompatible directions;
  • and so on.

Once I had gathered the information -- and make sure you keep good notes about who said what and what it cost them -- it became relatively easy to paint a picture of a better world with source code control.

Next, well, next you have to choose your own next battle. But one of the seeds of doubt you have to sow in your scientist colleagues' minds is 'reproducibility'. Scientific experiments are not valid if they are not reproducible; if their experiments involve software (and they always do), then careful software engineering is essential for reproducibility. A lot of this is about data provenance, but that's a topic for another day.

无畏 2024-10-01 00:20:12

Part of the issue here is the distinction between documentation for software vs documentation for publication.

For software development (and research plan) design, the important documentation is structural and intentional. Thus, modeling the data, reasons why you are doing something, etc. I strongly recommend using the skills you've learned in CS for documenting your research plan. Having a plan for what you want to do gives you a lot of freedom to multi-task while long analyses are running.

On the other hand, a lot of bioinformatics work is analysis. Here, you need to treat documentation like a lab notebook, and not necessarily a project plan. You want to document what you did, maybe with a brief comment on why (e.g. when you are troubleshooting data), and what the outputs and results are.
What I do is fairly simple.
First, I start in a directory and create a git repo. Then, whenever I change some file, I commit it to the repo. As much as possible, I try to name data outputs in a way that lets me drop them into my git ignore files.
Then, as much as possible, I work in a single terminal session for one project at a time, and when I hit a pause point (such as when I've sent a set of jobs up to the grid), I run 'history | cut -c 8-' and paste the output into a lab notes file. I then edit the file to add comments on what I did and, importantly, change the git add/commit lines to git checkout (I have a script that does this based on the commit messages). As long as I start it in the right directory, and my external data doesn't go away, this means that I can recreate the entire process later.
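The history-to-notebook step above could be sketched roughly as follows in Python. This is not the script I mentioned, only an illustration under assumptions: the function names are made up, bash's default `history` layout (a 7-character number prefix) is assumed, `git add` lines are simply dropped, and a commit is matched to its hash by a single-line `-m` message.

```python
import shlex
import subprocess


def strip_history_prefix(lines):
    """Mimic `cut -c 8-`: drop the first 7 characters of each history line."""
    return [line.rstrip("\n")[7:] for line in lines]


def hashes_by_message(repo="."):
    """Map commit subject -> hash using `git log` (assumes subjects are unique;
    if they are not, the oldest commit with that subject wins)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H%x09%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {subject: sha
            for sha, subject in (line.split("\t", 1) for line in out.splitlines())}


def commit_message(command):
    """Return the -m message of a `git commit` command, or None."""
    try:
        words = shlex.split(command)
    except ValueError:  # unbalanced quotes: leave the line alone
        return None
    if words[:2] != ["git", "commit"]:
        return None
    for i, word in enumerate(words[:-1]):
        if word in ("-m", "-am"):
            return words[i + 1]
    return None


def to_replayable(commands, hash_by_message):
    """Rewrite a command list so it can be replayed later.

    Assumption: `git add` lines are dropped and each `git commit -m MSG`
    becomes `git checkout HASH`; commits with unknown messages are kept as-is.
    """
    replay = []
    for command in commands:
        if command.split()[:2] == ["git", "add"]:
            continue
        message = commit_message(command)
        if message is not None and message in hash_by_message:
            replay.append(f"git checkout {hash_by_message[message]}")
        else:
            replay.append(command)
    return replay
```

For example, feeding it the three history lines `git add run.sh`, `git commit -m "add run script"` and `./run.sh sample1`, together with a mapping from "add run script" to a hash, leaves a `git checkout` of that hash followed by `./run.sh sample1`. In real use the mapping would come from `hashes_by_message()` run inside the project repo.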

For any even slightly complex processing task, I write a script to do it, so that my notebook looks as clean as possible. To a first approximation, a helper script can be viewed as a subroutine in a larger project, and should be documented internally to at least that level.
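As an illustration of "documented like a subroutine", here is a minimal sketch of such a helper. The script name `count_fasta.py` and the task (counting FASTA records) are hypothetical, chosen only to show the level of internal documentation meant: a header saying what goes in and what comes out, and a docstring per function.

```python
"""count_fasta.py (hypothetical helper): count the records in FASTA files.

Input:  one or more FASTA file paths.
Output: one "<path><TAB><count>" line per file on stdout.
"""
import argparse


def count_records(lines):
    """Return the number of FASTA records, i.e. lines starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def main(argv):
    """Parse the file paths in argv and print one count line per file."""
    parser = argparse.ArgumentParser(description="Count FASTA records per file.")
    parser.add_argument("paths", nargs="+", help="FASTA files to count")
    args = parser.parse_args(argv)
    for path in args.paths:
        with open(path) as handle:
            print(f"{path}\t{count_records(handle)}")
    return 0
```

Saved as a script, it would end by calling `main(sys.argv[1:])`, and the lab notes then only need the one line that invokes it.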

影子是时光的心 2024-10-01 00:20:12

Your question is about project management. Bad project management is not unique to bioinformatics. I find it hard to believe that the entire bioinformatics industry is committed to bad software design.

About the pressure... Again, there are others in this world who have very challenging deadlines, and they still use good software design.

In many cases, following good software design does not hold a project back and may even speed up its design and maintenance (at least in the long run).

Now to your real question... You can offer your manager to redesign small parts of the code that have no influence on the rest of the code as a proof of concept (POC), but it's really hard to stop a moving truck, so don't get upset if he feels "we worked this way for years - we know what we are doing, and we don't need a child to teach us how to do our work". Learn to work like the rest, and once you have gained their trust, you can do your thing once in a while (I hope you will have the time and the devotion to do the right thing).

Good luck.
