I frequently write throwaway code (in a research environment), for example to explore an algorithm or a model for a scientific property or process. Many of these "experiments" are one-off, but sometimes I find that I need to use a few later. For example, I have just unearthed code for string matching that I wrote 7 years ago (stopped because of other priorities) but which is now valuable for a coworker's project. Having looked at it (did I really write such impenetrable code?), I realise there are some things I could have done then to help me when I restarted the "project" ("experiment" is still a better word). The earlier experiment "worked", but I know that at the time I would not have had time to refactor, as my priorities lay elsewhere.

What approaches are cost-effective in enabling such work to be dug up and re-used?

EDIT: I have answered my own question (below) because there are issues beyond the actual source itself.

I disagree with all of the answers saying "write comments". That's being offered as a catch-all for the code itself not being understandable.

Get yourself a copy of Code Complete (Steve McConnell, 2nd edition). If you learn the techniques of writing maintainable code in the first place, it won't take you more time, and you will be able to return to your work later with less trouble.

Which would you prefer:

  • Cryptic code with comments?
  • Mostly OK code without?

I strongly prefer the latter, as the OK code is easier to understand in the situations where the cryptic code was uncommented, and comments are another place that the original developer can make mistakes. The code may be buggy, but it's never wrong.

Once you're comfortable with Code Complete, I'd recommend The Pragmatic Programmer, as it gives slightly higher-level software-development advice.

[Answering own question]
There are several other aspects to the problem which haven't been raised and which I would have found useful when revisiting it. Some of these may be "self-evident" but remember this code was pre-SVN and IDEs.

  • Discoverability. It has actually been difficult to find the code. I believe it's in my SourceForge project, but there are so many versions and branches over 7 years that I can't find it. So I would need a system that searched code, and until IDEs appeared I don't think there was any.
  • What does it do? The current checkout contains about 13 classes (all in one package, as it wasn't easy to refactor at the time). Some are clear (DynamicAligner) but others are opaque (MainBox, so named because it extended a Swing Box). There are four main() programs, and there are actually about 3 subprojects in the distribution. So it is critical to have an external manifest saying what the components actually are.
  • Instructions on how to run it. When the program is run, main() offers a brief command-line usage (e.g. DynamicAligner file1 file2), but it doesn't say what the contents of the files actually look like. I knew this at the time, of course, but not now. So there should be associated example files in sibling directories. These are more valuable than trying to document file formats.
  • Does it still work? It should be possible to run each example without thinking. The first question will be whether the associated libraries, runtimes, etc. are still relevant and available. One ex-coworker wrote a system which only runs with a particular version of Python. The only answer is to rewrite. So we should certainly avoid any lock-in where possible, and I have trained myself (though not necessarily coworkers) to do this.

So how can I and coworkers avoid problems in the future? I think the first step is that there should be a discipline of creating a "project" (however small) when you create code and that these projects should be under version control. This may sound obvious to some of you, but in some environments (academia, domestic) there is a significant overhead to setting up a project management system. I suspect that the majority of academic code is not under any version control.

Then there is the question of how the projects should be organized. They can't be on SourceForge by default, as the code is (a) trivial and (b) not open by default. We need a server where there can be both communal projects and private ones. I estimate that the effort to set this up and run it is about 0.1 FTE, that is, 20 days a year from all parties (installation, training, maintenance). If there are easier options I'd like to know, as this is a large expense in some cases: do I spend my time setting up a server or do I write papers?

The project should try to encourage good discipline. This is really what I was hoping to get from this question. It could include:

  1. A template of required components (manifest, README, log of commits, examples, required libraries, etc.). Not all projects can run under Maven (e.g. FORTRAN ones).
  2. A means of searching a large number (hundreds at least) of small projects for mnemonic strings (I liked the idea of dumping the code in Google Docs, and this may be a fruitful avenue, but it's extra maintenance effort).
  3. Clear naming conventions. These are more valuable than comments. I now regularly have names of the type iterateOverAllXAndDoY. I try to use createX() rather than getX() when the routine actually creates information. I have a bad habit of calling routines process() rather than convertAllBToY().
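
A minimal sketch of how item 1 could be enforced, assuming a hypothetical template: the component names in REQUIRED_COMPONENTS (README.txt, MANIFEST.txt, CHANGELOG.txt, examples, lib) and the function name missing_components are illustrative choices, not an established standard. It only reports what a project directory lacks, so it works equally for Java, FORTRAN or anything else.

```python
from pathlib import Path

# Hypothetical template: these names are assumptions, adjust to taste.
REQUIRED_COMPONENTS = ["README.txt", "MANIFEST.txt", "CHANGELOG.txt", "examples", "lib"]


def missing_components(project_dir, required=REQUIRED_COMPONENTS):
    """Return the required files/directories that are absent from project_dir."""
    root = Path(project_dir)
    # Keep the template order so the report is stable and easy to read.
    return [name for name in required if not (root / name).exists()]
```

Calling missing_components on each project directory (for instance from a nightly script) gives a cheap reminder of which experiments still lack a manifest or examples, without forcing any build system on them.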

I am aware of, but haven't used, Git, Mercurial and Google Code. I do not know how much effort these are to set up or how many of my concerns they answer. I would be delighted if there were an IDE plugin which helped create better code (e.g. by warning "poor choice of method name").

And whatever the approaches are, they have to come naturally to people who do not naturally have good code discipline, and they have to be worth the effort.

As the excellent answers in your other post indicate, and from my own experience, there is a difficult-to-cross gap between the software used for research and software that has been engineered. In my opinion, Code Complete might help a little, but not much. As an economic question, is it going to be worthwhile to refactor everything for reuse compared to the occasional reward for finding a later use for something? Your balance point may vary.

Here's a practical tip for storing snippets. Instead of full-blown comments, throw in some keywords:

  • "graph isomorphism wrapper"
  • "polymer simulated annealing"
  • "string match feynmann"
  • "equilibrium"

and then put the code somewhere Google-searchable, like a GMail account.

Edit: I might add that free Google Sites are really searchable wikis that are a good place to put code, either in the form of attachments or pasted in.
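
If the snippets live on a local disk rather than somewhere Google-searchable, the same keyword idea can be sketched in a few lines. This is only an illustration under assumptions: the "KEYWORDS:" marker and the function name find_by_keywords are made up here, and each snippet is assumed to carry one comment line such as `# KEYWORDS: string match feynmann`.

```python
from pathlib import Path


def find_by_keywords(root, *terms, marker="KEYWORDS:"):
    """Return files under root whose keyword line contains every search term."""
    wanted = [term.lower() for term in terms]
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Ignore undecodable bytes so binary files do not stop the scan.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        for line in text.splitlines():
            if marker in line:
                keywords = line.split(marker, 1)[1].lower()
                if all(term in keywords for term in wanted):
                    hits.append(path)
                    break  # one matching keyword line per file is enough
    return hits
```

The marker sits inside whatever comment syntax the snippet's language uses, so the search does not need to know whether a file is Java, FORTRAN or a shell script.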

Also, I should say that I am a fan of Code Complete and have given copies to grad students writing software for scientific research for several years. It's a good start, but no silver bullet. I'm writing a paper right now on using open source frameworks to solve scientific data management problems and one of the conclusions is that some software engineering expertise is essential for long-running systems. Many scientific projects should probably budget for this from the beginning.

I would echo what the others have said about commenting why the code was written and its intended usage, but I would also add this:

Code as if you were planning on putting this into production even when you're just messing around. Code for:

  • Clarity and readability
  • Adherence to the coding conventions of the time (naming conventions, etc.). Even though such conventions change over time, if you stick to the standards you're more likely to be able to understand the code later.
  • Security (if applicable)
  • Performance (if applicable)

Particularly, I would stress the first point, but the others are important as well. I find that if I use "test code" later on, I tend to just use it if it works, rather than refactoring it.

I think the most important thing (if you do no refactoring, and it isn't going to happen) is to comment and document your thought process at the time. It will help make the code less impenetrable and help you find the good bits when needed.

No, No, No, No, No!

Do not write throwaway code even in a research environment. Please!

Currently I'm messing with just such "throwaway code", namely the BLAST project. The thing is that it started as a playground but then happened to become somewhat successful. Now it's a neat tool with many concepts implemented, but the code is virtually unmaintainable. But that's not the main point.

The main point is that you do research so that engineers can later benefit from your findings. Having done good scientific work on a general concept and written a tool that proves it successful, you can easily forget that you're not doing it only for publication and a PhD. You do it for the benefit of mankind. Your code may contain a bunch of "special cases" that were hard to debug, a set of quirks and hacks that do not fit into any conference article. It's especially important to document and comment such things throughout your code.

If a developer decides to implement your concepts in a commercial product, he can study the quirks and hacks in your code, and the implementation will then have fewer bugs than it might otherwise have had. Everyone says "Wow, his research on A really is useful!" But if you write "throwaway" code, they say "his concept looks nice on paper, but X tried to implement it and drowned in a bunch of bugs".

(EDIT: taken from comments below) To help future developers of your codebase, you don't need much. First, comment what each function does. Second, make sure that every non-obvious fix of a tricky bug is placed in a separate commit in the revision-control system (with an appropriate comment, of course). That's quite enough. And if you also make things modular (even if they're not ready for outright reuse, which is three times more costly, according to Brooks), you will be adored by the engineers who implement your research.

I think that the world would be a better place if researchers threw away their hubris and stopped haughtily thinking that they are above the "dirty coders" who do the menial job of writing good code. Writing good code is not just a job for those "stupid programmers". It is a really valuable thing that everyone should strive for. Without it, your experimental ground, your code, your brainchild will just die.

Comments - describe what you were thinking and why you chose to implement something a certain way including what alternatives you considered. There are probably all sorts of fancy solutions but just commenting your code properly at the time you are writing it seems to work the best.

I've probably missed the point of this whole discussion, I frequently do, but here goes, an invitation for brickbats and downvoting ...

If it's throwaway code, throw it away !

If you don't want to throw it away then follow the good advice above. For me, and I write a fair amount of throwaway code, the question of whether it gets thrown away or put into a reusable state and kept against a rainy day boils down to the economics.

Can I foresee circumstances in which this code will be useful again ? Once in a blue moon, twice a year, every month ?

Will I be able to rewrite this code in less time than it takes to make it reusable ? If the answer to this question is No, then how many times will I have to reuse it to make it worth while enhancing it now ? (Back to the previous question.)

If I do make this code reusable, will I be able to find it again when I next want it ? (Anyone ever had the experience of knowing, with absolute certainty, that somewhere in your code repository there is just the fragment you want, but not having a clue what it was called, nor where to look nor what to grep for ?)
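
The economics in these questions can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a rule: the function name break_even_reuses and its three inputs (hours to clean the code up now, hours to rewrite it from scratch each time it is needed, hours to pick the cleaned-up version back up) are assumptions introduced for illustration.

```python
import math


def break_even_reuses(cleanup_hours, rewrite_hours, reuse_hours=0.0):
    """Smallest number of future reuses at which cleaning up now costs
    no more than rewriting from scratch every time.

    Returns None when rewriting is never more expensive than reusing,
    i.e. the clean-up can never pay for itself.
    """
    saving_per_reuse = rewrite_hours - reuse_hours
    if saving_per_reuse <= 0:
        return None
    # cleanup_hours + n * reuse_hours <= n * rewrite_hours, solved for n.
    return max(1, math.ceil(cleanup_hours / saving_per_reuse))
```

For example, break_even_reuses(8, 3, 1) returns 4: a day of clean-up only pays off on the fourth reuse if each rewrite would take three hours and each reuse one. If that many reuses seem unlikely, throwing the code away is the cheaper choice.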

Finally, the 3 step approach to making quickly-written code reusable. Stop after whichever of these steps you like:

1) Document the code as a black-box. Inputs, outputs, operation(s). File this document carefully.

2) Write instructions about how to build/interpret/install the code, in case you ever have to port it. File these instructions carefully.

3) Only if worth the effort -- improve the source code quality to make the code maintainable in future. Make sure the sources are in the source control system and findable.



Some strategies:

  1. Good comments. Hard to reuse what you can't find or understand later.
  2. Save every query to a folder that is backed up or is under source control.
  3. Have a common library of useful functions that you "promote" something to once it has been reused.

You could also borrow the idea of unit tests from the TDD (test-driven development) folks. You need to make sure that the throwaway code actually works anyway, so why not express the check as a small unit test? This would have two advantages:

  1. Reading the test code communicates the intent of the throwaway quite clearly: after all it expresses its expectations in the same language: code.

  2. It would also help with the 4th problem of your self-reply: "does it still work?". Well, it's easy: just run the unit tests and they tell you what doesn't work, where, and (with a bit of luck) why.
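
As a rough illustration of such a test, here is a sketch using Python's standard unittest module. The edit_distance function is a stand-in invented for this example (a plain Levenshtein distance); it is not the string-matching code from the question, whose details are not given.

```python
import unittest


def edit_distance(a, b):
    """Levenshtein distance between two strings (stand-in for the throwaway code)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


class EditDistanceTest(unittest.TestCase):
    """Each test doubles as a tiny, runnable example of intended use."""

    def test_identical_strings(self):
        self.assertEqual(edit_distance("align", "align"), 0)

    def test_known_example(self):
        self.assertEqual(edit_distance("kitten", "sitting"), 3)

    def test_empty_string(self):
        self.assertEqual(edit_distance("", "abc"), 3)
```

Saved in a file next to the experiment, the tests can be run years later with `python -m unittest` followed by the file name, which answers "does it still work?" without having to remember anything about the code.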

