Good strategies for developing throwaway code?

Posted 2024-08-03 07:29:28

I frequently write throwaway code (in a research environment) - for example to explore an algorithm or a model for a scientific property or process. Many of these "experiments" are one-off but sometimes I find that I need to use a few later. For example I have just unearthed code for string matching I wrote 7 years ago (stopped because of other priorities) but which is now valuable for a coworker's project. Having looked at it (did I really write such impenetrable code?) I realise there are some things I could have done then to help me when I restarted the "project" ("experiment" is still a better word). The earlier experiment "worked" but I know that at the time I would not have had time to refactor as my priorities lay elsewhere.

What approaches are cost-effective in enabling such work to be dug up and re-used?

EDIT: I have answered my own question (below) because there are issues beyond the actual source itself.

Comments (10)

起风了 2024-08-10 07:29:28

I disagree with all of the answers saying "write comments". That's being offered as a catch-all for the code itself not being understandable.

Get yourself a copy of Code Complete (Steve McConnell, 2nd edition). If you learn the techniques of writing maintainable code in the first place, it won't take you more time, and you will be able to return to your work later with less trouble.

Which would you prefer:

  • Cryptic code with comments?
  • Mostly OK code without?

I strongly prefer the latter, as the OK code is easier to understand in the situations where the cryptic code was uncommented, and comments are another place that the original developer can make mistakes. The code may be buggy, but it's never wrong.

Once you're comfortable with Code Complete, I'd recommend The Pragmatic Programmer, as it gives slightly higher-level software-development advice.

半衬遮猫 2024-08-10 07:29:28

[Answering own question]
There are several other aspects to the problem which haven't been raised and which I would have found useful when revisiting it. Some of these may be "self-evident", but remember that this code predates SVN and IDEs.

  • Discoverability. It has been difficult even to find the code. I believe it's in my SourceForge project, but there are so many versions and branches over 7 years that I can't find it. So I would need a system that searches code, and until IDEs appeared I don't think there was one.
  • What does it do? The current checkout contains about 13 classes (all in one package, as it wasn't easy to refactor at the time). Some are clear (DynamicAligner) but others are opaque (MainBox, so named because it extended a Swing Box). There are four main() programs and there are actually about 3 subprojects in the distribution. So it is critical to have an external manifest saying what the components actually are.
  • Instructions on how to run it. When running the program, main() will offer a brief command-line usage (e.g. DynamicAligner file1 file2) but it doesn't say what the contents of the files actually look like. I knew this at the time, of course, but not now. So there should be associated example files in sibling directories. These are more valuable than trying to document file formats.
  • Does it still work? It should be possible to run each example without thinking. The first question will be whether the associated libraries, runtimes, etc. are still relevant and available. One ex-coworker wrote a system which only runs with a particular version of Python. The only answer is to rewrite. So certainly we should avoid any lock-in where possible, and I have trained myself (though not necessarily coworkers) to do this.

So how can I and coworkers avoid problems in the future? I think the first step is that there should be a discipline of creating a "project" (however small) when you create code and that these projects should be under version control. This may sound obvious to some of you, but in some environments (academia, domestic) there is a significant overhead to setting up a project management system. I suspect that the majority of academic code is not under any version control.

Then there is the question as to how the projects should be organized. They can't be on SourceForge by default as the code is (a) trivial and (b) not open by default. We need a server where there can be both communal projects and private ones. I would calculate that the effort to set this up and run it is about 0.1 FTE - that's 20 days a year from all parties (installation, training, maintenance). If there are easier options I'd like to know, as this is a large expense in some cases: do I spend my time setting up a server or do I write papers?

The project should try to encourage good discipline. This is really what I was hoping to get from this question. It could include:

  1. A template of required components (manifest, README, log of commits, examples, required libraries, etc.). Not all projects can run under Maven - e.g. FORTRAN.
  2. A means of searching a large number (hundreds at least) of small projects for mnemonic strings (I liked the idea of dumping the code in Google Docs, and this may be a fruitful avenue - but it's extra maintenance effort).
  3. Clear naming conventions. These are more valuable than comments. I now regularly have names of the type iterateOverAllXAndDoY. I try to use createX() rather than getX() when the routine actually creates information. I have a bad habit of calling routines process() rather than convertAllBToY().
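
The template idea in item 1 could be sketched roughly as follows. This is only a minimal illustration in Python: the file names (README.txt, MANIFEST.txt, DEPENDENCIES.txt, examples/) and the function name create_experiment are assumptions made for the example, not an established convention, and the skeleton is independent of any build tool.

```python
from pathlib import Path

# Hypothetical template: these file names and texts are assumptions,
# chosen to mirror the components listed above (manifest, README,
# required libraries, examples).
TEMPLATE = {
    "README.txt": "What this experiment does, and why it was written.\n",
    "MANIFEST.txt": "One line per component: name, purpose, entry point.\n",
    "DEPENDENCIES.txt": "Libraries and runtime versions this was last run with.\n",
    "examples/README.txt": "One input file per example, with the command that runs it.\n",
}


def create_experiment(root: Path, name: str) -> Path:
    """Create a skeleton directory for a new experiment.

    Existing files are left untouched, so the function can be re-run safely.
    """
    project = root / name
    for relative_path, text in TEMPLATE.items():
        target = project / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():
            target.write_text(text, encoding="utf-8")
    return project
```

Calling create_experiment(Path.home() / "experiments", "string-matching") would then give every experiment the same four placeholders to fill in, which keeps the cost of starting a "project" close to zero.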

I am aware of, but haven't used, Git, Mercurial and GoogleCode. I do not know how much effort these are to set up or how many of my concerns they answer. I would be delighted if there was an IDE plugin which helped create better code (e.g. "poor choice of method name").

And whatever the approaches, they have got to come naturally to people who do not naturally have good code discipline, and to be worth the effort.

晨与橙与城 2024-08-10 07:29:28

As the excellent answers in your other post indicate, and from my own experience, there is a difficult-to-cross gap between the software used for research and software that has been engineered. In my opinion, Code Complete might help a little, but not much. As an economic question, is it going to be worthwhile to refactor everything for reuse compared to the occasional reward for finding a later use for something? Your balance point may vary.

Here's a practical tip for storing snippets. Instead of full-blown comments, throw in some keywords:

  • "graph isomorphism wrapper"
  • "polymer simulated annealing"
  • "string match feynmann"
  • "equilibrium"

and then put the code somewhere Google-searchable, like a GMail account.
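
If the snippets live on a local disk rather than in a Google-searchable account, the same keyword trick can be approximated with a small script. The sketch below swaps in a plain case-insensitive substring search for Google's search. The function name find_keyword and the default list of file suffixes are assumptions for illustration.

```python
from pathlib import Path


def find_keyword(root: Path, phrase: str,
                 suffixes=(".py", ".java", ".f", ".txt")):
    """Return sorted paths under root whose text contains phrase, ignoring case.

    The suffix list is an assumed default; adjust it to the languages in use.
    """
    needle = phrase.lower()
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            # errors="ignore" keeps the scan going past undecodable bytes.
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text.lower():
                hits.append(path)
    return sorted(hits)
```

For example, find_keyword(Path("experiments"), "simulated annealing") would list every file tagged with that phrase. It reads each file in full, so it suits hundreds of small projects rather than very large trees.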

Edit: I might add that free Google Sites are really searchable wikis that are a good place to put code, either in the form of attachments or pasted in.

Also, I should say that I am a fan of Code Complete and have given copies to grad students writing software for scientific research for several years. It's a good start, but no silver bullet. I'm writing a paper right now on using open source frameworks to solve scientific data management problems and one of the conclusions is that some software engineering expertise is essential for long-running systems. Many scientific projects should probably budget for this from the beginning.

旧人哭 2024-08-10 07:29:28

I would echo what the others have said about commenting the "why" behind the code and its intended usage, but I would also add this:

Code as if you were planning on putting this into production even when you're just messing around. Code for:

  • Clarity and readability
  • The coding conventions of the time (naming conventions, etc.). Even though such conventions change over time, if you stick to the standards you're more likely to be able to understand it later.
  • Security (if applicable)
  • Performance (if applicable)

Particularly, I would stress the first point, but the others are important as well. I find that if I use "test code" later on, I tend to just use it if it works, rather than refactoring it.

ぺ禁宫浮华殁 2024-08-10 07:29:28

I think the most important thing (if you do no refactoring, it isn't going to happen) is to comment and document your thought process at the time. It will help make the code less impenetrable and help you find the good bits when needed.

对你再特殊 2024-08-10 07:29:28

No, No, No, No, No!

Do not write throwaway code even in a research environment. Please!

Currently I'm working with just such "throwaway code", namely the BLAST project. The thing is that it started as a playground but then happened to become somewhat successful. Now it's a neat tool with many concepts implemented, but the code is virtually unmaintainable. But that's not the main point.

The main point is that you do research so that engineers can later benefit from your findings. Having done good scientific work on a general concept and written a tool that proves it successful, you can easily forget that you're not doing it only for publication and a PhD. You do it for the benefit of mankind. Your code may contain a bunch of "special cases" that were hard to debug, and a set of quirks and hacks that do not fit into any conference article. It's especially important to document and comment such things throughout your code.

If a developer decides to implement your concepts in a commercial product, he can study the quirks and hacks in your code, and the implementation will then have fewer bugs than it otherwise might. Everyone says "Wow, his research on A really is useful!" But if you write "throwaway", they say "his concept looks nice on paper, but X tried to implement it and drowned in a bunch of bugs".

(EDIT: taken from comments below) To help future developers of your codebase, you don't need much. First, comment what each function does. Second, make sure that every non-obvious fix of a tricky bug is placed in a separate commit in the revision-control system (with an appropriate comment, of course). That's quite enough. And if you even make things modular (even if they're not ready for outright reuse - that's three times more costly, according to Brooks) you will be adored by the engineers who implement your research.

I think that the world would be a better place if researchers threw away their hubris and stopped haughtily thinking that they're not those dirty coders who do the menial job of writing good code. Writing good code is not just a job for those stupid programmers. It is a really valuable thing that everyone should strive for. Without this, your experimental ground, your code, your brainchild will just die.

伊面 2024-08-10 07:29:28

Comments - describe what you were thinking and why you chose to implement something a certain way including what alternatives you considered. There are probably all sorts of fancy solutions but just commenting your code properly at the time you are writing it seems to work the best.

萧瑟寒风 2024-08-10 07:29:28

I've probably missed the point of this whole discussion, I frequently do, but here goes, an invitation for brickbats and downvoting...

If it's throwaway code, throw it away!

If you don't want to throw it away then follow the good advice above. For me, and I write a fair amount of throwaway code, the question of whether it gets thrown away or put into a reusable state and kept against a rainy day boils down to the economics.

Can I foresee circumstances in which this code will be useful again? Once in a blue moon, twice a year, every month?

Will I be able to rewrite this code in less time than it takes to make it reusable? If the answer to this question is No, then how many times will I have to reuse it to make it worthwhile enhancing it now? (Back to the previous question.)
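
As a rough illustration of that break-even question, here is a minimal sketch in Python. It assumes a deliberately crude model that is not part of the answer itself: making the code reusable costs a fixed number of hours once, every later reuse of throwaway code costs a full rewrite, and reusing cleaned-up code costs nothing. The function and parameter names are hypothetical.

```python
import math


def reuses_to_break_even(hours_to_make_reusable: float,
                         hours_to_rewrite: float) -> int:
    """Smallest number of future reuses at which the one-off clean-up cost
    is no more than the cumulative cost of rewriting from scratch each time.
    """
    if hours_to_rewrite <= 0:
        raise ValueError("hours_to_rewrite must be positive")
    if hours_to_make_reusable <= 0:
        return 0
    return math.ceil(hours_to_make_reusable / hours_to_rewrite)
```

With these assumptions, code that takes 8 hours to tidy but only 3 hours to rewrite pays off from the third reuse onwards, which is where the "once in a blue moon or every month?" estimate comes in.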

If I do make this code reusable, will I be able to find it again when I next want it? (Anyone ever had the experience of knowing, with absolute certainty, that somewhere in your code repository there is just the fragment you want, but not having a clue what it was called, nor where to look nor what to grep for?)

Finally, the three-step approach to making quickly-written code reusable. Stop after whichever of these steps you like:

1) Document the code as a black-box. Inputs, outputs, operation(s). File this document carefully.

2) Write instructions about how to build/interpret/install the code, in case you ever have to port it. File these instructions carefully.

3) Only if worth the effort -- improve the source code quality to make the code maintainable in future. Make sure the sources are in the source control system and findable.

Regards

Mark

瞳孔里扚悲伤 2024-08-10 07:29:28

Some strategies:

  1. Good comments. Hard to reuse what you can't find or understand later.
  2. Save every query to a folder that is backed up or is under source control.
  3. Have a common library of useful functions that you "promote" something to once it has been reused.

七颜 2024-08-10 07:29:28

You could also borrow the idea of unit tests from the TDD (test-driven development) folks. You need to make sure that the throwaway code actually works OK anyway, so why not express the check as a small unit test? This would have two advantages:

  1. Reading the test code communicates the intent of the throwaway code quite clearly: after all, it expresses its expectations in the same language, namely code.

  2. It would also help with the 4th problem of your self-reply: "does it still work?". Well, it's easy: just run the unit tests and they tell you what doesn't work, where, and (with a bit of luck) why.
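
A minimal sketch of what such a test might look like, using Python's standard unittest module. The routine longest_common_prefix is a hypothetical stand-in for whatever the throwaway experiment computes; it is not the string-matching code from the question.

```python
import unittest


def longest_common_prefix(a: str, b: str) -> str:
    """Hypothetical throwaway routine standing in for the experiment's code."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return a[:length]


class LongestCommonPrefixTest(unittest.TestCase):
    """Each test records one expectation the author had when writing the code."""

    def test_shared_prefix(self):
        self.assertEqual(longest_common_prefix("alignment", "aligner"), "align")

    def test_no_shared_prefix(self):
        self.assertEqual(longest_common_prefix("abc", "xyz"), "")

    def test_empty_input(self):
        self.assertEqual(longest_common_prefix("", "abc"), "")
```

Saved in a file next to the experiment, the tests can be run years later with python -m unittest followed by the file name. They answer "does it still work?" and also serve as small usage examples.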
