长时间运行的数据处理Python脚本中的程序结构

发布于 2024-09-03 08:21:48 字数 809 浏览 9 评论 0原文

对于我当前的工作，我正在编写一些长时间运行（考虑几个小时到几天）的脚本来执行 CPU 密集型数据处理。程序流程非常简单 - 它进入主循环，完成主循环，保存输出并终止：我的程序的基本结构往往是这样的：

<import statements>
<constant declarations>

<misc function declarations>

def main():
   for blah in blahs():
      <lots of local variables>
      <lots of tightly coupled computation>

      for something in somethings():
          <lots more local variables>
          <lots more computation>

   <etc., etc.>

   <save results>

if __name__ == "__main__":
    main()

这很快就会变得难以管理，所以我想将其重构为更多内容易于管理。我想让它更易于维护，而不牺牲执行速度。

然而，每个代码块都依赖于大量变量，因此将部分计算重构为函数将使参数列表很快失控。我应该将这种代码放入Python类中，并将局部变量更改为类变量吗？从概念上讲，将程序转换为类并没有多大意义，因为该类永远不会被重用，并且每个实例只会创建一个实例。

此类计划的最佳实践结构是什么？我正在使用 python，但问题相对与语言无关，假设现代面向对象的语言功能。

原文

For my current job I am writing some long-running (think hours to days) scripts that do CPU intensive data-processing. The program flow is very simple - it proceeds into the main loop, completes the main loop, saves output and terminates: The basic structure of my programs tends to be like so:

<import statements>
<constant declarations>

<misc function declarations>

def main():
   for blah in blahs():
      <lots of local variables>
      <lots of tightly coupled computation>

      for something in somethings():
          <lots more local variables>
          <lots more computation>

   <etc., etc.>

   <save results>

if __name__ == "__main__":
    main()

This gets unmanageable quickly, so I want to refactor it into something more manageable. I want to make this more maintainable, without sacrificing execution speed.

Each chuck of code relies on a large number of variables however, so refactoring parts of the computation out to functions would make parameters list grow out of hand very quickly. Should I put this sort of code into a python class, and change the local variables into class variables? It doesn't make a great deal of sense tp me conceptually to turn the program into a class, as the class would never be reused, and only one instance would ever be created per instance.

What is the best practice structure for this kind of program? I am using python but the question is relatively language-agnostic, assuming a modern object-oriented language features.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

念﹏祤嫣 2024-09-10 08:21:48

首先，如果您的程序要运行几个小时/几天，那么切换到使用类/方法而不是将所有内容都放在一个巨大的主程序中的开销几乎不存在。

此外，从长远来看，重构（即使它确实涉及传递大量变量）应该可以帮助您提高速度。对设计良好的应用程序进行分析要容易得多，因为您可以查明缓慢的部分并在那里进行优化。也许会出现一个针对您的计算进行了高度优化的新库...设计良好的程序将让您立即将其插入并进行测试。或者您可能决定编写 C 模块扩展来提高计算子集的速度，设计良好的应用程序也将使这变得容易。

如果没有看到<大量紧密耦合的计算>和<大量更多的计算>，很难给出具体的建议。但是，我将从让每个 for 块成为它自己的方法开始，然后从那里开始。