Executing a large C program by generating intermediate stages
I have an algorithm that takes 7 days to run to completion (and a few more algorithms too).

Problem: in order to run the program successfully, I need a continuous power supply. If I'm out of luck and there is a power loss in the middle, I have to restart it from the beginning.

So I would like to ask for a way to make my program execute in phases (say each phase generates results A, B, C, ...), so that in case of a power loss I can somehow use these intermediate results and continue/resume the run from that point.

Problem 2: How can I prevent a file from being reopened on every iteration of a loop? (fopen is placed in a loop that runs nearly a million times; this was needed because the file is changed on each iteration.)
Answers (5)
You can separate it into several source files and use make.
When each result phase is complete, branch off to a new universe. If the power fails in the new universe, destroy it and travel back in time to the point at which you branched. Repeat until all phases are finished, and then merge your results into the original universe via a transcendental wormhole.
Well, a couple of options, I guess:

Note that both of these options may draw out your 7-day run time even further!

So, to improve the overall runtime, could you also separate your algorithm so that it has "worker" components that can work on "jobs" in parallel? This usually means drawing out some "dumb" but intensive logic (such as a computation) that can be parameterised. Then you have the option of running your algorithm on a grid/space/cloud/whatever, so at least you have options to reduce the run time. It doesn't even need to be a tuple space... just use queues (IBM MQ Series has a C interface) and have listeners on other boxes listening to your jobs queue, processing your jobs and persisting the results. You can still phase the algorithm as discussed above, too.
Problem 2: Opening the file on each iteration of the loop because it's changed

I may not be best qualified to answer this, but doing fopen (and fclose) on each iteration certainly seems wasteful and slow. To answer, or to have anyone more qualified answer, I think we'd need to know more about your data. For instance:

I ask because, judging by your comment "because it's changed each iteration", you would be better off using a random-access file. By this, I'm guessing you're re-opening the file in order to fseek to a point you may already have passed (in your stream of data) and make a change. However, if you open the file in binary mode, you can seek to any position in the file using fsetpos and fseek. That is, you can "seek" backwards.

Additionally, if your data is record-based or otherwise organised, you could also create an index for it. With this, you could use fsetpos to set the file pointer at the index you're interested in and traverse from there, thus saving the time spent finding the area of data to change. You could even persist your index in an accompanying index file.

Note that you can write plain text to a binary file. Perhaps worth investigating?
Sounds like a classical batch-processing problem to me.
You will need to define checkpoints in your application and store the intermediate data until a checkpoint is reached.
A checkpoint could be a row number in a database, or a position inside a file.
Your processing might take longer than it does now, but it will be more reliable.
In general, you should think about the bottleneck in your algorithm.
For problem 2, you should use two files; your application might be days faster if you call fopen a million fewer times...