Get output file from Python Popen process?

Posted 2024-10-30 18:34:54


I have written a Python program to interface with a compiled program (call it ProgramX) that has some idiosyncrasies that are proving difficult to deal with. I need to feed many thousands of input files to ProgramX via my Python program. What I would like to do is grab the output file that ProgramX creates on each run and rename it to something sensible, like inputfilename.output.

The problem is the output file written by ProgramX: its name is chosen by an unpredictable method, and ProgramX will write, and "mercilessly overwrite", that file if it already exists (which is the case most of the time). The saving grace is probably that the output files share a standard prefix: think ProgramX.notQuiteRandomNumber.

The only thing I can think of doing is something like this in my bash shell:

# newest ProgramX* file (ls -t sorts by mtime; -r puts the newest last)
PROGRAMXOUTPUT=$(ls -tr ProgramX* | tail -n 1)
mv "$PROGRAMXOUTPUT" input.output

This does 90% of what I need, but before I translate all that bash into a series of Popen calls, is there a better way to do this? It feels like a problem that people have already solved much better than I am thinking of.

Side note: I can capture the program's standard output without problems; it's the output file that I need to grab.

Bonus: I plan to run many instances of the program concurrently in the same directory, so my naive approach above may start to run into unforeseen problems. Perhaps there is something fancier that watches ProgramX's PID and follows its output.


Answer by 演多会厌, 2024-11-06 18:34:54


To do what your shell script above does, assuming you've only got one ProgramX* in the current directory:

import glob
import os

# Assumes exactly one file matches ProgramX*; [0] raises IndexError if none do.
programxoutput = glob.glob('ProgramX*')[0]
os.rename(programxoutput, 'input.output')

If you need to sort by time, etc., there are ways to do that too (look at os.stat), but using the most recent modification date is a recipe for nasty race conditions if you'll be running multiple copies of ProgramX concurrently.
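A minimal sketch of the sort-by-time variant, assuming the outputs match ProgramX* in the current directory. The helper name newest_match is hypothetical; it simply picks the match with the largest os.stat(...).st_mtime. As the caveat above says, this is only trustworthy when a single ProgramX run writes into the directory at a time.

```python
import glob
import os


def newest_match(pattern: str = "ProgramX*") -> str:
    """Return the most recently modified path matching `pattern`.

    Raises FileNotFoundError if nothing matches. Only reliable when one
    ProgramX run at a time writes into the directory (race-condition caveat).
    """
    candidates = glob.glob(pattern)
    if not candidates:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # st_mtime is the last-modification time, in seconds since the epoch.
    return max(candidates, key=lambda path: os.stat(path).st_mtime)
```

Usage would be os.rename(newest_match(), 'input.output'), the Python counterpart of the ls -tr | tail pipeline in the question.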

I'd suggest instead that you create and change to a new, perhaps temporary, directory for each run of ProgramX, so the runs have no possibility of treading on each other. The tempfile module can help with this.
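One way to put the per-run directory idea into practice is sketched below, under assumptions the thread does not confirm: ProgramX is on the PATH, takes the input file path as its only argument, and writes its ProgramX.* file into its current working directory. PROGRAMX_CMD, run_one and run_many are hypothetical names. Rather than calling os.chdir, which changes the directory of the whole Python process, it passes cwd= to subprocess.run so that only the child runs inside its own tempfile.TemporaryDirectory. That also makes it safe to drive many runs from a thread pool, which covers the "bonus" case of concurrent instances.

```python
import concurrent.futures
import glob
import os
import shutil
import subprocess
import tempfile
from typing import Sequence

# Hypothetical command line: assumes ProgramX is on PATH and takes the input
# file as its only argument. Adjust to however ProgramX is really invoked.
PROGRAMX_CMD = ["ProgramX"]


def run_one(input_path: str, cmd: Sequence[str] = PROGRAMX_CMD) -> str:
    """Run ProgramX on input_path in a private temp dir; return the output path.

    The output file is moved to "<input_path>.output", next to the input.
    """
    input_path = os.path.abspath(input_path)  # the child runs in another cwd
    dest = input_path + ".output"
    with tempfile.TemporaryDirectory(prefix="programx-") as workdir:
        # cwd= affects only the child; os.chdir() would move this whole
        # process (and every thread in it) to a different directory.
        subprocess.run([*cmd, input_path], cwd=workdir, check=True)
        # Assumes ProgramX drops its "ProgramX.<number>" file into its cwd.
        outputs = glob.glob(os.path.join(workdir, "ProgramX*"))
        if len(outputs) != 1:
            raise RuntimeError(f"expected one ProgramX* file, found {outputs}")
        # shutil.move also works when the temp dir is on another filesystem,
        # where os.rename would fail.
        shutil.move(outputs[0], dest)
    return dest


def run_many(input_paths, cmd=PROGRAMX_CMD, workers=os.cpu_count()):
    """Run ProgramX over many inputs concurrently; each run gets its own dir."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: run_one(p, cmd), input_paths))
```

For example, run_many(glob.glob('inputs/*.dat')) would be expected to leave a foo.dat.output next to each inputs/foo.dat; the inputs/*.dat pattern is only illustrative. Threads are enough here because each worker spends its time waiting on a child process, not running Python code.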
