Grabbing the output file from a Python Popen process?
I have written a Python program to interface with a compiled program (call it ProgramX) that has some idiosyncrasies that are proving difficult to deal with. I need to feed many thousands of input files to ProgramX via my Python program. What I would like to do is grab the output file that ProgramX creates with each run and rename it to something sensible, like inputfilename.output.
The problem is the output file that ProgramX writes: its name is chosen by an unpredictable method, and ProgramX will "mercilessly overwrite" that file if it already exists (which is the case most of the time). The saving grace is probably that the output files share a standard prefix: think ProgramX.notQuiteRandomNumber.
The only thing I can think to do is something like this in my bash shell:

```bash
# Newest ProgramX* file last; -t sorts by modification time, -r reverses it
PROGRAMXOUTPUT=$(ls -tr ProgramX* | tail -n 1)
mv "$PROGRAMXOUTPUT" input.output
```
This does 90% of what I need, but before I translate all that bash into a series of Popen calls, is there a better way to do this? It feels like a problem people have already solved more cleanly than what I'm thinking of.
Sidenote: I can capture the program's standard output without problems, but it's the output file that I need to grab.
Bonus: I was planning on running a bunch of instances of the program in the same directory, so my naive approach above may start to have unforeseen problems. So perhaps there is something fancier that watches the PID of ProgramX and follows its output.
To do what your shell script above does, assuming you've only got one `ProgramX*` file in the current directory:

If you need to sort by time, etc., there are ways to do that too (look at `os.stat`), but using the most recent modification date is a recipe for nasty race conditions if you'll be running multiple copies of ProgramX concurrently.

I'd suggest instead that you create, and change into, a new (perhaps temporary) directory for each run of ProgramX, so the runs have no possibility of treading on each other. The `tempfile` module can help with this.
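A minimal sketch of the per-run directory idea in Python. The helper names `run_in_private_dir` and `process_all` are hypothetical, and it assumes that ProgramX takes the input path as its last argument, needs no other files in its working directory, and writes exactly one `ProgramX.*` file there. Instead of `os.chdir` (which changes directory for the whole Python process), it passes `cwd=` to `subprocess.run`, which is what lets several runs go in parallel threads safely.

```python
import os
import shutil
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_in_private_dir(cmd, input_path, output_path, prefix="ProgramX."):
    """Run one job in a fresh temporary directory and keep its output file.

    Hypothetical helper. Assumes `cmd + [input_path]` runs ProgramX on one
    input and that ProgramX writes exactly one file named `prefix...` into
    its current working directory.
    """
    input_path = os.path.abspath(input_path)
    output_path = os.path.abspath(output_path)
    # A private directory per run: concurrent runs cannot overwrite, or be
    # confused by, each other's ProgramX.* files.
    with tempfile.TemporaryDirectory() as workdir:
        # cwd= changes directory for the child only; os.chdir() would change
        # it for the whole Python process, including every other thread.
        subprocess.run(cmd + [input_path], cwd=workdir, check=True)
        matches = [n for n in os.listdir(workdir) if n.startswith(prefix)]
        if len(matches) != 1:
            raise RuntimeError(f"expected one {prefix}* file, found {matches}")
        # shutil.move copes with the temp dir being on another filesystem.
        shutil.move(os.path.join(workdir, matches[0]), output_path)
    # Leaving the with-block deletes the temp dir and anything left in it.
    return output_path


def process_all(cmd, input_paths, workers=4):
    """Run ProgramX on many inputs in parallel; writes <input>.output files."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_in_private_dir, cmd, p, p + ".output")
            for p in input_paths
        ]
        # .result() re-raises any exception from the worker thread.
        return [f.result() for f in futures]
```

With the real program this might be called as `process_all([os.path.abspath("ProgramX")], input_files)`. The absolute path matters: when `cwd=` is given, `subprocess` resolves a relative executable path against that new directory, so `./ProgramX` would not be found. The exact command line is a guess, since the post does not show how ProgramX is launched.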
Two options that I see:
If there is only one `ProgramX*` file, then what about just:
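A minimal sketch of that idea in Python, assuming exactly one match in the current directory. The helper name `rename_single_output` is hypothetical, and the pattern is narrowed to `ProgramX.*` so it cannot pick up the `ProgramX` executable itself if that sits in the same directory:

```python
import glob
import os


def rename_single_output(new_name, pattern="ProgramX.*"):
    """Rename the one file matching `pattern` in the current directory.

    Hypothetical helper: refuses to guess if there are zero or several matches.
    """
    matches = glob.glob(pattern)
    if len(matches) != 1:
        raise RuntimeError(f"expected exactly one match for {pattern!r}, got {matches}")
    # os.replace overwrites new_name if it already exists, like `mv`.
    os.replace(matches[0], new_name)
    return new_name
```

For example, `rename_single_output("input.output")` after each run. This only stays correct while one copy of ProgramX runs per directory at a time.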