Are intermediate files bad practice?
I was recently downvoted (which only bugged me a little :) ) for an answer I gave to this question. The person offered no explanation for the downvote, which got me thinking: "Why would you avoid producing intermediate files?" Especially in a language like Python, where file I/O is laughably easy.
There seemed to be a consensus that it was a bad idea, but I know for a fact that intermediate files are used regularly in practice. I worked for a very well-respected research firm (let's just say S.O. wouldn't exist without this firm) where it was assumed that your programs would produce files as output. We did this because if your program really deserved to be a standalone program, it needed debuggable output and some way of passing that output between processes, so that it could be examined later if we discovered an error further downstream.
Is it considered bad practice (in cases like the question linked above) to use intermediate files? Why?
Comments (3)
One problem with intermediate files arises with multithreading.
If clients C1 and C2 are handled simultaneously by server process S (which may or may not have forked into separate processes, used threads, or whatever concurrency system), you may get weird issues when both try to create the same intermediate file.
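One common way around the collision described above is to give each client its own uniquely named file. The following is only a minimal sketch: `handle_client` and the `intermediate-` prefix are hypothetical names chosen for illustration, and it assumes each client's intermediate data fits in one small text file.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def handle_client(client_id: str) -> str:
    """Write this client's intermediate data to its own file and read it back."""
    # mkstemp creates and opens a uniquely named file in one step,
    # so two concurrent callers never receive the same path.
    fd, path = tempfile.mkstemp(prefix="intermediate-", suffix=".txt")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(f"partial result for {client_id}\n")
        with open(path) as handle:
            return handle.read().strip()
    finally:
        os.remove(path)  # remove the file even if processing fails


# Two clients handled at the same time, each with a private file.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(handle_client, ["C1", "C2"]))
```

The same idea applies to forked processes: as long as the file name is generated by the system rather than hard-coded, C1 and C2 cannot overwrite each other's data.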
I believe one tenet of the Unix philosophy is that all programs should act as filters. However, this doesn't necessarily mean creating files on disk, and in my opinion using intermediate files leads to unwieldy behaviour. One should also treat the disk as a last resort, using it only to store and retrieve data that must be available after the computer is powered off, and maybe even take care to allow programs to run on read-only media.
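As a rough illustration of the "filter" idea without touching the disk, stages can be chained in memory with generators. This is a sketch only; the stage names `strip_blank`, `to_upper`, and `run_pipeline` are made up for this example.

```python
from typing import Iterable, Iterator, List


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """First filter: drop empty lines and trim surrounding whitespace."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Second filter: upper-case each line."""
    for line in lines:
        yield line.upper()


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Chain the filters; each line flows through without being written to disk."""
    return list(to_upper(strip_blank(lines)))
```

For example, `run_pipeline(["a\n", "\n", " b \n"])` passes each line through both stages one at a time, much like a shell pipe. The trade-off raised in the question still holds: nothing is left behind to inspect if a downstream stage turns out to be wrong.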
Well, there are some issues when you use files. In particular, accessing or creating them can fail in many unexpected ways. Every issue listed below is one I have personally experienced.
1) The file location is on a remote machine (NFS mounted) and the network is down.
2) There is not enough free space when creating the file.
3) The user presses Ctrl-C partway through the process, and the file is not deleted.
4) The file is on an NFS mount and the network is slow.
5) The folder in which the file was created was a soft link, and the link's original target was deleted.
Still, we have to use files, because there are hardly any other options when working in bash. But in C and C++, I think disk access should be considered a last resort. A program producing files as output is OK if that is the only way to communicate with the user. But at least for intermediate storage, the use of disk files should be minimized.
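Issue 3 in the list above (a leftover file after Ctrl-C) can be reduced in Python by tying the file's lifetime to a context manager. The sketch below is an assumption-laden illustration: `process_with_scratch_file`, its `interrupt` switch, and the `created_paths` list exist only so the behaviour can be demonstrated, and a raised `KeyboardInterrupt` stands in for a real Ctrl-C.

```python
import os
import tempfile


def process_with_scratch_file(data, interrupt=False, created_paths=None):
    """Stage data through a scratch file that is removed on any exit path."""
    # TemporaryDirectory deletes the directory and its contents when the
    # with-block exits, including when an exception passes through it.
    with tempfile.TemporaryDirectory() as scratch_dir:
        path = os.path.join(scratch_dir, "stage1.txt")
        with open(path, "w") as handle:
            handle.write(data)
        if created_paths is not None:
            created_paths.append(path)  # demo only: remember where the file was
        if interrupt:
            raise KeyboardInterrupt  # stand-in for the user pressing Ctrl-C
        with open(path) as handle:
            return handle.read().upper()
```

This does not help if the process is killed outright (for example with SIGKILL) or the machine loses power, and it does nothing about the NFS and free-space failures, which still surface as `OSError` and have to be handled by the caller.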
If you create temporary files properly (setting the platform-specific 'temporary' flag, which tells the system not to flush the cache to disk unless there is an urgent need), they are perfectly fine when the task requires them.
There is almost nothing in IT that you can't use if you have a good reason to. :-)
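In Python, a related but different mechanism from the OS-level flag mentioned above is `tempfile.SpooledTemporaryFile`, which keeps data in memory and only moves it to a real temporary file once it grows past `max_size`. A minimal sketch follows; the function name `roundtrip_through_scratch` and the 1 MiB default are assumptions for illustration.

```python
import tempfile


def roundtrip_through_scratch(payload: bytes, max_in_memory: int = 1024 * 1024) -> bytes:
    """Stage payload in a scratch buffer that touches disk only if it grows large."""
    # Data stays in memory until more than max_in_memory bytes are written;
    # after that it is moved to an anonymous temporary file on disk.
    with tempfile.SpooledTemporaryFile(max_size=max_in_memory) as scratch:
        scratch.write(payload)
        scratch.seek(0)  # rewind before reading the data back
        return scratch.read()
```

Calling it with a small payload never creates a file, while a payload larger than `max_in_memory` falls back to disk transparently. In both cases the scratch storage is released when the `with` block ends.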