Naming directories by process ID and thread ID
I have an application with a few threads that manipulate data and save the output in different temporary files in a particular directory, on a Linux or Windows machine. These files eventually need to be erased.
What I want to do is separate the files better, so I am thinking of doing this by process ID and thread ID. This will help the application save disk space because, when a thread terminates, the whole directory with that thread's files can be erased, letting the rest of the application reuse the corresponding disk space.
Since the application runs on a single instance of the JVM, I assume it will have a single Process ID, which will be that of the JVM, right?
That being the case, the only way to discriminate among these files is to save them in a folder, the name of which will be related to the Thread ID.
Is this approach reasonable, or should I be doing something else?
java.io.File can create temporary files for you. As long as you keep a list of the files associated with each thread, you can delete them when the thread exits. You can also mark the files for deletion on exit (File.deleteOnExit()), so that they are removed when the JVM shuts down normally even if a thread does not complete.
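The idea above could be sketched roughly as follows. This is only an illustration: the class name ThreadTempFiles, its method names, and the use of a ThreadLocal list to associate files with a thread are assumptions, not something the answer prescribes.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers the temp files created by the current thread.
class ThreadTempFiles {
    // One list per thread, created lazily on first use.
    private static final ThreadLocal<List<File>> FILES =
            ThreadLocal.withInitial(ArrayList::new);

    // Creates a temp file and records it for the calling thread.
    // Note: File.createTempFile requires a prefix of at least 3 characters.
    static File create(String prefix) {
        try {
            File file = File.createTempFile(prefix, ".tmp");
            file.deleteOnExit(); // safety net if the thread never cleans up
            FILES.get().add(file);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes every file recorded for the calling thread; returns the count deleted.
    static int deleteAll() {
        int deleted = 0;
        for (File file : FILES.get()) {
            if (file.delete()) {
                deleted++;
            }
        }
        FILES.remove(); // forget the list so the thread starts fresh
        return deleted;
    }
}
```

A worker would typically call ThreadTempFiles.deleteAll() in a finally block at the end of its run() method, so the files go away even if the work throws.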
It seems the simplest solution for this approach is really to extend Thread - never thought I'd see that day.
As P.T. already said, thread IDs are only unique for as long as the thread is alive; they can, and most likely will, be reused after it terminates.
So instead of doing it that way, use the thread name, which can be specified at construction. To keep it simple, just write a small class:
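A minimal sketch of such a class, assuming an AtomicLong counter and a hypothetical "worker-" name prefix (neither detail is specified in this answer):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: a Thread subclass whose name is unique within this JVM run.
class MyThread extends Thread {
    // Shared counter; AtomicLong keeps increments safe across threads.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Only this constructor is provided; add others as needed.
    public MyThread(Runnable target) {
        // "worker-" is an arbitrary, hypothetical prefix.
        super(target, "worker-" + COUNTER.getAndIncrement());
    }
}
```

The name is fixed at construction, so it stays valid as a directory key even after the thread has terminated.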
Then you can do something like this:
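One possible usage, sketched under the assumption that every thread was given a unique, file-system-safe name at construction. The class ThreadDirs and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: one directory per thread, named after the thread.
class ThreadDirs {
    // Creates <base>/<current thread name> (and any missing parents).
    // Assumes the thread name contains only characters valid in a file name.
    static Path createForCurrentThread(Path base) {
        try {
            return Files.createDirectories(
                    base.resolve(Thread.currentThread().getName()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes dir and everything below it, children before parents.
    static void deleteRecursively(Path dir) {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside the thread's run() method, the directory would be created first and deleted in a finally block once the work is done.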
You have to provide every constructor you want to use and always use the MyThread class, but this way you can guarantee a unique mapping for at least 2^64-1 threads (negative values are fine too, after all), which should be more than enough.
Though I still don't think that's the best approach. It is possibly better to create some "job" class that contains all the necessary information and can clean up its files as soon as they are no longer needed. That way you can also easily use thread pools and the like, where one thread does more than one job. At the moment you have business logic in a thread, which doesn't strike me as especially good design.
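A rough sketch of the "job" idea, assuming each job owns a private temporary directory and removes it as soon as its work is finished. The class name TempDirJob, the payload field, and the file name output.tmp are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.Callable;
import java.util.stream.Stream;

// Hypothetical job: does its work in a private temp directory, then removes it.
class TempDirJob implements Callable<Path> {
    private final String payload;

    TempDirJob(String payload) {
        this.payload = payload;
    }

    // Returns the (already deleted) working directory so callers can verify cleanup.
    @Override
    public Path call() {
        Path dir = null;
        try {
            dir = Files.createTempDirectory("job-");
            // Stand-in for the real work: write the payload to a temp file.
            Files.write(dir.resolve("output.tmp"),
                    payload.getBytes(StandardCharsets.UTF_8));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (dir != null) {
                deleteRecursively(dir); // cleanup is tied to the job, not the thread
            }
        }
    }

    // Deletes dir and its contents, children before parents.
    private static void deleteRecursively(Path dir) {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the job is a Callable, it can be handed to an ExecutorService with submit(...), and a pooled thread can run many such jobs without any directory being tied to the thread's identity.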
You're correct: the JVM has one process ID, and all threads in that JVM share it. (It is possible for a JVM to use multiple processes, but as far as I know, no JVM does that.)
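For illustration, on Java 9 or later the JVM's process ID can be read with ProcessHandle and combined with a thread name into a directory name. The class name ProcessScopedNames and the "pid-" naming scheme below are assumptions made for this sketch.

```java
// Hypothetical helper: builds a directory name from the JVM's PID and the thread name.
class ProcessScopedNames {
    // Requires Java 9+ for ProcessHandle.
    static String forCurrentThread() {
        long pid = ProcessHandle.current().pid(); // same value for every thread in this JVM
        return "pid-" + pid + "-" + Thread.currentThread().getName();
    }
}
```

Every thread in the same JVM gets the same PID part, so only the thread-name part distinguishes the directories.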
A JVM may very well re-use underlying OS threads for multiple Java threads, so there is no guaranteed correlation between a thread exiting in Java and anything similar happening at the OS level.
If you just need to clean up stale files, sorting the files by their creation timestamp should do the job well enough. There is no need to encode anything special in the temporary file names.
Note that PIDs and TIDs are neither guaranteed to be increasing nor guaranteed to be unique across exits. The OS is free to recycle an ID. (In practice the IDs have to wrap around before reuse, but on some machines that can happen after only 32k or 64k processes have been created.)
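A timestamp-based cleanup might look roughly like this. Note one substitution: the sketch uses the last-modified time rather than the creation time, because java.io.File exposes the former directly and creation time is not reliably available on every file system. The class name StaleFileCleaner and the cutoff parameter are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: finds and deletes files older than a cutoff.
class StaleFileCleaner {
    // Returns regular files in dir last modified before cutoffMillis, oldest first.
    static List<File> staleFiles(File dir, long cutoffMillis) {
        List<File> stale = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return stale; // dir does not exist or is not a directory
        }
        for (File entry : entries) {
            if (entry.isFile() && entry.lastModified() < cutoffMillis) {
                stale.add(entry);
            }
        }
        stale.sort(Comparator.comparingLong(File::lastModified));
        return stale;
    }

    // Deletes the stale files and returns how many were actually removed.
    static int deleteStale(File dir, long cutoffMillis) {
        int deleted = 0;
        for (File file : staleFiles(dir, cutoffMillis)) {
            if (file.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

A periodic task could call deleteStale with a cutoff such as System.currentTimeMillis() minus the maximum age the application tolerates.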