How to efficiently manage files on a filesystem in Java?
I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) to the filesystem, in some sensible hierarchy. There will be hundreds, even thousands of files per day. I also need to store metadata for each file.
I am considering putting the metadata (just a couple of fields) into a database table, but the XML file content itself into files on the filesystem, in order not to bloat the database with content data (which is seldom read).
Is there some simple library that helps me with saving, loading, deleting, etc. the files? It's not that tricky to implement it myself, but I wonder if there are existing solutions. Just a simple library that already provides easy access to the filesystem (preferably across different operating systems).
Or do I even need that? Should I just go with raw/custom Java?
Comments (2)
Java API
Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.
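To give an idea of how little code the plain java.io route takes, here is a minimal sketch of save, load and delete with File, FileInputStream and FileOutputStream. The class name MessageFiles and its method names are hypothetical, chosen only for illustration, and UTF-8 is an assumption about how the XML messages are encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative, not from any library.
class MessageFiles {

    // Writes the XML text to the target file, creating parent directories if needed.
    static void save(File target, String xml) throws IOException {
        File dir = target.getParentFile();
        if (dir != null && !dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("Could not create directory " + dir);
        }
        try (OutputStream out = new FileOutputStream(target)) {
            out.write(xml.getBytes(StandardCharsets.UTF_8)); // assumes UTF-8 content
        }
    }

    // Reads the whole file back into a String.
    static String load(File source) throws IOException {
        try (InputStream in = new FileInputStream(source)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Returns true if the file is gone afterwards (including when it never existed).
    static boolean delete(File file) {
        return !file.exists() || file.delete();
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "ws-messages");
        File file = new File(dir, "demo.xml");
        save(file, "<ping/>");
        System.out.println(load(file));
        delete(file);
    }
}
```

If commons-io is on the classpath anyway, its FileUtils covers the same ground with less code; the sketch above only shows that the plain JDK is enough for the simple case.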
Java is independent of the OS. You just need to make sure you use File.separator (not File.pathSeparator, which separates the entries of a path list such as the classpath), or use the constructor File(File parent, String child), so that you don't need to mention the separator explicitly.
The Java file API is relatively high-level, in order to abstract away the differences between operating systems. Most of the time it's sufficient. It has shortcomings only if you need some relatively OS-specific feature that is not in the API, e.g. checking the physical size of a file on disk (not the logical size), security rights on *nix, free space/quota of the hard drive, etc.
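As an illustration of the File(File parent, String child) constructor, here is a small sketch that builds a date-based hierarchy for the messages without ever writing a separator. The layout root/yyyy/MM/dd/id.xml, the class name MessagePaths and the use of UTC are all assumptions made for the example; pick whatever hierarchy suits your data.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper; the directory layout is only one possible choice.
class MessagePaths {

    // Builds <root>/<yyyy>/<MM>/<dd>/<messageId>.xml with no hard-coded separator.
    static File pathFor(File root, Date received, String messageId) {
        File year = new File(root, format("yyyy", received));
        File month = new File(year, format("MM", received));
        File day = new File(month, format("dd", received));
        return new File(day, messageId + ".xml");
    }

    // Formats one date field in UTC so the layout does not depend on the server's time zone.
    private static String format(String pattern, Date date) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(date);
    }

    public static void main(String[] args) {
        // The printed path uses the separator of whatever OS this runs on.
        System.out.println(pathFor(new File("store"), new Date(), "msg-1"));
    }
}
```

Spreading the files over one directory per day also keeps each directory small, which helps when there are thousands of files per day.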
Most operating systems have an internal buffer for file writing/reading. Using FileOutputStream.write and FileOutputStream.flush ensures the data has been sent to the OS, but it is not necessarily written to the disk. The Java API also supports this low-level integration to manage these buffering issues (example here) for systems such as databases.
Also, both files and directories are abstracted with File, and you need to check with isDirectory. This can be confusing, for instance if you have one file x and one directory /x (I don't remember exactly how to handle this issue, but there is a way).
Web service
The web service can use either xs:base64Binary to pass the data, or MTOM (Message Transmission Optimization Mechanism) if files are large.
Transactions
Note that the database is transactional and the file system is not. So you might have to add a few checks in case operations fail and are retried.
You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:
This is not as robust as writing a BLOB in a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel like the project is dead (2007).
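To make the buffering and retry points more concrete, here is one possible sketch (not necessarily the design outlined above): write the message to a temporary file in the same directory, flush it and call FileDescriptor.sync() so the OS is asked to push it to the disk, then move it into place. A reader then sees either no file or the complete file. The class name DurableWriter is hypothetical, and java.nio.file.Files requires Java 7 or later.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; a sketch under the assumptions stated above.
class DurableWriter {

    static void writeDurably(File target, String xml) throws IOException {
        File dir = target.getAbsoluteFile().getParentFile();
        if (dir == null || (!dir.mkdirs() && !dir.isDirectory())) {
            throw new IOException("Could not create directory for " + target);
        }
        // Temp file in the same directory, so the final move stays on one file system.
        File tmp = File.createTempFile("msg-", ".tmp", dir);
        try {
            try (FileOutputStream out = new FileOutputStream(tmp)) {
                out.write(xml.getBytes(StandardCharsets.UTF_8));
                out.flush();        // hands the bytes to the OS
                out.getFD().sync(); // asks the OS to write its buffers to the device
            }
            // Whether an existing target is replaced here is implementation specific.
            Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.ATOMIC_MOVE);
        } finally {
            tmp.delete(); // cleans up after a failure; does nothing once the move succeeded
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "ws-messages");
        writeDurably(new File(dir, "durable-demo.xml"), "<ping/>");
    }
}
```

With this in place you could insert the metadata row only after writeDurably returns, so that a row in the database always points at a complete file; a leftover file without a row is then the only inconsistency to clean up.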
There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.
Support for XML and JSON seems to be experimental.