How do I effectively manage files on the file system in Java?

Posted 2024-09-05 23:40:02 · 330 words · 1 view · 0 comments

I am creating a few JAX-WS endpoints, for which I want to save the received and sent messages for later inspection. To do this, I am planning to save the messages (XML files) to the file system, in some sensible hierarchy. There will be hundreds, even thousands of files per day. I also need to store metadata for each file.

I am considering putting the metadata (just a couple of fields) into a database table, but the XML file content itself into files on the file system, so as not to bloat the database with content data (which is seldom read).

Is there a simple library that helps me with saving, loading, deleting, etc. the files? It's not that tricky to implement myself, but I wonder if there are existing solutions. Just a simple library that already provides easy access to the file system (preferably across different operating systems).

Or do I even need that, should I just go with raw/custom Java?

Comments (2)

无声无音无过去 2024-09-12 23:40:02

Is there some simple library that
helps me in saving, loading, deleting
etc. the files? It's not that tricky
to implement it myself, but I wonder
if there are existing solutions? Just
a simple library that already provides
easy access to filesystem (preferably
over different operating systems).

Java API

Well, if what you need to do is really simple, you should be able to achieve your goal with java.io.File (delete, check existence, read, write, etc.) and a few stream manipulations with FileInputStream and FileOutputStream.
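As a rough illustration of how little code that takes, here is a minimal sketch using only java.io. The class name MessageFileStore and its method names are hypothetical, made up for this example; the streams and File calls are standard JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper; names are illustrative, not from any library. */
final class MessageFileStore {
    private final File baseDir;

    MessageFileStore(File baseDir) {
        this.baseDir = baseDir;
    }

    /** Writes content to baseDir/name, creating missing parent directories. */
    File save(String name, byte[] content) throws IOException {
        File target = new File(baseDir, name);
        File parent = target.getParentFile();
        if (!parent.isDirectory() && !parent.mkdirs()) {
            throw new IOException("Could not create directory " + parent);
        }
        try (OutputStream out = new FileOutputStream(target)) {
            out.write(content);
        }
        return target;
    }

    /** Reads the whole file into memory; fine for small XML messages. */
    byte[] load(String name) throws IOException {
        try (InputStream in = new FileInputStream(new File(baseDir, name));
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return buffer.toByteArray();
        }
    }

    /** True only for an existing regular file, not a directory. */
    boolean exists(String name) {
        return new File(baseDir, name).isFile();
    }

    /** Returns false if the file could not be deleted (for example, it is missing). */
    boolean delete(String name) {
        return new File(baseDir, name).delete();
    }
}
```

A caller would create one store per base directory, e.g. `new MessageFileStore(new File("messages"))`, and pass relative names to save, load and delete.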

You can also throw in Apache commons-io and its handy FileUtils for a few more utility functions.

Java is independent of the OS. You just need to make sure you use File.separator (not File.pathSeparator, which separates the entries of a path list such as the classpath), or use the constructor File(File parent, String child) so that you don't need to mention the separator explicitly.
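For the "sensible hierarchy" in the question, one possibility is a year/month/day layout built entirely with the File(File parent, String child) constructor. This is only a sketch: the class MessagePaths, the method dailyFile and the layout itself are assumptions for illustration.

```java
import java.io.File;
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical helper that derives a dated path for a message file. */
final class MessagePaths {
    private MessagePaths() {
    }

    /** Builds baseDir/yyyy/MM/dd/fileName without writing a separator by hand. */
    static File dailyFile(File baseDir, LocalDate day, String fileName) {
        File year = new File(baseDir, String.format(Locale.ROOT, "%04d", day.getYear()));
        File month = new File(year, String.format(Locale.ROOT, "%02d", day.getMonthValue()));
        File dayDir = new File(month, String.format(Locale.ROOT, "%02d", day.getDayOfMonth()));
        return new File(dayDir, fileName);
    }
}
```

With thousands of files per day, a per-day directory keeps each directory listing manageable; a further split by hour or by endpoint would follow the same pattern.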

The Java file API is relatively high-level, so it abstracts away the differences between operating systems. Most of the time it is sufficient. It falls short only if you need a relatively OS-specific feature that is not in the API, e.g. checking the physical size of a file on disk (not the logical size), security rights on *nix, free space/quota of the hard drive, etc.

Most operating systems have an internal buffer for file writing/reading. Using FileOutputStream.write and FileOutputStream.flush ensures the data has been sent to the OS, but not necessarily written to the disk. The Java API also supports the low-level integration needed to manage these buffering issues (example here) for systems such as databases.
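The example linked in the original answer is not reproduced here. One way to request that the OS push its buffers to the device is FileDescriptor.sync(), reached through FileOutputStream.getFD(). The sketch below assumes that approach; the class name DurableWriter and the method writeAndSync are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical helper showing flush versus sync. */
final class DurableWriter {
    private DurableWriter() {
    }

    static void writeAndSync(File target, byte[] content) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            out.write(content);
            out.flush();        // hands any Java-side buffered bytes to the OS
            out.getFD().sync(); // asks the OS to write its buffers to the device
        }
    }
}
```

For an archive of messages kept only for later inspection, the extra sync is probably unnecessary; it matters mainly when a database commit must not be acknowledged before the file is safely on disk.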

Also, both files and directories are abstracted by File, so you need to check with isDirectory. This can be confusing, for instance if you have a file x and a directory /x (I don't remember exactly how to handle this case, but there is a way).
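A small sketch of telling the two apart while walking a tree, using File.isFile and File.listFiles. The class FileWalker and the method regularFilesUnder are hypothetical names for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that collects regular files and skips directories. */
final class FileWalker {
    private FileWalker() {
    }

    static List<File> regularFilesUnder(File root) {
        List<File> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(File current, List<File> result) {
        if (current.isFile()) {
            result.add(current);
            return;
        }
        // listFiles() returns null when current is not a directory or cannot be read.
        File[] children = current.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            collect(child, result);
        }
    }
}
```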

Web service

The web service can use either xs:base64Binary to pass the data, or use MTOM (Message Transmission Optimization Mechanism) if files are large.

Transactions

Note that the database is transactional and the file system is not. So you might have to add a few checks in case operations fail and are retried.

You could go with a complicated design involving some form of distributed transaction (see this answer), or try to go with a simpler design that provides the level of robustness that you need. A possible design could be:

  • Update. If the user wants to overwrite a file, you actually create a new one. The indirection between the logical file name and the physical file is stored in the database. This way you never overwrite a physical file once it is written, which keeps rollback consistent.
  • Create. Same story when the user wants to create a file.
  • Delete. If the user wants to delete a file, you do it only in the database first. A periodic job polls the file system to identify files that are not listed in the database and removes them. This two-phase delete ensures that the delete operation can be rolled back.

This is not as robust as writing BLOBs to a real transactional database, but it provides some robustness. You could otherwise have a look at commons-transaction, but I feel like the project is dead (2007).
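The create/update/delete design above could be sketched roughly as follows. This is an illustration under stated assumptions, not a finished implementation: the class IndirectFileStore and its methods are hypothetical, and an in-memory map stands in for the database table that would hold the logical-to-physical mapping.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store: logical names map to write-once physical files. */
final class IndirectFileStore {
    private final File dir;
    // Assumption: stands in for a database table (logical name -> physical file name).
    private final Map<String, String> index = new ConcurrentHashMap<>();

    IndirectFileStore(File dir) {
        this.dir = dir;
    }

    /** Create and update alike: write a fresh physical file, then repoint the index. */
    void put(String logicalName, byte[] content) throws IOException {
        String physicalName = UUID.randomUUID() + ".xml";
        Files.write(new File(dir, physicalName).toPath(), content);
        index.put(logicalName, physicalName); // in a real system, the database commit
    }

    /** Returns the current content, or null if the logical name is unknown. */
    byte[] get(String logicalName) throws IOException {
        String physicalName = index.get(logicalName);
        if (physicalName == null) {
            return null;
        }
        return Files.readAllBytes(new File(dir, physicalName).toPath());
    }

    /** Phase one of delete: only the index entry is removed. */
    void delete(String logicalName) {
        index.remove(logicalName);
    }

    /** Phase two: delete physical files that no index entry points to. */
    int sweep() {
        Set<String> live = new HashSet<>(index.values());
        File[] files = dir.listFiles();
        if (files == null) {
            return 0;
        }
        int removed = 0;
        for (File f : files) {
            if (f.isFile() && !live.contains(f.getName()) && f.delete()) {
                removed++;
            }
        }
        return removed;
    }
}
```

A real periodic job would also skip files younger than some grace period, so that a file written by put but not yet committed to the database is not swept away.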

溇涏 2024-09-12 23:40:02

There is DataNucleus, a Java persistence provider. It is a little too heavy for this case, but it supports the JPA and JDO Java standards with different datastores (RDBMS, object storage, XML, JSON, Excel, etc.). If the product is already using JPA or JDO, it might be worth considering DataNucleus, as saving data into different datastores should be transparent. I suppose DataNucleus supports splitting the data into several files, creating the sensible directory/file structure I wanted (in my question), but this is just a guess.

Support for XML and JSON seems to be experimental.
