Good design: how to pass InputStreams as arguments?

Posted on 2024-07-11 11:11:11 · 553 words · 5 views · 0 comments

I've got a big file on which I'm opening a FileInputStream. This file contains some files each having an offset from the beginning and a size. Furthermore, I've got a parser that should evaluate such a contained file.

File file = ...; // the big file
long offset = 1734; // a contained file's offset
long size = 256; // a contained file's size
FileInputStream fis = new FileInputStream(file);
fis.skip(offset);
parse(fis, size);

public void parse(InputStream is, long size) {
   // parse stream data and ensure we don't read more than size bytes
   is.close();
}

I feel this is not good practice. Is there a better way to do this, maybe using buffering?

Furthermore, I suspect the skip() method slows down reading a lot.

Comments (5)

幸福还没到 2024-07-18 11:11:11

It sounds like what you really want is a sort of "partial" input stream - one a bit like the ZipInputStream, where you've got a stream within a stream.

You could write this yourself, proxying all InputStream methods to the original input stream, making suitable adjustments for the offset, and checking for reads past the end of the subfile.

Is that the sort of thing you're talking about?
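
A minimal sketch of such a "partial" stream, under stated assumptions: the class name BoundedInputStream is hypothetical (it is not a JDK class), and the caller is assumed to have positioned the underlying stream at the subfile's offset before wrapping it. The wrapper only enforces the size limit and deliberately leaves closing to whoever opened the underlying stream.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: exposes at most `limit` bytes of the wrapped stream.
class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) return -1;      // end of the subfile
        int b = in.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (remaining <= 0) return -1;      // end of the subfile
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0 || remaining <= 0) return 0;
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    // mark/reset are not supported, to keep the byte accounting simple.
    @Override
    public boolean markSupported() { return false; }

    @Override
    public void mark(int readLimit) { }

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    @Override
    public void close() {
        // Deliberately does not close the underlying stream; its opener owns it.
    }
}
```

With this, a parser can read until it sees end-of-stream and never needs to know the size, for example parse(new BoundedInputStream(fis, size)).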

毁我热情 2024-07-18 11:11:11

First, FileInputStream.skip() has a bug that may let the underlying file position move beyond the end of the file, so be wary of it.

I've personally found working with Input/OutputStreams to be a pain compared with using FileReader and FileWriter, and your code shows the main issue I have with them: the streams need to be closed after use. One of the problems is that you can never be sure you've closed all the resources properly unless you make the code a bit too cautious, like this:

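import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public void parse(File in, long size) throws IOException {
    // Declared outside the try block so that it is visible in finally
    FileInputStream fis = new FileInputStream(in);
    try {
        // do file content handling here
    } finally {
        fis.close(); // runs even if the handling code throws
    }
    // do parsing here
}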

The downside is that this creates a new stream object on every call, which may end up consuming a lot of resources. The upside is that the stream gets closed even if the file-handling code throws an exception.
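
On Java 7 or later, the same guarantee can be written more compactly with try-with-resources. The following is only a sketch: the class name ParseWithResources is hypothetical, and the byte-counting loop is a stand-in for the real content handling.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class ParseWithResources {
    // Hypothetical stand-in for the real parser: counts the bytes in the file.
    static long parse(File in) throws IOException {
        // The stream is closed automatically when the block exits,
        // whether normally or because of an exception.
        try (InputStream is = new FileInputStream(in)) {
            long count = 0;
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) {
                count += n;
            }
            return count;
        }
    }
}
```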

最偏执的依靠 2024-07-18 11:11:11

This sounds like a typical nested-file (aka "zip" file) problem.

A common way to handle this is to have a separate InputStream instance for each nested logical stream. These would perform the necessary operations on the underlying physical stream, and buffering can be done on the underlying stream, on the logical stream, or on both, depending on which suits best. This means the logical stream encapsulates all the information about its placement in the underlying stream.

You could, for instance, have a kind of factory method with a signature like this:

List<InputStream> getStreams(File inputFile)

You could do the same with OutputStreams.

There are some details to this, but this may be enough for you?
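
One way such a factory could look is sketched below. Several things here are assumptions rather than part of the suggestion above: the class ContainerFiles and the Entry type are hypothetical, the list of offsets and sizes is passed in as an extra parameter because the container's index format is not described, and each contained file is copied into memory, which only suits small files.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class ContainerFiles {
    // Hypothetical index entry: where a contained file starts and how long it is.
    static final class Entry {
        final long offset;
        final int size;

        Entry(long offset, int size) {
            this.offset = offset;
            this.size = size;
        }
    }

    // Returns one independent stream per entry. Each slice is copied into
    // memory (an assumption that only suits small contained files).
    static List<InputStream> getStreams(File inputFile, List<Entry> entries) throws IOException {
        List<InputStream> streams = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(inputFile, "r")) {
            for (Entry e : entries) {
                byte[] data = new byte[e.size];
                raf.seek(e.offset);
                raf.readFully(data); // throws EOFException if the entry runs past the end
                streams.add(new ByteArrayInputStream(data));
            }
        }
        return streams;
    }
}
```

Because every returned stream owns its own copy of the data, the parser can read and close each one without affecting the others.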

寻找我们的幸福 2024-07-18 11:11:11

In general, the code that opens the file should close the file -- the parse() function should not close the input stream, since it is of the utmost arrogance for it to assume that the rest of the program won't want to continue reading other files contained in the big one.

You should decide whether the interface to parse() should be just stream and length (with the function able to assume that the file is correctly positioned) or whether the interface should include the offset (so the function first positions and then reads). Both designs are feasible. I'd be inclined to let the parse() do the positioning, but it is not a clear-cut decision.
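
As an illustration of the second option, here is a sketch in which parse() does the positioning and leaves closing to the caller. The names OffsetParser and skipFully are hypothetical, the offset is taken relative to the stream's current position, and returning the raw bytes stands in for real parsing. The loop around skip() is there because a single skip() call is allowed to skip fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class OffsetParser {
    // Skips exactly n bytes, since a single skip() call may skip fewer.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() == -1) {
                throw new EOFException("stream ended before the offset was reached");
            } else {
                n--; // read() consumed one byte
            }
        }
    }

    // Hypothetical parse(): positions the stream, reads exactly `size` bytes
    // and returns them. It does not close the stream; the code that opened it does.
    static byte[] parse(InputStream in, long offset, int size) throws IOException {
        skipFully(in, offset);
        byte[] data = new byte[size];
        int read = 0;
        while (read < size) {
            int n = in.read(data, read, size - read);
            if (n == -1) throw new EOFException("contained file is truncated");
            read += n;
        }
        return data;
    }
}
```

The caller would then open the big file once, for example in a try-with-resources block, call OffsetParser.parse(fis, 1734, 256) as often as needed, and let the block close the stream.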

各自安好 2024-07-18 11:11:11

You could use a wrapper class on a RandomAccessFile - try this

You could also try wrapping that in a BufferedInputStream and see if the performance improves.
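
The wrapper originally linked above is not reproduced here, so the following is only a guess at what such a class might look like. The name RandomAccessSliceStream is hypothetical. It uses seek() instead of skip(), seeks before every read so that several slices can share one RandomAccessFile (from a single thread only), and never closes the file itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

// Hypothetical wrapper: reads `size` bytes starting at `offset` of a RandomAccessFile.
class RandomAccessSliceStream extends InputStream {
    private final RandomAccessFile file;
    private final long end;
    private long position;

    RandomAccessSliceStream(RandomAccessFile file, long offset, long size) {
        this.file = file;
        this.position = offset;
        this.end = offset + size;
    }

    @Override
    public int read() throws IOException {
        if (position >= end) return -1;
        file.seek(position);
        int b = file.read();
        if (b >= 0) position++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (position >= end) return -1;
        file.seek(position); // seek before every read so slices can share one file
        int n = file.read(buf, off, (int) Math.min(len, end - position));
        if (n > 0) position += n;
        return n;
    }
}
```

Wrapping it as new BufferedInputStream(new RandomAccessSliceStream(raf, offset, size)) reduces the number of seek and read calls that reach the file; whether that helps in practice is something to measure.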
