How to avoid mutable vars in Scala when using ZipInputStreams and ZipOutputStreams?
I'm trying to read a zip file, check that it has some required files, and then write all valid files out to another zip file. The basic introduction to `java.util.zip` has a lot of Java-isms, and I'd love to make my code more Scala-native. Specifically, I'd like to avoid the use of `var`s. Here's what I have:
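```scala
import java.io.{BufferedOutputStream, FileOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

// zipIn: a ZipInputStream over the source archive, opened elsewhere
val fos = new FileOutputStream("new.zip")
val zipOut = new ZipOutputStream(new BufferedOutputStream(fos))
while (zipIn.available == 1) {
  val entry = zipIn.getNextEntry
  if (entryIsValid(entry)) { // entryIsValid: my own check on each entry
    zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName()))
    // read data into the data Array
    val data = new Array[Byte](1024)
    var count = zipIn.read(data, 0, 1024)
    while (count != -1) {
      zipOut.write(data, 0, count)
      count = zipIn.read(data, 0, 1024)
    }
  }
}
zipIn.close
zipOut.close
```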
I should add that I'm using Scala 2.7.7.
I don't think there's anything particularly wrong with using Java classes that were designed to be used imperatively in the imperative way they were designed for. Idiomatic Scala includes being able to use idiomatic Java as it was intended, even if the styles do clash a bit.
However, if you want--perhaps as an exercise, or perhaps because it does slightly clarify the logic--to do this in a more functional var-free way, you can do so. In 2.8, it's particularly nice, so even though you're using 2.7.7, I'll give a 2.8 answer.
First, we need to set up the problem, which you didn't entirely do, but let's suppose we have something like this:
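The setup code itself did not survive extraction. A minimal sketch, assuming Scala 2.8 or later and already-open `zipIn`/`zipOut` streams, with `entryIsValid` being the question's own predicate passed in as a function: it folds the setup into a helper together with the two-pass `Stream.continually` copy that this answer goes on to describe. The object name `StreamCopy` is hypothetical.

```scala
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object StreamCopy {
  // Copies every entry accepted by entryIsValid from zipIn to zipOut,
  // renaming it to "subdir/<name>". Stream.continually provides the lazily
  // evaluated loop; the trailing dots let each chained call continue on the next line.
  def copyValid(zipIn: ZipInputStream, zipOut: ZipOutputStream,
                entryIsValid: ZipEntry => Boolean): Unit = {
    val buffer = new Array[Byte](1024)
    Stream.continually(zipIn.getNextEntry).
      takeWhile(_ != null).   // getNextEntry returns null after the last entry
      filter(entryIsValid).
      foreach { entry =>
        zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
        Stream.continually(zipIn.read(buffer)).
          takeWhile(_ != -1). // read returns -1 at the end of the current entry
          foreach(count => zipOut.write(buffer, 0, count))
      }
  }
}
```

With files, you would pass `new ZipInputStream(new FileInputStream(...))` and `new ZipOutputStream(new BufferedOutputStream(new FileOutputStream("new.zip")))` and close both afterwards. From Scala 2.13 on, `Stream` is deprecated in favor of `LazyList`; `Iterator.continually` expresses the same single-pass loop without memoizing anything.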
Now, given this, we want to copy the zip file. The trick we can use is the `continually` method in `collection.immutable.Stream`. What it does is perform a lazily-evaluated loop for you. You can then take and filter the results to terminate and process what you want. It's a handy pattern to use when you have something that you want to be an iterator, but it isn't. (If the item updates itself, you can use `.iterate` in `Iterable` or `Iterator`; that's usually even better.) Here's the application to this case, used twice: once to get the entries, and once to read/write chunks of data.
Pay close attention to the `.` at the end of some lines! I would normally write this on one long line, but it's nicer to have it wrap so you can see it all here.
Just in case it isn't clear, let's unpack one of the uses of `continually`. The call `Stream.continually(zipIn.read(buffer))` asks to keep calling `zipIn.read(buffer)` as many times as necessary, storing the integer that results. The `takeWhile` that follows specifies how many times are necessary: it returns a stream of indefinite length that quits when it hits a `-1`. The final `foreach` processes the stream, taking each item in turn (the count) and using it to write the buffer. This works in a slightly sneaky way, since you rely on the fact that `zipIn` has just been called to get the next element of the stream. If you tried to do this again, rather than in a single pass through the stream, it would fail because `buffer` would be overwritten. But here it's okay.
So, there it is: a slightly more compact, possibly easier to understand, possibly less easy to understand method that is more functional (though there are still side effects galore). In 2.7.7, by contrast, I would actually do it the Java way, because `Stream.continually` isn't available and the overhead of building a custom `Iterator` isn't worth it for this one case. (It would be worth it if I were going to do more zip file processing and could reuse the code, however.)
Edit: Waiting for `available` to go to zero is a rather flaky way to detect the end of the zip file. I think the "correct" way is to wait until you get a `null` back from `getNextEntry`. With that in mind, I've edited the previous code (there was a `takeWhile(_ => zipIn.available==1)` that is now a `takeWhile(_ != null)`) and provided a 2.7.7 iterator-based version below (note how small the main loop is once you get through the work of defining the iterators, which, admittedly, do use vars):
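The 2.7.7 code itself is missing. A sketch of the same idea, written against a current Scala standard library (2.7.7's `Iterator` API may need small adjustments): two hypothetical helper iterators, `ZipEntryIterator` and `ChunkIterator`, keep their vars private, so the main loop in the equally hypothetical `IteratorCopy.copyValid` is var-free.

```scala
import java.io.InputStream
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

// Iterates over the entries of a ZipInputStream, ending when getNextEntry returns null.
class ZipEntryIterator(zis: ZipInputStream) extends Iterator[ZipEntry] {
  private var nextEntry: ZipEntry = null
  private var fetched = false // true once nextEntry holds the look-ahead entry

  def hasNext: Boolean = {
    if (!fetched) { nextEntry = zis.getNextEntry; fetched = true }
    nextEntry != null
  }

  def next(): ZipEntry = {
    if (!hasNext) throw new NoSuchElementException("no more zip entries")
    fetched = false
    nextEntry
  }
}

// Yields the byte count of each read into buf; a count is only valid
// until the following hasNext call, which refills buf.
class ChunkIterator(in: InputStream, buf: Array[Byte]) extends Iterator[Int] {
  private var count = 0
  private var fetched = false

  def hasNext: Boolean = {
    if (!fetched) { count = in.read(buf); fetched = true }
    count != -1
  }

  def next(): Int = {
    if (!hasNext) throw new NoSuchElementException("end of entry data")
    fetched = false
    count
  }
}

object IteratorCopy {
  // The main loop: no vars here, only the iterators above.
  def copyValid(zipIn: ZipInputStream, zipOut: ZipOutputStream,
                entryIsValid: ZipEntry => Boolean): Unit = {
    val buffer = new Array[Byte](1024)
    new ZipEntryIterator(zipIn).filter(entryIsValid).foreach { entry =>
      zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
      new ChunkIterator(zipIn, buffer).foreach(count => zipOut.write(buffer, 0, count))
    }
  }
}
```

As with the `Stream` version, `ChunkIterator` only works in a single pass, because every `hasNext` call refills the shared buffer.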
Using Scala 2.8 and a tail-recursive call:
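The code for this answer was lost in extraction. A minimal sketch of what a tail-recursive version could look like, assuming Scala 2.8+ for `@tailrec` (the object and method names are hypothetical): one recursive method walks the entries until `getNextEntry` returns `null`, and another copies the current entry's data until `read` returns `-1`.

```scala
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.annotation.tailrec

object TailRecCopy {
  // Copies the current entry's data from zipIn to zipOut, one buffer at a time.
  @tailrec
  def copyData(zipIn: ZipInputStream, zipOut: ZipOutputStream, buffer: Array[Byte]): Unit = {
    val count = zipIn.read(buffer)
    if (count != -1) {
      zipOut.write(buffer, 0, count)
      copyData(zipIn, zipOut, buffer)
    }
  }

  // Walks the entries until getNextEntry returns null, copying the valid ones.
  @tailrec
  def copyEntries(zipIn: ZipInputStream, zipOut: ZipOutputStream,
                  entryIsValid: ZipEntry => Boolean, buffer: Array[Byte]): Unit = {
    val entry = zipIn.getNextEntry
    if (entry != null) {
      if (entryIsValid(entry)) {
        zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
        copyData(zipIn, zipOut, buffer)
      }
      copyEntries(zipIn, zipOut, entryIsValid, buffer)
    }
  }
}
```

`@tailrec` makes the compiler reject a method whose recursive call is not in tail position, so both methods compile to loops and cannot overflow the stack.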
I'd try something like this (yes, pretty much the same idea sblundy had):
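That code did not survive extraction either. A hedged sketch in the same recursive spirit (the names are hypothetical): a local tail-recursive loop that matches on the result of `read`, so that only `-1` ends the copy. Because `ZipInputStream` is an `InputStream` whose `read` returns `-1` at the end of the current entry, the same helper can copy one entry at a time.

```scala
import java.io.{InputStream, OutputStream}
import scala.annotation.tailrec

object ChunkCopy {
  // Copies everything remaining in `in` to `out`, one buffer at a time.
  def copyAll(in: InputStream, out: OutputStream, buffer: Array[Byte]): Unit = {
    @tailrec
    def loop(): Unit = in.read(buffer) match {
      case -1 => ()                           // end of data: stop
      case n  => out.write(buffer, 0, n); loop() // write what was read, then read again
    }
    loop()
  }
}
```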
It could be simplified like below, but I'm not very fond of it. I'd prefer for `read` not to be able to return 0...
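The simplified variant is missing as well. One plausible shape for it, which is an assumption rather than the original code, is a single chain over `Iterator.continually` that stops as soon as `read` returns a non-positive count; that is exactly where a `0` from `read` would matter, since it would end the copy early. (Per the `InputStream` contract, `read` with a non-empty buffer blocks until at least one byte is available or returns `-1`, so `0` should only come back for a zero-length buffer.)

```scala
import java.io.{InputStream, OutputStream}

object SimplerChunkCopy {
  // Keeps reading while read returns a positive count; a 0 would end the copy early.
  def copyAll(in: InputStream, out: OutputStream, buffer: Array[Byte]): Unit =
    Iterator.continually(in.read(buffer)).
      takeWhile(_ > 0).
      foreach(n => out.write(buffer, 0, n))
}
```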
Based on http://harrah.github.io/browse/samples/compiler/scala/tools/nsc/io/ZipArchive.scala.html:
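The code adapted from the linked `ZipArchive.scala` is not reproduced here. As a separate, hedged sketch (not taken from that file): when the input is a file on disk, `java.util.zip.ZipFile` gives direct access to the entries, and its `entries()` enumeration can be adapted to a Scala `Iterator`, so neither a var nor an `available` check is needed to find the end. The object and method names are hypothetical.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream}
import java.util.zip.{ZipEntry, ZipFile, ZipOutputStream}

object ZipFileCopy {
  // Adapts ZipFile.entries() (a java.util.Enumeration) to a Scala Iterator.
  def entriesOf(zipFile: ZipFile): Iterator[ZipEntry] = {
    val e = zipFile.entries()
    new Iterator[ZipEntry] {
      def hasNext: Boolean = e.hasMoreElements
      def next(): ZipEntry = e.nextElement()
    }
  }

  // Copies each valid entry of `source` into a new archive `target` under "subdir/".
  def copyValid(source: File, target: File, entryIsValid: ZipEntry => Boolean): Unit = {
    val zipFile = new ZipFile(source)
    val zipOut = new ZipOutputStream(new BufferedOutputStream(new FileOutputStream(target)))
    try {
      val buffer = new Array[Byte](1024)
      entriesOf(zipFile).filter(entryIsValid).foreach { entry =>
        zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
        val in = zipFile.getInputStream(entry)
        try {
          Iterator.continually(in.read(buffer)).
            takeWhile(_ != -1).
            foreach(n => zipOut.write(buffer, 0, n))
        } finally in.close()
      }
    } finally {
      zipOut.close()
      zipFile.close()
    }
  }
}
```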
Without tail recursion, I'd avoid recursion: you would run the risk of a stack overflow. You could wrap `zipIn.read(data)` in a `scala.BufferedIterator[Byte]` and go from there.
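A minimal sketch of that suggestion, assuming a hypothetical `ByteIterator` class: it refills a chunk buffer with `read(data)` as needed and hands out one byte at a time, and calling `.buffered` on it yields a `BufferedIterator` whose `head` lets you peek at the next byte without consuming it. The private vars stay inside the class.

```scala
import java.io.InputStream

// Exposes the bytes of `in` as an Iterator[Byte], refilling an internal
// chunk buffer with in.read(data) whenever the current chunk is used up.
class ByteIterator(in: InputStream, chunkSize: Int = 1024) extends Iterator[Byte] {
  require(chunkSize > 0, "chunkSize must be positive")
  private val data = new Array[Byte](chunkSize)
  private var count = 0 // valid bytes in data, or -1 once the stream is exhausted
  private var pos = 0   // index of the next byte to hand out

  def hasNext: Boolean = {
    while (pos >= count && count != -1) {
      count = in.read(data)
      pos = 0
    }
    count != -1
  }

  def next(): Byte = {
    if (!hasNext) throw new NoSuchElementException("end of stream")
    val b = data(pos)
    pos += 1
    b
  }
}
```

For example, `new ByteIterator(zipIn).buffered` walks the current zip entry byte by byte. Pulling data a byte at a time is slower than copying whole chunks, so this suits peeking or parsing better than bulk copying.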