How to avoid mutable variables in Scala when using ZipInputStreams and ZipOutputStreams?

Posted on 2024-09-02 06:58:47

I'm trying to read a zip file, check that it has some required files, and then write all valid files out to another zip file. The basic introduction to java.util.zip has a lot of Java-isms and I'd love to make my code more Scala-native. Specifically, I'd like to avoid the use of vars. Here's what I have:

val fos = new FileOutputStream("new.zip");
val zipOut = new ZipOutputStream(new BufferedOutputStream(fos));

while (zipIn.available == 1) {
  val entry = zipIn.getNextEntry
  if (entryIsValid(entry)) {
    zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName()))
    // read data into the data Array
    var data = new Array[Byte](1024)
    var count = zipIn.read(data, 0, 1024)
    while (count != -1) {
      zipOut.write(data, 0, count)
      count = zipIn.read(data, 0, 1024)
    }
  }
}
zipIn.close
zipOut.close

I should add that I'm using Scala 2.7.7.

Comments (5)

七秒鱼° 2024-09-09 06:58:47

I don't think there's anything particularly wrong with using Java classes that are designed to work in imperative fashion in the fashion they were designed. Idiomatic Scala includes being able to use idiomatic Java as it was intended, even if the styles do clash a bit.

However, if you want--perhaps as an exercise, or perhaps because it does slightly clarify the logic--to do this in a more functional var-free way, you can do so. In 2.8, it's particularly nice, so even though you're using 2.7.7, I'll give a 2.8 answer.

First, we need to set up the problem, which you didn't entirely, but let's suppose we have something like this:

import java.io._
import java.util.zip._
import scala.collection.immutable.Stream

val fos = new FileOutputStream("new.zip")
val zipOut = new ZipOutputStream(new BufferedOutputStream(fos))
val zipIn = new ZipInputStream(new FileInputStream("old.zip"))
def entryIsValid(ze: ZipEntry) = !ze.isDirectory

Now, given this we want to copy the zip file. The trick we can use is the continually method in collection.immutable.Stream. What it does is perform a lazily-evaluated loop for you. You can then take and filter the results to terminate and process what you want. It's a handy pattern to use when you have something that you want to be an iterator, but it isn't. (If the item updates itself you can use .iterate in Iterable or Iterator--that's usually even better.) Here's the application to this case, used twice: once to get the entries, and once to read/write chunks of data:

val buffer = new Array[Byte](1024)
Stream.continually(zipIn.getNextEntry).
  takeWhile(_ != null).filter(entryIsValid).
  foreach(entry => {
    zipOut.putNextEntry(new ZipEntry("subdir/"+entry.getName))
    Stream.continually(zipIn.read(buffer)).takeWhile(_ != -1).
      foreach(count => zipOut.write(buffer,0,count))
  })
zipIn.close
zipOut.close

Pay close attention to the . at the end of some lines! I would normally write this on one long line, but it's nicer to have it wrap so you can see it all here.

Just in case it isn't clear, let's unpack one of the uses of continually.

Stream.continually(zipIn.read(buffer))

This asks to keep calling zipIn.read(buffer) for as many times as necessary, storing the integer that results.

.takeWhile(_ != -1)

This specifies how many times are necessary, returning a stream of indefinite length but which will quit when it hits a -1.

.foreach(count => zipOut.write(buffer,0,count))

This processes the stream, taking each item in turn (the count), and using it to write the buffer. This works in a slightly sneaky way, since you rely upon the fact that zipIn has just been called to get the next element of the stream--if you tried to do this again, not on a single pass through the stream, it would fail because buffer would be overwritten. But here it's okay.
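The three steps above can be tried without any files. The following is a minimal sketch, assuming Scala 2.13 or later and using `Iterator.continually` rather than `Stream.continually` (a swap on my part; an `Iterator` is single-pass by construction, which matches the caveat about the buffer being overwritten). The `ContinuallyCopy` object and its `copy` method are hypothetical names introduced only for illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

object ContinuallyCopy {
  // Copies everything from `in` to `out` without a `var`; returns the number of bytes copied.
  // Hypothetical helper, not part of the original answer.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 1024): Long = {
    val buffer = new Array[Byte](bufferSize)
    Iterator
      .continually(in.read(buffer)) // one read per element, evaluated on demand
      .takeWhile(_ != -1)           // stop at end of stream
      .map { count =>
        out.write(buffer, 0, count) // written before the next read overwrites `buffer`
        count.toLong
      }
      .sum
  }
}
```

For example, `ContinuallyCopy.copy(new ByteArrayInputStream(bytes), new ByteArrayOutputStream())` copies an in-memory array in 1024-byte chunks and returns its length.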

So, there it is: a slightly more compact, possibly easier to understand, possibly less easy to understand method that is more functional (though there are still side-effects galore). In 2.7.7, in contrast, I would actually do it the Java way because Stream.continually isn't available, and the overhead of building a custom Iterator isn't worth it for this one case. (It would be worth it if I was going to do more zip file processing and could reuse the code, however.)


Edit: The looking-for-available-to-go-zero method is kind of flaky for detecting the end of the zip file. I think the "correct" way is to wait until you get a null back from getNextEntry. With that in mind, I've edited the previous code (there was a takeWhile(_ => zipIn.available==1) that is now a takeWhile(_ != null)) and provided a 2.7.7 iterator based version below (note how small the main loop is, once you get through the work of defining the iterators, which do admittedly use vars):

val buffer = new Array[Byte](1024)
class ZipIter(zis: ZipInputStream) extends Iterator[ZipEntry] {
  private var entry:ZipEntry = zis.getNextEntry
  private var cached = true
  private def cache { if (entry != null && !cached) {
    cached = true; entry = zis.getNextEntry
  }}
  def hasNext = { cache; entry != null }
  def next = {
    if (!cached) cache
    cached = false
    entry
  }
}
class DataIter(is: InputStream, ab: Array[Byte]) extends Iterator[(Int,Array[Byte])] {
  private var count = 0
  private var waiting = false
  def hasNext = { 
    if (!waiting && count != -1) { count = is.read(ab); waiting=true }
    count != -1
  }
  def next = { waiting=false; (count,ab) }
}
(new ZipIter(zipIn)).filter(entryIsValid).foreach(entry => {
  zipOut.putNextEntry(new ZipEntry("subdir/"+entry.getName))
  (new DataIter(zipIn,buffer)).foreach(cb => zipOut.write(cb._2,0,cb._1))
})
zipIn.close
zipOut.close
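
For readers on a current Scala version, here is a self-contained sketch of the same entry-by-entry copy, run against an in-memory archive so that it needs no files on disk. It assumes Scala 2.13 or later and uses `Iterator.continually`; `ZipCopySketch`, `makeZip`, `copyValid`, and `names` are hypothetical names, and `entryIsValid` is the same placeholder rule as above (skip directories). It also calls `closeEntry()` and closes both streams in a `finally` block, which the original snippets leave out.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipCopySketch {
  // Placeholder validity rule, mirroring the answer's example: skip directories.
  def entryIsValid(ze: ZipEntry): Boolean = !ze.isDirectory

  // Builds a small zip archive in memory from (name, text) pairs.
  def makeZip(files: Seq[(String, String)]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val zos = new ZipOutputStream(bytes)
    files.foreach { case (name, text) =>
      zos.putNextEntry(new ZipEntry(name))
      zos.write(text.getBytes("UTF-8"))
      zos.closeEntry()
    }
    zos.close()
    bytes.toByteArray
  }

  // Copies valid entries under "subdir/"; the entry loop ends when getNextEntry returns null.
  def copyValid(source: Array[Byte]): Array[Byte] = {
    val zipIn = new ZipInputStream(new ByteArrayInputStream(source))
    val bytes = new ByteArrayOutputStream()
    val zipOut = new ZipOutputStream(bytes)
    val buffer = new Array[Byte](1024)
    try {
      Iterator.continually(zipIn.getNextEntry).takeWhile(_ != null)
        .filter(entryIsValid)
        .foreach { entry =>
          zipOut.putNextEntry(new ZipEntry("subdir/" + entry.getName))
          Iterator.continually(zipIn.read(buffer)).takeWhile(_ != -1)
            .foreach(count => zipOut.write(buffer, 0, count))
          zipOut.closeEntry()
        }
    } finally {
      zipIn.close()
      zipOut.close()
    }
    bytes.toByteArray
  }

  // Lists the entry names of an in-memory archive, in order.
  def names(zip: Array[Byte]): List[String] = {
    val zis = new ZipInputStream(new ByteArrayInputStream(zip))
    try Iterator.continually(zis.getNextEntry).takeWhile(_ != null).map(_.getName).toList
    finally zis.close()
  }
}
```

Copying an archive built with `makeZip(Seq("a.txt" -> "alpha", "docs/" -> "", "b.txt" -> "beta"))` should keep the two files, renamed under `subdir/`, and drop the directory entry.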

俯瞰星空 2024-09-09 06:58:47

Using Scala 2.8 and tail-recursive calls:

def copyZip(in: ZipInputStream, out: ZipOutputStream, bufferSize: Int = 1024) {
  val data = new Array[Byte](bufferSize)

  def copyEntry() {
    in getNextEntry match {
      case null =>
      case entry => {
        if (entryIsValid(entry)) {
          out.putNextEntry(new ZipEntry("subdir/" + entry.getName()))

          def copyData() {
            in read data match {
              case -1 =>
              case count => {
                out.write(data, 0, count)
                copyData()
              }
            }
          }
          copyData()
        }
        copyEntry()
      }
    }
  }
  copyEntry()
}
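
If you want the compiler to confirm that a loop like `copyData` really is tail-recursive, `scala.annotation.tailrec` can be added to it. Below is a minimal sketch of the inner data loop on its own, assuming Scala 2.13 or later; `TailrecCopy` and `copy` are hypothetical names, and the running total is my addition so the result can be checked.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}
import scala.annotation.tailrec

object TailrecCopy {
  // Copies `in` to `out` and returns the number of bytes copied, with no `var`.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 1024): Long = {
    val buffer = new Array[Byte](bufferSize)

    // @tailrec makes compilation fail if the recursive call is not in tail position.
    @tailrec
    def loop(total: Long): Long =
      in.read(buffer) match {
        case -1 => total
        case count =>
          out.write(buffer, 0, count)
          loop(total + count)
      }

    loop(0L)
  }
}
```

Because the call is compiled to a loop, even a one-byte buffer over a large input does not grow the stack.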

后知后觉 2024-09-09 06:58:47

I'd try something like this (yes, pretty much the same idea sblundy had):

Iterator.continually {
  val data = new Array[Byte](100)
  zipIn.read(data) match {
    case -1 => Array.empty[Byte]
    case 0  => new Array[Byte](101) // just to filter it out
    case n  => java.util.Arrays.copyOf(data, n)
  }
} filter (_.size != 101) takeWhile (_.nonEmpty)

It could be simplified like below, but I'm not very fond of it. I'd prefer for read not to be able to return 0...

Iterator.continually {
  val data = new Array[Byte](100)
  zipIn.read(data) match {
    case -1 => new Array[Byte](101)
    case n  => java.util.Arrays.copyOf(data, n)
  }
} takeWhile (_.size != 101)
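
One way to sidestep the 101-element sentinel array is to keep the raw read counts in the iterator until after the end-of-stream check, and only then turn them into arrays. This is a sketch of that variation under my own assumptions (Scala 2.13 or later; `Chunks` and `chunks` are hypothetical names), not the answerer's code.

```scala
import java.io.{ByteArrayInputStream, InputStream}

object Chunks {
  // Lazily yields the stream's contents as freshly copied, non-empty chunks.
  def chunks(in: InputStream, chunkSize: Int = 100): Iterator[Array[Byte]] = {
    val data = new Array[Byte](chunkSize)
    Iterator
      .continually(in.read(data))
      .takeWhile(_ != -1) // -1 marks end of stream, so no sentinel array is needed
      .filter(_ > 0)      // drop zero-length reads
      .map(n => java.util.Arrays.copyOf(data, n)) // copy, because `data` is reused
  }
}
```

With a 250-byte `ByteArrayInputStream` and the default chunk size, this yields chunks of 100, 100, and 50 bytes.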

半透明的墙 2024-09-09 06:58:47

Based on http://harrah.github.io/browse/samples/compiler/scala/tools/nsc/io/ZipArchive.scala.html:

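import java.io.{InputStream, OutputStream}
import java.util.zip.{ZipEntry, ZipInputStream}
import scala.annotation.tailrec
import org.apache.commons.io.IOUtils // assumption: IOUtils here is the Apache Commons IO class

private[io] class ZipEntryTraversableClass(in: InputStream) extends Traversable[ZipEntry] {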
  val zis = new ZipInputStream(in)

  def foreach[U](f: ZipEntry => U) {
    @tailrec
    def loop(x: ZipEntry): Unit = if (x != null) {
      f(x)
      zis.closeEntry()
      loop(zis.getNextEntry())
    }
    loop(zis.getNextEntry())
  }

  def writeCurrentEntryTo(os: OutputStream) {
    IOUtils.copy(zis, os)
  }
}
2024-09-09 06:58:47

Without tail recursion, I'd avoid recursion: you would run the risk of a stack overflow. You could wrap zipIn.read(data) in a scala.BufferedIterator[Byte] and go from there.
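
A rough sketch of what that wrapping might look like, assuming Scala 2.13 or later. Note that it uses the single-byte `read()` rather than `read(data)`, which is my simplification so that the element type is `Byte`; `ByteIter` and `bytes` are hypothetical names.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.collection.BufferedIterator

object ByteIter {
  // Wraps single-byte reads in a BufferedIterator, which adds a non-consuming `head`.
  def bytes(in: InputStream): BufferedIterator[Byte] =
    Iterator
      .continually(in.read()) // read() returns 0-255, or -1 at end of stream
      .takeWhile(_ != -1)
      .map(_.toByte)
      .buffered
}
```

Reading one byte at a time is slow on an unbuffered stream, so in practice the input would probably be wrapped in a `BufferedInputStream` first.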
