Copying a large number of files quickly in Scala with Akka
What is the best way to copy a file from src to dest in Scala? The copy will be wrapped in an Akka actor, possibly a RemoteActor spread across several machines.
I have a tremendous number of image files to copy from one directory to an NFS-mounted directory.
I haven't done much file handling in Java or Scala, but I know there is the NIO library and some others that have been worked on since Scala 2.7. I'm looking for whatever is safest and quickest.
I should probably describe my infrastructure as well. The connection is 1000 Mbps and runs through a Cisco 3560 from an Isilon node to a Windows 2003 Server. The Isilon node is the NFS mount, and the Windows 2003 Server is a highly configured Samba (CIFS) mount.
Comments (1)
You probably can't beat the underlying OS file copy speed, so if the files are large or you can batch them, you're probably best off generating a shell script from Scala and then running it with bash or something similar. Chances are that one thread can saturate the disk I/O, so there isn't really anything fancy to do. If the images are large, you'll be waiting on the roughly 50 MB/s limit of your disk (or the roughly 10 MB/s limit of 100 Mbps Ethernet); if they're small, you'll be waiting on the per-file overhead of file seeks, network round trips and so on, which can add up to tens of milliseconds per file.
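As a rough illustration of handing the copy to the OS, here is a minimal sketch that skips the intermediate shell script and runs `cp` directly through `scala.sys.process`. It assumes a Unix-like machine with `cp` on the PATH; the names `OsCopy`, `copyBatch` and `batchSize` are made up for this example and are not part of any library.

```scala
import java.nio.file.Path
import scala.sys.process._

object OsCopy {
  /** Copies `files` into `destDir` by running `cp -p` once per batch.
    * Batching keeps each command line well below the OS argument-length limit.
    * Returns true only if every `cp` invocation exits with status 0;
    * it stops at the first batch that fails.
    */
  def copyBatch(files: Seq[Path], destDir: Path, batchSize: Int = 500): Boolean =
    files.grouped(batchSize).forall { batch =>
      // Builds: cp -p <file1> <file2> ... <destDir>
      val cmd = (Seq("cp", "-p") ++ batch.map(_.toString)) :+ destDir.toString
      cmd.! == 0 // `!` runs the command, waits for it and returns the exit code
    }
}
```

Passing the arguments as a `Seq[String]` avoids shell quoting problems with file names that contain spaces. An empty file list results in no `cp` call at all and counts as success.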
That said, you can use Apache Commons IO, which has a file copy utility, and a previous question has a high-performance answer among its top-rated entries. You can have one actor handle all the copying tasks, and that should be as fast as having a bunch of actors all competing for the same limited I/O bandwidth.
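The single-actor idea could look roughly like the sketch below. It assumes the classic Akka actor API (the `akka-actor` dependency on the classpath) and, to avoid a second dependency, swaps in the JDK's `java.nio.file.Files.copy` where the answer suggests Commons IO. The message types `CopyFile`, `Copied` and `CopyFailed`, the `FileCopier` actor and the paths in `main` are all hypothetical names chosen for illustration.

```scala
import java.io.IOException
import java.nio.file.{Files, Path, Paths, StandardCopyOption}

import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical message protocol for this sketch.
final case class CopyFile(src: Path, dest: Path)
final case class Copied(src: Path, dest: Path)
final case class CopyFailed(src: Path, reason: Throwable)

// One actor processes its mailbox sequentially, so copies run one at a time
// and never compete with each other for I/O bandwidth.
class FileCopier extends Actor {
  def receive: Receive = {
    case CopyFile(src, dest) =>
      try {
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING)
        sender() ! Copied(src, dest)
      } catch {
        case e: IOException => sender() ! CopyFailed(src, e)
      }
  }
}

object CopyDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("copy-demo")
    val copier = system.actorOf(Props(new FileCopier), "copier")
    implicit val timeout: Timeout = Timeout(30.seconds)

    // Hypothetical paths; replace with a real source file and NFS target.
    val request = CopyFile(Paths.get("/data/images/a.jpg"), Paths.get("/mnt/nfs/images/a.jpg"))

    // `?` sends the message and returns a Future holding the actor's reply.
    val reply = Await.result(copier ? request, timeout.duration)
    println(reply)

    system.terminate()
  }
}
```

Because `Files.copy` blocks the actor's thread, a real deployment would likely run this actor on a dedicated dispatcher so that other actors are not starved.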