Processing BZIP strings/files in Scala

Posted on 2024-10-19 13:17:07


I'm punishing myself a bit by doing the Python Challenge series in Scala.

One of the challenges is to read in a string that has been compressed with the bzip2 algorithm and output the result.

BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084

After some digging, it appears that there isn't a standard Java library for bzip processing, but there is something in the Apache Ant project that this guy has kindly taken out for use as a separate library.

The thing is, I can't get it to work with the following code. It just hangs in the Scala REPL, and the JVM maxes out at 100% CPU usage.

This is the code I'm trying...

import java.io.{ByteArrayInputStream}
import org.apache.tools.bzip2.{CBZip2InputStream}
import org.apache.commons.io.{IOUtils}
object ChallengeEight extends Application {
    val inputString = """BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"""
    val inputStream = new ByteArrayInputStream( inputString.getBytes("UTF-8") ) //convert string to inputstream
    inputStream.skip(2) //skip the 'BZ' part at the start
    val bzipInputStream = new CBZip2InputStream(inputStream)  //hangs here....
    val result = IOUtils.toString(bzipInputStream, "UTF-8");
    println(result)
}

Anyone got any ideas? Or is the CBZip2InputStream class expecting some extra bytes that you might find in a file that has been zipped with bzip2?

Any help would be appreciated

EDIT: For the record, this is the Python solution:

import bz2

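# Bytes literal: bz2.decompress expects bytes, not text (Python 3).
un = b"BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!" \
     b"\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084"

# Decompress the whole value; looping over it would feed in single bytes.
print(bz2.decompress(un))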


微暖i 2024-10-26 13:17:07

Scala string literals have no \xNN escape. To escape these characters, use Unicode escape sequences of the form \uXXXX, where XXXX is the four-digit hexadecimal code of the character:

val un = "BZh91AY&SYA\u00af\u0082\r\u0000\u0000\u0001\u0001\u0080\u0002\u00c0\u0002\u0000 \u0000!\u009ah3M\u0007<]\u00c9\u0014\u00e1BA\u0006\u00be\u00084"
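
The escapes only fix the string itself; the choice of charset when turning it into bytes probably matters as well. The following is a minimal sketch, using only the JDK, that compares the two encodings on the literal above. The object name EscapeDemo and the value names are made up for illustration, and this has not been run against CBZip2InputStream.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical demo object; only JDK classes are used.
object EscapeDemo {
  // Same literal as above: every char lies in the range U+0000..U+00FF.
  val un = "BZh91AY&SYA\u00af\u0082\r\u0000\u0000\u0001\u0001\u0080\u0002\u00c0\u0002\u0000 \u0000!\u009ah3M\u0007<]\u00c9\u0014\u00e1BA\u0006\u00be\u00084"

  // ISO-8859-1 maps each of those chars to exactly one byte of the same value.
  val latin1Bytes: Array[Byte] = un.getBytes(StandardCharsets.ISO_8859_1)

  // UTF-8 encodes every char above U+007F as two bytes, which alters the data.
  val utf8Bytes: Array[Byte] = un.getBytes(StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    println(s"chars=${un.length} latin1=${latin1Bytes.length} utf8=${utf8Bytes.length}")
  }
}
```

If the lengths differ as expected, passing latin1Bytes (rather than getBytes("UTF-8")) to the ByteArrayInputStream should hand the decompressor the original byte sequence.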
