Gzip string corrupted by character encoding

Posted on 2024-11-05 10:12:53

I have some corrupted Gzip log files that I'm trying to restore. The files were transferred to our servers through a Java-backed web page. The files have always been sent as plain text, but we recently started to receive log files that were Gzipped. These Gzipped files appear to be corrupted and cannot be unzipped, and the originals have been deleted. I believe this is caused by the character encoding in the method below.

Is there any way to reverse the process and restore the files to their original zipped format? I have the resulting String's binary array data in a database blob.

Thanks for any help you can give!

private String convertStreamToString(InputStream is) throws IOException {
    /*
     * To convert the InputStream to a String we use the
     * Reader.read(char[] buffer) method. We iterate until the
     * Reader returns -1, which means there's no more data to
     * read. We use the StringWriter class to produce the string.
     */
    if (is != null) {
        Writer writer = new StringWriter();

        char[] buffer = new char[1024];
        try {
            Reader reader = new BufferedReader(
                    new InputStreamReader(is, "UTF-8"));
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            is.close();
        }
        return writer.toString();
    } else {
        return "";
    }
}

Comments (2)

聽兲甴掵 2024-11-12 10:12:53

If this is the method that was used to convert the InputStream to a String, then your data is almost certainly lost.

The problem is that UTF-8 has quite a few byte sequences that are simply not legal (i.e. they don't represent any character). When such a sequence is decoded, it is replaced with the Unicode replacement character (U+FFFD).

That character is the same no matter which invalid byte sequence was decoded. Therefore the specific information in those bytes is lost.
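
To make this concrete, here is a minimal Java sketch (not from the original answer; the class name `Utf8LossDemo`, the helper `roundTrip`, and the sample bytes are made up for illustration). It decodes two different byte arrays as UTF-8 and encodes the resulting String back to bytes, which is roughly what happened to the logs, assuming the String was later stored as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8LossDemo {
    // Decode bytes as UTF-8 text, then encode the String back to bytes.
    // Malformed input is silently replaced with U+FFFD by this constructor.
    static byte[] roundTrip(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Start of a gzip stream: magic bytes 0x1F 0x8B, then 0x08 (deflate).
        byte[] gzipStart = {(byte) 0x1F, (byte) 0x8B, (byte) 0x08};
        // A different input whose second byte is also invalid as UTF-8.
        byte[] other = {(byte) 0x1F, (byte) 0xFF, (byte) 0x08};

        byte[] a = roundTrip(gzipStart);
        byte[] b = roundTrip(other);

        // The round trip does not give the original bytes back...
        System.out.println(Arrays.equals(gzipStart, a)); // false
        // ...and two different inputs collapse to the same output,
        // so there is nothing left to tell them apart.
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

In both cases the invalid byte comes back as the three-byte UTF-8 encoding of U+FFFD (EF BF BD), so the stored blob cannot say whether the original byte was 0x8B, 0xFF, or something else.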

执妄 2024-11-12 10:12:53

If that's the code you have, you should never have converted to a Reader (or, in fact, a String). Only keeping the data as a stream (or byte array) would avoid corrupting binary files. Once it's read into the String, illegal byte sequences (and there are many in UTF-8) WILL be discarded.

So no, unless you are quite lucky, there is no way to recover the info. You'll have to provide another process that handles the pure stream and inserts it as a pure BLOB, not a CLOB.
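
As a rough sketch of what such a process could look like (an illustration only, not the poster's code; the class `StreamToBytes` and all of its method names are hypothetical), the upload can be read into a `byte[]` with no Reader or charset involved:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class StreamToBytes {
    // Byte-for-byte counterpart of convertStreamToString: no decoding at all.
    static byte[] convertStreamToBytes(InputStream is) throws IOException {
        if (is == null) {
            return new byte[0];
        }
        try (InputStream in = is) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Gzip streams start with the magic bytes 0x1F 0x8B.
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0x1F
                && (data[1] & 0xFF) == 0x8B;
    }

    // Helper used only to build sample input for the demo.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return convertStreamToBytes(gz);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] log = "sample log line\n".getBytes(StandardCharsets.UTF_8);
        byte[] upload = gzip(log);

        // Simulate receiving the upload as an InputStream.
        byte[] stored = convertStreamToBytes(new ByteArrayInputStream(upload));

        System.out.println(looksLikeGzip(stored));              // true
        System.out.println(Arrays.equals(log, gunzip(stored))); // true
    }
}
```

The resulting `byte[]` could then be bound to a BLOB column, for example with `PreparedStatement.setBytes`, and the `looksLikeGzip` check offers one way to tell compressed uploads from plain-text ones before deciding how to process them.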
