Uncompress a GZIPed HTTP Response in Java

Posted on 2024-08-25 07:31:01

I'm trying to uncompress a GZIPed HTTP response using GZIPInputStream. However, I always get the same exception when I try to read the stream: java.util.zip.ZipException: invalid bit length repeat

My HTTP Request Header:

GET www.myurl.com HTTP/1.0\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
X-Requested-With: XMLHttpRequest\r\n
Cookie: Some Cookies\r\n\r\n

At the end of the HTTP response headers, I get path=/Content-Encoding: gzip, followed by the gzipped response.

I tried two similar pieces of code to uncompress it:

UPDATE: In the following code, tBytes = (the string after 'path=/Content-Encoding: gzip').getBytes ();

GZIPInputStream  gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

StringBuffer  szBuffer = new StringBuffer ();

byte  tByte [] = new byte [1024];

while (true)
{
    int  iLength = gzip.read (tByte, 0, 1024); // <-- Error comes here

    if (iLength < 0)
        break;

    szBuffer.append (new String (tByte, 0, iLength));
}

And this one, which I found on this forum:

InputStream     gzipStream = new GZIPInputStream   (new ByteArrayInputStream (tBytes));
Reader          decoder    = new InputStreamReader (gzipStream, "UTF-8");//<- I tried ISO-8859-1 and get the same exception
BufferedReader  buffered   = new BufferedReader    (decoder);

I guess this is an encoding error.

Best regards,

bill0ute

蘸点软妹酱 2024-09-01 07:31:01

You don't show how you get the tBytes that you use to set up the gzip stream here:

GZIPInputStream  gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

One explanation is that you are including the entire HTTP response in tBytes. Instead, it should be only the content after the HTTP headers.

Another explanation is that the response is chunked.
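If the response headers do contain Transfer-Encoding: chunked, the body has to be de-chunked before it is handed to GZIPInputStream. The following is only a rough sketch of that step, assuming the whole chunked body is already in memory as a byte array; the class and method names (ChunkedDecoder, decode) are made up for illustration, and trailer headers after the last chunk are simply ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class ChunkedDecoder {

    /** Strips the chunk-size lines from a chunked body and returns the raw payload. */
    static byte[] decode(byte[] chunked) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = indexOfCrlf(chunked, pos);
            if (lineEnd < 0) {
                throw new IOException("missing chunk-size line");
            }
            // The size line is hexadecimal, optionally followed by ";extensions".
            String sizeLine = new String(chunked, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';');
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi);
            }
            int size;
            try {
                size = Integer.parseInt(sizeLine.trim(), 16);
            } catch (NumberFormatException e) {
                throw new IOException("bad chunk size: " + sizeLine, e);
            }
            pos = lineEnd + 2; // first byte of the chunk data
            if (size == 0) {
                break; // last chunk; any trailers are ignored in this sketch
            }
            if (size < 0 || size > chunked.length - pos) {
                throw new IOException("truncated chunk");
            }
            out.write(chunked, pos, size);
            pos += size + 2; // skip the chunk data and the CRLF that follows it
        }
        return out.toByteArray();
    }

    private static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

The result of decode is what would then be wrapped in a ByteArrayInputStream and passed to GZIPInputStream.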

edit: You are taking the data after the content-encoding line as the message body. However, according to the HTTP 1.1 specification the header fields do not come in any particular order, so this is very dangerous.

As explained in this part of the HTTP specification, the message body of a request or response doesn't come after a particular header field but after the first empty line:

Request (section 5) and Response
(section 6) messages use the generic
message format of RFC 822 [9] for
transferring entities (the payload of
the message). Both types of message
consist of a start-line, zero or more
header fields (also known as
"headers"), an empty line (i.e., a
line with nothing preceding the CRLF)
indicating the end of the header
fields, and possibly a message-body.

You still haven't shown exactly how you compose tBytes, but at this point I think you're erroneously including the empty line in the data that you try to decompress. The message body starts after the CRLF characters of the empty line.
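To make the "body starts after the empty line" rule concrete, here is a minimal sketch that looks for the first CRLF CRLF in the raw response bytes and decompresses everything after it. It assumes the complete, non-chunked response is held in a byte array, and the names HttpBodyExtractor, bodyOffset and decodeGzipBody are invented for this example. It works on bytes throughout, because turning compressed data into a String and calling getBytes () on it can alter the bytes, depending on the charset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

final class HttpBodyExtractor {

    /** Returns the index of the first body byte, or -1 if there is no empty line. */
    static int bodyOffset(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4; // skip the CRLF that ends the last header and the empty line
            }
        }
        return -1;
    }

    /** Decompresses a gzip byte array completely. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Assumption: the body is gzip-compressed UTF-8 text and is not chunked. */
    static String decodeGzipBody(byte[] rawResponse) throws IOException {
        int offset = bodyOffset(rawResponse);
        if (offset < 0) {
            throw new IOException("no empty line found after the headers");
        }
        byte[] body = Arrays.copyOfRange(rawResponse, offset, rawResponse.length);
        return new String(gunzip(body), StandardCharsets.UTF_8);
    }
}
```

With this, tBytes in the question would correspond to the array returned by Arrays.copyOfRange, not to anything derived from a String.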

May I suggest that you use the httpclient library instead to extract the message body?
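As an illustration of letting a client library do the header parsing, here is a sketch that uses the JDK's built-in java.net.http.HttpClient (Java 11 or later) rather than the Apache httpclient library. That client does not decompress responses on its own, so the sketch checks the Content-Encoding header and wraps the body stream itself. The class and method names (GzipFetch, readBody, fetch) are hypothetical, and the body is assumed to be UTF-8 text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.GZIPInputStream;

final class GzipFetch {

    /** Reads the whole body, decompressing it only when Content-Encoding is gzip. */
    static String readBody(InputStream body, Optional<String> contentEncoding) throws IOException {
        boolean gzipped = contentEncoding
                .map(value -> value.trim().equalsIgnoreCase("gzip"))
                .orElse(false);
        try (InputStream in = gzipped ? new GZIPInputStream(body) : body) {
            // Assumption: the payload is UTF-8 text.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Sends a GET request and returns the decoded body; the client handles headers and chunking. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept-Encoding", "gzip")
                .GET()
                .build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        return readBody(response.body(), response.headers().firstValue("Content-Encoding"));
    }
}
```

The point of the design is that the application never sees the header block or the chunk framing, only the body stream.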

躲猫猫 2024-09-01 07:31:01

Well, here is the problem I can see:

int  iLength = gzip.read (tByte, 0, 1024);

Use the following to fix it:

byte[] buff = new byte[1024];
byte[] emptyBuff = new byte[1024];
StringBuffer unGzipRes = new StringBuffer();

int byteCount = 0;
while ((byteCount = gzip.read(buff, 0, 1024)) > 0) {
    // only append the buff elements that
    // contains data
    unGzipRes.append(new String(Arrays.copyOf(
            buff, byteCount), "utf-8"));

    // empty the buff for re-usability and
    // prevent dirty data attached at the
    // end of the buff
    System.arraycopy(emptyBuff, 0, buff, 0,
            1024);
}
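One caveat with decoding each 1024-byte block on its own: a multi-byte UTF-8 character can straddle two reads and come out garbled. A possible variant, sketched below under the assumption that the compressed data is already in a byte array, wraps the GZIPInputStream in an InputStreamReader so the decoder keeps its state across reads. The name GzipText.gunzipToString is made up for this example. This only changes how the text is decoded; it will not cure a ZipException caused by passing the wrong bytes in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;

final class GzipText {

    /** Decompresses gzip data and decodes it as text in the given charset. */
    static String gunzipToString(byte[] compressed, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), charset)) {
            char[] buf = new char[1024];
            int n;
            // The reader buffers partial characters between reads, so no
            // character is split even if it spans two underlying byte reads.
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
        }
        return text.toString();
    }
}
```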
