
How do I decode GZIP encoding in Python?

Posted on 2024-08-30 03:17:28

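I download web pages in a Python script. In most cases this works fine.

However, one page came back with a response header indicating GZIP encoding, and when I try to print the source of that page, PuTTY shows nothing but symbols.

How can I decode it to regular text?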


谎言 2024-09-06 03:17:28

I use zlib to decompress gzipped content from the web.

import zlib
import urllib.request

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
f = urllib.request.urlopen(url)
decompressed_data = zlib.decompress(f.read(), 16 + zlib.MAX_WBITS)
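
When the body is too large to hold in memory at once, the same wbits value can be passed to a decompression object and fed piece by piece. The following is only a sketch: the helper name gunzip_chunks and the sample payload are made up for illustration, and the chunk list stands in for whatever the HTTP client hands back.

```python
import gzip
import zlib


def gunzip_chunks(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS selects the gzip container, as in the one-shot call.
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        piece = decoder.decompress(chunk)
        if piece:
            yield piece
    # Emit anything still buffered once the input is exhausted.
    tail = decoder.flush()
    if tail:
        yield tail


# Simulate a response body arriving in 16-byte chunks (hypothetical data).
payload = gzip.compress(b"<html>hello gzip</html>" * 20)
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]
page = b"".join(gunzip_chunks(chunks))
```

With a real response, the chunks could come from repeated read(n) calls instead of a prebuilt list.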


半衬遮猫 2024-09-06 03:17:28

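Decompress your byte stream using the built-in gzip module.

If you have any problems, show the exact minimal code that you used, the exact error message and traceback, and the output of print(repr(your_byte_stream[:100])).

Further information

1. For an explanation of the gzip/zlib/deflate confusion, read the Wikipedia article on the subject.

2. If you have a string rather than a file, the zlib module can be easier to use than the gzip module. Unfortunately, the Python docs quoted here are incomplete/wrong:

zlib.decompress(string[, wbits[, bufsize]])

...The absolute value of wbits is the base two logarithm of the size of the history buffer (the "window size") used when compressing data. Its absolute value should be between 8 and 15 for the most recent versions of the zlib library, larger values resulting in better compression at the expense of greater memory usage. The default value is 15. When wbits is negative, the standard gzip header is suppressed; this is an undocumented feature of the zlib library, used for compatibility with unzip's compression file format.

First, 8 <= log2_window_size <= 15, with the meaning given above. Then what should have been a separate argument is kludged on top:

arg == log2_window_size means assume the string is in zlib format (RFC 1950; what the HTTP 1.1 RFC 2616 confusingly calls "deflate").

arg == -log2_window_size means assume the string is in deflate format (RFC 1951; what people who didn't read the HTTP 1.1 RFC carefully actually implemented).

arg == 16 + log2_window_size means assume the string is in gzip format (RFC 1952). So you can use 31.

The above information is documented in the zlib C library manual; Ctrl-F for windowBits.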
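
The three wbits conventions can be tried side by side. This is a minimal sketch: the inflate helper and the container labels are hypothetical names chosen for the example, and the "auto" entry relies on zlib's additional 32 + window-size mode, which accepts either a zlib or a gzip header.

```python
import gzip
import zlib

text = b"hello, compressed world " * 10

# Build one sample in each container format.
zlib_data = zlib.compress(text)                    # RFC 1950: zlib header + checksum
raw = zlib.compressobj(wbits=-zlib.MAX_WBITS)
deflate_data = raw.compress(text) + raw.flush()    # RFC 1951: bare deflate stream
gzip_data = gzip.compress(text)                    # RFC 1952: gzip header + trailer

# wbits value to pass for each container (labels are made up for this sketch).
WBITS = {
    "zlib": zlib.MAX_WBITS,          # 15
    "deflate": -zlib.MAX_WBITS,      # -15
    "gzip": 16 + zlib.MAX_WBITS,     # 31
    "auto": 32 + zlib.MAX_WBITS,     # 47: detect zlib or gzip from the header
}


def inflate(data, container):
    """Decompress data, assuming it uses the named container format."""
    return zlib.decompress(data, WBITS[container])


results = {
    "zlib": inflate(zlib_data, "zlib"),
    "deflate": inflate(deflate_data, "deflate"),
    "gzip": inflate(gzip_data, "gzip"),
}
```

Passing the wrong value for the data at hand raises zlib.error, which is usually the symptom behind "incorrect header check" messages.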


农村范ル 2024-09-06 03:17:28

For Python 3, try this:

import gzip

# opener and request are assumed to have been set up earlier
fetch = opener.open(request)  # basically get a response object
data = gzip.decompress(fetch.read())
data = str(data, 'utf-8')


你的笑 2024-09-06 03:17:28

I use something like this (shown with Python 3 module names; on Python 2 the equivalents are urllib2 and cStringIO.StringIO):

import io
import urllib.request
from gzip import GzipFile

def fetch(request):
    data = urllib.request.urlopen(request).read()
    try:
        # Wrap the bytes in a file-like object so GzipFile can read them
        data = GzipFile(fileobj=io.BytesIO(data)).read()
    except (OSError, EOFError):
        # Not gzip data, or a truncated stream: fall back to the raw bytes
        pass
    return data


夜深人未静 2024-09-06 03:17:28

If you use the Requests module, you don't need any other modules, because gzip- and deflate-encoded responses are automatically decoded for you.

Example:

>>> import requests
>>> custom_header = {'Accept-Encoding': 'gzip'}
>>> response = requests.get('https://api.github.com/events', headers=custom_header)
>>> response.headers
{'Content-Encoding': 'gzip',...}
>>> response.text
'[{"id":"9134429130","type":"IssuesEvent","actor":{"id":3287933,...

The .text property of the response gives the content as text.

The .content property of the response gives the content as bytes.

See the Binary Response Content section on docs.python-requests.org


优雅的叶子 2024-09-06 03:17:28

None of these answers worked out of the box using Python 3. Here is what worked for me to fetch a page and decode the gzipped response:

import requests
import gzip

response = requests.get('your-url-here')
# Requests already undoes Content-Encoding: gzip in response.content, so the
# explicit decompress is for payloads that are themselves gzip data (e.g. a .gz file)
data = str(gzip.decompress(response.content), 'utf-8')
print(data)  # decoded contents of page


守不住的情 2024-09-06 03:17:28

Similar to Shatu's answer for Python 3, but arranged a little differently:

import gzip
from json import loads as json_load
from urllib.request import Request, urlopen

headers = {"Accept-Encoding": "gzip"}  # assumed value; not shown in the original
s = Request("https://someplace.com", None, headers)
r = urlopen(s, None, 180).read()
try:
    r = gzip.decompress(r)
except OSError:
    pass  # body was not gzip-compressed; use it as received
result = json_load(r.decode())

This approach wraps gzip.decompress() in a try/except that catches and ignores the OSError raised when the body is not gzip data, which covers servers that send a mix of compressed and uncompressed responses. Some small strings actually get bigger when compressed, so the plain data is sent instead.
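
An alternative to catching the exception is to look at the first two bytes, since every gzip stream begins with 0x1f 0x8b. The sketch below assumes that check is good enough for the mixed case described above; maybe_gunzip and the sample body are illustrative names, not part of any library.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def maybe_gunzip(body: bytes) -> bytes:
    """Return the decompressed body if it looks like gzip, else the body unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Hypothetical sample: the same JSON body sent plain and sent compressed.
plain = b'{"ok": true}'
from_plain = maybe_gunzip(plain)
from_gzip = maybe_gunzip(gzip.compress(plain))
```

The try/except form also tolerates a body that merely happens to start with those two bytes, so which one fits better depends on how much the server can be trusted.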


旧时光的容颜 2024-09-06 03:17:28

This version is simple and avoids reading the whole file up front by not calling the read() method. Instead it gives you a file-like object that behaves just like a normal file stream.

import gzip
from urllib.request import urlopen

my_gzip_url = 'http://my_url.gz'
my_gzip_stream = urlopen(my_gzip_url)
# gzip.open accepts an existing file object as well as a filename
my_stream = gzip.open(my_gzip_stream, 'r')
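
To see the stream behaving like an ordinary file without touching the network, an in-memory buffer can stand in for the object returned by urlopen(). This is a sketch under that assumption; the sample lines are invented.

```python
import gzip
import io

# Stand-in for the urlopen() response: gzip data in an in-memory file object.
raw = io.BytesIO(gzip.compress(b"first line\nsecond line\nthird line\n"))

# Mode "rt" layers text decoding on top of decompression, so iteration
# yields str lines that are decompressed as they are read.
with gzip.open(raw, "rt", encoding="utf-8") as stream:
    lines = [line.rstrip("\n") for line in stream]
```

Opening in "rt" rather than "r" is a choice made here for readable output; "r" yields bytes as in the snippet above.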


玩世 2024-09-06 03:17:28

You can use urllib3 to easily decode gzip.

urllib3.response.decode_gzip(response.data)
