gzcompress() randomly inserting extra data?
I've been researching this all morning and have decided that, as a last-ditch effort, maybe someone on Stack Overflow has a "been there, done that" type of answer for me.
Background
Recently, I implemented compression on our (intranet-oriented) Apache (2.2) server using filters so that all text-based files (css, js, txt, html, etc.) are compressed via mod_deflate, with nothing configured for the php scripts. After plenty of research on how best to compress PHP output, I decided to use the gzcompress() flavor, because the PHP documentation suggests that using the zlib library and gzip (using the deflate algorithm, blah blah blah) is preferred over ob_gzipwhatever().
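(For context, the filter side was the stock mod_deflate pattern; something like the following, illustrative rather than the exact directives in use:)

# Illustrative mod_deflate setup of the kind described above: compress
# text-based responses by MIME type; PHP output is not covered here.
AddOutputFilterByType DEFLATE text/html text/plain text/css text/javascript application/javascript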
So I copied someone else's method like so:
<?php # start each page by enabling output buffering and disabling automatic flushes
ob_start();
ob_implicit_flush(0);

(program logic)

print_gzipped_page();

function print_gzipped_page() {
    if (headers_sent())
        $encoding = false;
    elseif (strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'x-gzip') !== false)
        $encoding = 'x-gzip';
    elseif (strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false)
        $encoding = 'gzip';
    else
        $encoding = false;

    if ($encoding) {
        $contents = ob_get_contents(); # get contents of buffer
        ob_end_clean();                # turn off OB and flush buffer
        $size = strlen($contents);
        if ($size < 512) {             # too small to be worth a compression
            echo $contents;
            exit();
        } else {
            header("Content-Encoding: $encoding");
            header('Vary: Accept-Encoding');
            # 8-byte file header: g-zip file (1f 8b) compression type deflate (08), next 5 bytes are padding
            echo "\x1f\x8b\x08\x00\x00\x00\x00\x00";
            $contents = gzcompress($contents, 9);
            $contents = substr($contents, 0, $size); # faster than not using a substr, oddly
            echo $contents;
            exit();
        }
    } else {
        ob_end_flush();
        exit();
    }
}
Pretty standard stuff, right?
Problem
Between 10% and 33% of all our PHP page requests sent via Firefox go out fine and come back gzipped, only Firefox displays the compressed ASCII instead of decompressing it. And, the weirdest part is that the content size sent back is always 30 or 31 bytes larger than the size of the correctly rendered page. As in, when the script displays properly, Firebug shows a content size of 1044; when Firefox shows a huge screen of binary gibberish, Firebug shows a content size of 1074.
This happened to some of our users on legacy 32-bit Fedora 12s running Firefox 3.3s... Then it happened to a user with FF5, one with FF6, and some with the new 7.1! I've been meaning to upgrade them all to FF7.1, anyway, so I've been updating them as they have issues, but FF7.1 is still exhibiting the same behavior, just less frequently.
Diagnostics
I've been installing Firebug on a variety of computers to watch the headers, and that's where I'm getting confused:
Normal, functioning page response headers:
- HTTP/1.1 200 OK
- Date: Fri, 21 Oct 2011 18:40:15 GMT
- Server: Apache/2.2.15 (Fedora)
- X-Powered-By: PHP/5.3.2
- Expires: Thu, 19 Nov 1981 08:52:00 GMT
- Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
- Pragma: no-cache
- Content-Encoding: gzip
- Vary: Accept-Encoding
- Content-Length: 1045
- Keep-Alive: timeout=10, max=75
- Connection: Keep-Alive
- Content-Type: text/html; charset=UTF-8
(Notice that content-length is generated automatically)
Same page when broken:
- HTTP/1.1 200 OK
- (everything else identical)
- Content-Length: 1075
The sent headers always include Accept-Encoding: gzip, deflate
Things I've tried to fix the behavior:
- Explicitly declare content length with uncompressed and compressed lengths
- Not use the substr() of $contents
- Remove checksum at the end of $contents
I don't really want to use gzencode() because my testing showed it to be significantly (9%) slower than gzcompress(), presumably because it generates extra checksums and whatnot that (I assumed) web browsers don't need or use.
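(For reference, the two functions wrap the same DEFLATE data differently: gzcompress() emits a zlib stream per RFC 1950, a 2-byte header plus a 4-byte Adler-32 trailer, while gzencode() emits a true gzip stream per RFC 1952, a 10-byte header plus CRC-32 and length trailers. A standalone sketch, separate from the page code above, makes the difference visible:)

<?php
# Standalone sketch: compare the wrappers gzcompress() and gzencode()
# put around the same DEFLATE payload.
$data = str_repeat('Hello, intranet! ', 64);

$zlib = gzcompress($data, 9); # RFC 1950 zlib stream: 2-byte header + Adler-32 trailer
$gzip = gzencode($data, 9);   # RFC 1952 gzip stream: 10-byte header + CRC-32 + size trailers

echo bin2hex(substr($zlib, 0, 2)), "\n"; # "78da" - zlib header at level 9
echo bin2hex(substr($gzip, 0, 3)), "\n"; # "1f8b08" - gzip magic + deflate method

# Same DEFLATE payload, different framing: gzip carries 18 bytes of
# wrapper versus zlib's 6, so the gzip stream is typically 12 bytes longer.
echo strlen($gzip) - strlen($zlib), "\n";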
I cannot duplicate the behavior on my 64-bit Fedora 14 box running Firefox 7.1. Not once in my testing before rolling the compression code live did this happen to me, in either Chrome or Firefox. (Edit: Immediately after posting this, one of the windows I'd left open, which sends a meta refresh every 30 seconds, finally broke after ~60 refreshes in Firefox.) Our handful of Windows XP boxes are behaving the same as the Fedora 12s. Searching through Firefox's Bugzilla kicked up one or two bug reports that were somewhat similar to this situation, but those were for versions pre-dating 3.3 and involved all gzipped content, whereas our Apache-gzipped css and js files are downloaded and displayed without error every time.
The fact that the content-length is coming back 30/31 bytes larger each time leads me to think that something is breaking inside my script/gzcompress() that is mangling something in the response that Firefox chokes on. Naturally, if you play with altering the echo'd gzip header, Firefox throws a "Content Encoding Error," so I'm really leaning towards the problem being internal to gzcompress().
Am I doomed? Do I have to scrap this implementation and use the not-preferred ob_start("ob_gzhandler") method?
I guess my "applies to more than one situation" question would be: are there known bugs in PHP's zlib compression library that do something funky when receiving very specific input?
Edit: Nuts. I readgzfile()'d one of the broken, non-compressed pages that Firefox downloaded and, lo and behold, it echoed everything back perfectly. =( That means this must be... Nope, I've got nothing.
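(For anyone repeating that check, it was nothing fancier than saving the garbled response body to a file and letting PHP inflate it; the path below is a hypothetical placeholder.)

<?php
# Sketch of the check described in the edit above. The path stands in
# for wherever the captured response body was saved.
readgzfile('/tmp/broken-page-capture.gz'); # inflates the file and echoes the result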
Comments (2)
Okay, first of all, you don't seem to be setting the Content-Length header, which will cause issues; instead, you are making the gzip content longer so that it matches the content length you were getting in the first place. This is going to turn ugly. My suggestion is that you replace the header-echoing and gzcompress() lines with a version that sends a complete gzip stream and its real compressed length, and see if it helps the situation.
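Something along these lines (a sketch of the suggestion, not a verbatim snippet; it swaps the hand-rolled header and the substr() for a complete gzencode() stream plus an explicit length):

header("Content-Encoding: $encoding");
header('Vary: Accept-Encoding');
$contents = gzencode($contents, 9);             # complete gzip stream, header and trailers included
header('Content-Length: ' . strlen($contents)); # declare the real compressed length
echo $contents;
exit();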
Ding! Ding! Ding! After mulling over this problem all weekend, I finally stumbled across the answer after re-reading the PHP man pages for the umpteenth time... From the zlib PHP documentation, "Whether to transparently compress pages." Transparently! As in, nothing else is required to get PHP to compress its output once zlib.output_compression is set to "On". Yeah, embarrassing.
For reasons unknown, the code being called explicitly from the PHP script was compressing the already-compressed contents, and the browser was simply unwrapping the one layer of compression and displaying the results. Curiously, the strlen() of the content didn't vary whether output_compression was on or off, so the transparent compression must occur after the explicit compression, but it occasionally decided not to compress what was already compressed?
Regardless, everything is resolved by simply leaving PHP to its own devices. zlib doesn't need output buffering or anything to compress the output.
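For the record, the whole print_gzipped_page() dance reduces to something like this (a minimal sketch; zlib.output_compression is changeable at runtime, so ini_set() works as long as it runs before any output, or the same setting can simply live in php.ini):

<?php
# Minimal sketch: let PHP's zlib handler compress transparently.
# Equivalent to zlib.output_compression = On in php.ini; must run
# before any output is sent.
ini_set('zlib.output_compression', '1');

# (program logic) - echo output as normal; PHP checks Accept-Encoding
# and sets the Content-Encoding/Vary headers itself.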
Hope this helps others struggling with the wonderful world of HTTP compression.