Why does a PPTX download from a website as a ZIP file?
I know the root cause of the problem where downloading (say) a PPTX from a web site saves it as a ZIP (an Office 2007 file is a ZIP package, and it ends up with a .zip extension), and I know how to fix it in the web server (add MIME types).
But I'm interested in understanding why this happens and the mechanics of the process being carried out by the web server and web browser. I'm aware that HTTP traffic can be transparently compressed and decompressed (gzip) to improve performance, so I'm guessing that this could also be part of the problem.
For example, one assumes the file name and path are passed back to the browser over HTTP. Is it the web server that's renaming the extension, or the web browser?
A little flow diagram would be ideal.
Comments (3)
Apologies for answering this very old thread, but hopefully this is useful information.
The reason that pptx (or docx) files are renamed to zip is a combination of actions by both the web server and the browser. Most probably, the web server has not been configured to handle pptx files, so it sends them with Content-Type: text/plain. Some browsers (e.g. Chrome and Firefox) may say "ok, I believe you" and simply save the file under the name the server gave. Other browsers (e.g. MSIE) may say "I'll just check that" and inspect the file contents, which indicate a ZIP file because a pptx is a ZIP package internally, so the download ends up with a .zip extension. So, if MSIE has an option somewhere for "do not check MIME types when downloading files", then that is what you need.
Another solution lies with the web server, which really needs to send a Content-Type that identifies the file as a PowerPoint document. The registered type for .pptx is application/vnd.openxmlformats-officedocument.presentationml.presentation. If you have suitable access to an Apache web server, you just need to add a line to the .htaccess file saying AddType application/vnd.openxmlformats-officedocument.presentationml.presentation .pptx which will make the server send a Content-Type header that MSIE will interpret correctly.
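To make the server side of this concrete, here is a minimal sketch of how a static file server might pick the Content-Type header from the file extension. It is only an illustration: the function name content_type_for, the two mapping tables, and the text/plain fallback (taken from the guess above about an unconfigured server) are assumptions, and the PPTX type used is the registered Office Open XML presentation type rather than anything a particular server ships with.

```python
import posixpath

# Registered media type for .pptx (Office Open XML presentation).
PPTX_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "presentationml.presentation"
)


def content_type_for(filename, type_map, default="text/plain"):
    """Choose a Content-Type from the extension, as a static file server might.

    `default` models what an unconfigured server falls back to; text/plain is
    an assumption here, real servers differ.
    """
    ext = posixpath.splitext(filename)[1].lower()
    return type_map.get(ext, default)


# Hypothetical server configuration before and after adding the PPTX mapping.
unconfigured = {".html": "text/html", ".zip": "application/zip"}
configured = {**unconfigured, ".pptx": PPTX_TYPE}

print(content_type_for("slides.pptx", unconfigured))
print(content_type_for("slides.pptx", configured))
```

The first call falls back to the default because the extension is unknown to the server; the second returns the PPTX type, which is what adding the AddType line achieves in Apache.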
1) It's probable the web browser is using magic numbers to identify the type of file, based on the first few bytes of the file (typically a header of some sort for binary files).
As you are aware, Office 2007 files are packaged as zip, so the browser (when it doesn't have any MIME information to help) starts downloading the file, sees the zip header, and saves it (or prompts you to save it) as a zip file.
This seems like strange behaviour for the browser to me. I would have expected it to keep the file name (and extension) as provided by the server, but that may vary between browsers and depend on exactly what MIME type is provided (or not provided).
2) Alternatively, the server may be doing the same thing, when it doesn't have a MIME type associated with a particular file extension. It might check the start of the file and find that it looks like a zip file, so will serve the file back to the client with a zip MIME type.
You could rule out the server doing any MIME type guessing by inspecting the HTTP response or raw packets (either server or client side) with something like Wireshark.
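3) Gzipping won't be the problem. HTTP compression is applied as a content encoding (signalled by the Content-Encoding header) and is undone by the browser before the file is saved, so it is unrelated to MIME types.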
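The magic-number idea in point 1 can be sketched as a toy model of the browser's decision. This is not the actual algorithm of any browser: the function names, the set of Content-Type values treated as "vague", and the sniff flag are all hypothetical, chosen only to show how a sniffing browser could end up saving a .pptx as .zip while a non-sniffing one keeps the name.

```python
# Signature of a ZIP local file header, which starts a non-empty ZIP archive.
ZIP_MAGIC = b"PK\x03\x04"

# Content-Type values this toy model treats as "no useful MIME information".
VAGUE_TYPES = {None, "text/plain", "application/octet-stream"}


def looks_like_zip(first_bytes):
    """Return True if the payload begins with the ZIP local file header."""
    return first_bytes.startswith(ZIP_MAGIC)


def saved_name(filename, content_type, first_bytes, sniff):
    """Model the name a browser saves a download under.

    A sniffing browser that gets a vague Content-Type inspects the first
    bytes and, on seeing a ZIP signature, swaps the extension for .zip.
    """
    if sniff and content_type in VAGUE_TYPES and looks_like_zip(first_bytes):
        stem = filename.rsplit(".", 1)[0]
        return stem + ".zip"
    return filename


body = ZIP_MAGIC + b"...rest of the pptx package..."
print(saved_name("slides.pptx", "text/plain", body, sniff=True))
print(saved_name("slides.pptx", "text/plain", body, sniff=False))
```

In this model the rename happens entirely on the browser side. The server only contributes by sending a Content-Type too vague to override what the bytes look like.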
The best explanation I've found -- both as to why this happens and how to fix it -- is http://blogs.msdn.com/b/asiatech/archive/2012/03/28/office-documents-will-be-recognized-as-zip-file-when-downloading-from-ie.aspx.