Which file types are worth compressing (zipping) for remote storage? For which of them is the compressed size / original size ratio << 1?
I am storing documents in SQL Server in varbinary(max) fields, and I optionally use FILESTREAM when a user has:
(DB_Size + Docs_Size) ~> 0.8 * ExpressEdition_Max_DB_Size
I currently zip every file regardless of type. That choice dates back to when the document read/write code was written 10 years ago, when storage was more expensive than it is now.
Many files are almost as big zipped as they are unzipped (a zipped PDF is about 95% of the original size). Unzipping also has some overhead, and that overhead doubles when I need to "check in" or update a file, because I have to zip it again.
So I am thinking of letting users choose, per file type, whether files are zipped, with some meaningful default values. From my experience I would apply the following rules:
1) zip by default: txt, bmp, rtf
2) do not zip by default: jpg, jpeg, Microsoft Office files, Open Office files, png, tif, tiff
Could you suggest other common file types, or comment on the ones I listed here?
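To make the "compressed size / original size" question concrete, here is a minimal sketch of measuring that ratio on a sample blob before deciding whether zipping pays off. It uses Python's `zlib` (DEFLATE, the method most zip tools use by default) rather than the actual zip code in the application. The names `compression_ratio` and `worth_zipping`, and the 0.9 threshold, are assumptions made up for illustration, and random bytes stand in for already-compressed content such as jpg or png.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size / original size for one in-memory blob."""
    if not data:
        return 1.0  # nothing to gain on an empty document
    return len(zlib.compress(data, level)) / len(data)


def worth_zipping(data: bytes, threshold: float = 0.9) -> bool:
    """Hypothetical rule: zip only if at least ~10% is saved (assumed threshold)."""
    return compression_ratio(data) < threshold


# Repetitive text stands in for txt/rtf-style content.
text_like = b"Invoice line: item, quantity, unit price, total\r\n" * 500
# Random bytes stand in for content that is already compressed (jpg, png, ...).
random_like = os.urandom(len(text_like))

print(f"text-like ratio:   {compression_ratio(text_like):.3f}")
print(f"random-like ratio: {compression_ratio(random_like):.3f}")
```

Running something like this over a sample of real documents per extension would give measured defaults instead of guessed ones.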
Comments (2)
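.doc and .mdb files actually tend to compress rather well, if I remember correctly. The Office 2007 Open XML formats such as .docx, though, are zip files already, so compressing them is pretty much useless. (.accdb is a binary database format rather than a zip container, so it is worth measuring before you decide.)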
Don't forget HTML and XML files. Zip by default.
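As a rough sketch of how the per-type defaults from the question and this answer could be encoded, with user overrides on top: the function `should_zip`, the two set names, and the fallback for unknown extensions (zip, matching the current "zip everything" behaviour) are all assumptions for illustration, not anything from the existing application.

```python
from pathlib import PurePath
from typing import Mapping, Optional

# Defaults taken from the question plus the suggestions in this answer.
ZIP_BY_DEFAULT = {"txt", "bmp", "rtf", "doc", "mdb", "html", "xml"}
STORE_AS_IS = {"jpg", "jpeg", "png", "tif", "tiff", "docx"}


def should_zip(filename: str,
               overrides: Optional[Mapping[str, bool]] = None) -> bool:
    """Decide whether to zip a file, based only on its extension."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if overrides is not None and ext in overrides:
        return overrides[ext]  # a user's choice wins over the defaults
    if ext in STORE_AS_IS:
        return False
    if ext in ZIP_BY_DEFAULT:
        return True
    # Assumption: unknown types keep the current behaviour (zip everything).
    return True


for name in ("notes.txt", "scan.TIFF", "report.docx", "legacy.doc"):
    print(name, "->", "zip" if should_zip(name) else "store as-is")
```

The overrides mapping is where a per-user setting would plug in, e.g. `should_zip("notes.txt", {"txt": False})`.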
I commend you on being able to recognize what are and aren't compressed file types. You probably already understand this, but I'll rant here:
Do not double up compression methods! Each compression method adds its own header, which adds to the file size, and since the data has already had its statistical redundancies eliminated as well as one method could manage, it probably can't be compressed further by another method. Take this set of files for example:
All of these files contain the same data.
The first compression method worked well to eliminate redundancies, but each successive compression method just added to the file size, not to mention the headache of unpacking every layer of the file later.
The best method of compression is usually the first one applied.
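A small sketch of the effect, assuming `zlib` as the compressor and a made-up word list as sample data (neither comes from the files mentioned above): it compresses the same bytes several times in a row and records the size after each pass. On ordinary text-like input the first pass usually does nearly all the work and later passes stay about the same size or grow slightly.

```python
import random
import zlib
from typing import List


def sizes_after_repeated_compression(data: bytes, passes: int = 3) -> List[int]:
    """Return [original size, size after pass 1, size after pass 2, ...]."""
    sizes = [len(data)]
    for _ in range(passes):
        data = zlib.compress(data, 9)  # compress the previous pass's output
        sizes.append(len(data))
    return sizes


# Hypothetical sample: pseudo-random prose built from a small vocabulary.
random.seed(0)
words = ["file", "zip", "store", "server", "document", "user", "size",
         "read", "write", "update", "check", "table", "field", "stream"]
sample = " ".join(random.choice(words) for _ in range(20000)).encode("ascii")

sizes = sizes_after_repeated_compression(sample)
for label, size in zip(["original", "pass 1", "pass 2", "pass 3"], sizes):
    print(f"{label:>8}: {size} bytes")
```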