Treating 7z files as .NET streams
I would like to chain multiple stream operations (e.g. download a file, decompress it on the fly, and process the data without any temp files). The files are in 7z format. The LZMA SDK is available, but it forces me to supply an external output stream instead of being a stream itself; in other words, the output stream has to be fully written before I can work with it. SevenZipSharp also seems to be missing this functionality.
Has anyone done something like that?
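One possible way around a push-style decoder like the one described above (an API that only writes into an output stream you hand it) is to run the decoder on a background task that writes into a bounded in-memory queue, and to expose the other end of that queue as a readable Stream. The following is a minimal sketch under that assumption, using only `BlockingCollection<T>` and `Task` from the BCL. The names `PushToPullStream`, `ChunkWriter` and `ForwardOnlyStream` are hypothetical and are not part of the LZMA SDK or SevenZipSharp; the `produce` callback is where the SDK's decode call would go (its exact signature is not shown here).

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Boilerplate shared by both halves: neither side supports seeking.
public abstract class ForwardOnlyStream : Stream
{
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Flush() { }
}

// Write-only end handed to the push-style producer; each Write becomes one queued chunk.
sealed class ChunkWriter : ForwardOnlyStream
{
    private readonly BlockingCollection<byte[]> _sink;
    public ChunkWriter(BlockingCollection<byte[]> sink) => _sink = sink;
    public override bool CanRead => false;
    public override bool CanWrite => true;
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count)
    {
        if (count == 0) return;
        var chunk = new byte[count];
        Buffer.BlockCopy(buffer, offset, chunk, 0, count);
        _sink.Add(chunk); // blocks while the bounded queue is full
    }
}

// Read-only end: runs `produce` on a background task and hands out what it writes.
public sealed class PushToPullStream : ForwardOnlyStream
{
    private readonly BlockingCollection<byte[]> _chunks;
    private readonly Task _producer;
    private byte[] _current = Array.Empty<byte>();
    private int _offset;

    public PushToPullStream(Action<Stream> produce, int capacity = 16)
    {
        // Bounded: at most `capacity` chunks are buffered before the producer waits.
        _chunks = new BlockingCollection<byte[]>(capacity);
        _producer = Task.Run(() =>
        {
            try { produce(new ChunkWriter(_chunks)); }
            finally { _chunks.CompleteAdding(); } // marks the end of the data
        });
    }

    public override bool CanRead => true;
    public override bool CanWrite => false;
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0) return 0;
        while (_offset == _current.Length)
        {
            // Blocks until a chunk arrives; returns false once the producer has finished.
            if (!_chunks.TryTake(out var next, Timeout.Infinite))
            {
                _producer.GetAwaiter().GetResult(); // rethrows a producer exception, if any
                return 0;                           // end of stream
            }
            _current = next;
            _offset = 0;
        }
        int n = Math.Min(count, _current.Length - _offset);
        Buffer.BlockCopy(_current, _offset, buffer, offset, n);
        _offset += n;
        return n;
    }

    protected override void Dispose(bool disposing)
    {
        // If the reader stops early, this unblocks a producer waiting in Add
        // (the producer then fails with InvalidOperationException).
        if (disposing) _chunks.CompleteAdding();
        base.Dispose(disposing);
    }
}
```

The bounded capacity is the design choice that matters here: memory use stays at roughly `capacity` chunks regardless of the decompressed size, and the consumer sees an ordinary non-seekable Stream it can pass to `StreamReader` or any other stream-based processing.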
// pseudo-code: SevenZip.UncompressFiles and Web.GetStreamFromWeb are hypothetical;
// CompressedFileStream derives from Stream
foreach (CompressedFileStream f in SevenZip.UncompressFiles(Web.GetStreamFromWeb(url)))
{
    Console.WriteLine("Processing file {0}", f.Name);
    ProcessStream(f); // further streaming, like decoding, processing, etc.
}
Each file stream would behave like a read-once stream representing one file, and calling MoveNext() to advance through the main compressed stream would automatically invalidate the current file's stream and skip whatever was left unread.
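The semantics described above (each entry is a read-once stream, and advancing the enumerator skips whatever was not read) can be sketched independently of 7z. This minimal sketch assumes a hypothetical length-prefixed container format and hypothetical `EntryStream` / `SimpleArchive.ReadEntries` names; a real 7z reader would additionally have to parse the archive headers and decompress the packed data, which this does not attempt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Read-once view over the next `length` bytes of a shared, forward-only archive stream.
public sealed class EntryStream : Stream
{
    private readonly Stream _inner;
    private long _remaining; // -1 once the entry has been skipped
    public string Name { get; }

    public EntryStream(Stream inner, string name, long length)
    {
        _inner = inner;
        Name = name;
        _remaining = length;
    }

    // Drains whatever the caller left unread, then makes the entry unusable.
    internal void Invalidate()
    {
        var scratch = new byte[8192];
        while (_remaining > 0 && Read(scratch, 0, scratch.Length) > 0) { }
        _remaining = -1;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (_remaining < 0) throw new InvalidOperationException($"Entry '{Name}' was already skipped.");
        if (_remaining == 0 || count == 0) return 0;
        int n = _inner.Read(buffer, offset, (int)Math.Min(count, _remaining));
        if (n == 0) throw new EndOfStreamException("Archive ended inside an entry.");
        _remaining -= n;
        return n;
    }

    public override bool CanRead => _remaining >= 0;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class SimpleArchive
{
    // Hypothetical format: per entry [bool true][string name][long length][bytes], then [bool false].
    public static IEnumerable<EntryStream> ReadEntries(Stream archive)
    {
        using var header = new BinaryReader(archive, Encoding.UTF8, leaveOpen: true);
        while (header.ReadBoolean())
        {
            string name = header.ReadString();
            long length = header.ReadInt64();
            var entry = new EntryStream(archive, name, length);
            yield return entry;
            entry.Invalidate(); // runs on the caller's next MoveNext(): skip the rest of this entry
        }
    }
}
```

Because the iterator resumes only when the caller asks for the next entry, the skip happens exactly at MoveNext(), which is the behaviour the question asks for; SharpZipLib's ZipInputStream behaves similarly for zip files.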
A similar construct could be used for compression. Example usage: aggregating a very large quantity of data, e.g. for each 7z file in a directory, for each file inside it, for each data line in each file, sum up some value.
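The innermost part of that aggregation (for each file stream, for each line, sum a value) can be sketched over any `IEnumerable<Stream>`, so it works with whatever enumerator yields the per-file streams. This sketch assumes, purely for illustration, that each line looks like `key,value`; `Aggregation.SumValues` is a hypothetical name. The one non-obvious detail is `leaveOpen: true`: an enumerator may yield the same underlying stream for every entry (ZipInputStream does), so disposing the reader must not close it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class Aggregation
{
    // Sums the number after the first comma on every line of every yielded stream.
    // Assumes (hypothetically) lines of the form "key,value"; lines without a comma are skipped.
    public static decimal SumValues(IEnumerable<Stream> files)
    {
        decimal total = 0;
        foreach (var file in files)
        {
            // leaveOpen: the yielded stream may be shared across entries, so don't close it here.
            using var reader = new StreamReader(file, Encoding.UTF8, true, 4096, leaveOpen: true);
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                int comma = line.IndexOf(',');
                if (comma < 0) continue;
                total += decimal.Parse(line.Substring(comma + 1), NumberStyles.Number, CultureInfo.InvariantCulture);
            }
        }
        return total;
    }
}
```

The outer "for each 7z file in a directory" level would then be a `SelectMany` over `Directory.EnumerateFiles(dir, "*.7z")`, given some function that turns one archive path into its entry streams (not shown, since that is exactly the missing 7z piece).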
UPDATE 2012-01-06
#ziplib (SharpZipLib) already does exactly what I need for zip files with its ZipInputStream class. Here is an example that yields every file inside a given zip file as an unseekable stream. Still looking for a 7z solution.
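using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

// Yields the same ZipInputStream once per file entry, positioned at that entry's data.
// Callers must not dispose the yielded stream; GetNextEntry() skips any unread data.
IEnumerable<Stream> UnZipStream(Stream stream)
{
    // By default (IsStreamOwner) disposing ZipInputStream also closes `stream`.
    using (var zipStream = new ZipInputStream(stream))
    {
        ZipEntry entry;
        while ((entry = zipStream.GetNextEntry()) != null)
        {
            if (entry.IsFile) // skip directory entries
                yield return zipStream;
        }
    }
}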
The underlying algorithm and the parameters specified at compression time determine the size of the chunks used, and there is no way to ensure that the chunks you decode fall on word or line boundaries. So you will have to decompress a file completely before processing it.
What you are asking for is probably not possible without temp files, unless you have enough memory to hold each decompressed file in a MemoryStream, perform all your processing, and then release the memory back to the pool. Further complicating this is the fragmentation (of process memory) that you could cause by doing this repeatedly.
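A minimal sketch of the in-memory approach described above: decompress one file completely into a MemoryStream, process it, then let the buffer go. `GZipStream` from System.IO.Compression stands in for a 7z decoder here purely so the sketch runs with the BCL alone; it is not a 7z decoder. `InMemory.ProcessInMemory` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class InMemory
{
    // Fully decompresses `compressed` into memory, then hands a seekable stream to `process`.
    // Peak memory is roughly the decompressed size of this one file.
    public static T ProcessInMemory<T>(Stream compressed, Func<Stream, T> process)
    {
        using var buffer = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(buffer);
        }
        buffer.Position = 0;
        return process(buffer); // the buffer becomes collectable once this returns
    }
}
```

A MemoryStream grows by reallocating its internal array, and .NET allocates arrays of 85,000 bytes or more on the large object heap, which is where the repeated-allocation fragmentation mentioned above tends to show up; pre-sizing the MemoryStream when the uncompressed size is known reduces the number of reallocations.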