zlib gzgets 非常慢?
我正在做与解析大量文本文件相关的事情,并且正在测试要使用的输入法。
使用c++ std::ifstreams与c FILE没有太大区别,
根据zlib的文档,它支持未压缩的文件,并且会在不解压的情况下读取文件。
我发现使用非 zlib 的时间为 12 秒,而使用 zlib.h 的时间超过 4 分钟,
我已经测试过多次运行,因此这不是磁盘缓存问题。
我是否以错误的方式使用 zlib?
谢谢
#include <zlib.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#define LENS 1000000
size_t fg(const char *fname){
fprintf(stderr,"\t-> using fgets\n");
FILE *fp =fopen(fname,"r");
size_t nLines =0;
char *buffer = new char[LENS];
while(NULL!=fgets(buffer,LENS,fp))
nLines++;
fprintf(stderr,"%lu\n",nLines);
return nLines;
}
size_t is(const char *fname){
fprintf(stderr,"\t-> using ifstream\n");
std::ifstream is(fname,std::ios::in);
size_t nLines =0;
char *buffer = new char[LENS];
while(is. getline(buffer,LENS))
nLines++;
fprintf(stderr,"%lu\n",nLines);
return nLines;
}
size_t iz(const char *fname){
fprintf(stderr,"\t-> using zlib\n");
gzFile fp =gzopen(fname,"r");
size_t nLines =0;
char *buffer = new char[LENS];
while(0!=gzgets(fp,buffer,LENS))
nLines++;
fprintf(stderr,"%lu\n",nLines);
return nLines;
}
int main(int argc,char**argv){
if(atoi(argv[2])==0)
fg(argv[1]);
if(atoi(argv[2])==1)
is(argv[1]);
if(atoi(argv[2])==2)
iz(argv[1]);
}
I'm doing stuff related to parsing huge globs of textfiles, and was testing what input method to use.
There is not much of a difference using c++ std::ifstreams vs c FILE,
According to the documentation of zlib, it supports uncompressed files, and will read the file without decompression.
I'm seeing a difference from 12 seconds using non zlib to more than 4 minutes using zlib.h
This I've tested doing multiple runs, so its not a disk cache issue.
Am I using zlib in some wrong way?
thanks
#include <zlib.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#define LENS 1000000
size_t fg(const char *fname){
fprintf(stderr,"\t-> using fgets\n");
FILE *fp =fopen(fname,"r");
size_t nLines =0;
char *buffer = new char[LENS];
while(NULL!=fgets(buffer,LENS,fp))
nLines++;
fprintf(stderr,"%lu\n",nLines);
return nLines;
}
size_t is(const char *fname){
fprintf(stderr,"\t-> using ifstream\n");
std::ifstream is(fname,std::ios::in);
size_t nLines =0;
char *buffer = new char[LENS];
while(is. getline(buffer,LENS))
nLines++;
fprintf(stderr,"%lu\n",nLines);
return nLines;
}
size_t iz(const char *fname){
fprintf(stderr,"\t-> using zlib\n");
gzFile fp =gzopen(fname,"r");
size_t nLines =0;
char *buffer = new char[LENS];
while(0!=gzgets(fp,buffer,LENS))
nLines++;
fprintf(stderr,"%lu\n",nLines);
return nLines;
}
int main(int argc,char**argv){
if(atoi(argv[2])==0)
fg(argv[1]);
if(atoi(argv[2])==1)
is(argv[1]);
if(atoi(argv[2])==2)
iz(argv[1]);
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我猜你正在使用 zlib-1.2.3。在此版本中,gzgets() 实际上为每个字节调用 gzread()。这种方式调用gzread()的开销很大。您可以比较调用 gzread(gzfp, buffer, 4096) 一次和调用 gzread(gzfp, buffer, 1) 4096 次的 CPU 时间。结果是一样的,但是CPU时间却有很大的不同。
您应该做的是为 zlib 实现缓冲 I/O,通过一次 gzread() 调用读取一块约 4KB 的数据(就像 fread() 对 read() 所做的那样)。据说最新的zlib-1.2.5在gzread/gzgetc/....上有显着的改进。你也可以尝试一下。由于刚发布不久,我还没有亲自尝试过。
编辑:
我刚刚尝试过 zlib-1.2.5。 1.2.5 中的 gzgetc 和 gzgets 比 1.2.3 中的快得多。
I guess you are using zlib-1.2.3. In this version, gzgets() is virtually calling gzread() for each byte. Calling gzread() in this way has a big overhead. You can compare the CPU time of calling gzread(gzfp, buffer, 4096) once and of calling gzread(gzfp, buffer, 1) for 4096 times. The result is the same, but the CPU time is hugely different.
What you should do is to implement buffered I/O for zlib, reading ~4KB data in a chunk with one gzread() call (like what fread() does for read()). The latest zlib-1.2.5 is said to be significantly improved on gzread/gzgetc/.... You may try that as well. As it is released very recently, I have not tried personally.
EDIT:
I have tried zlib-1.2.5 just now. gzgetc and gzgets in 1.2.5 are much faster than those in 1.2.3.