mmap slower than getline?
I face the challenge of reading/writing files (gigabytes in size) line by line.
In many forum threads and sites (including a number of SO posts), mmap is suggested as the fastest option for reading/writing files. However, when I implement my code with both getline and mmap techniques, mmap is the slower of the two. This is true for both reading and writing. I have been testing with files ~600 MB in size.
My implementations parse the file line by line and then tokenize each line. I will present file input only.
Here is the getline implementation:
#include <cstdio>     // perror
#include <fstream>
#include <iostream>   // std::ios
#include <string>
using namespace std;

void two(char* path) {
    std::ios::sync_with_stdio(false);
    ifstream pFile(path);
    string mystring;
    if (pFile.is_open()) {
        while (getline(pFile, mystring)) {
            // c style tokenizing
        }
    }
    else perror("error opening file");
    pFile.close();
}
and here is the mmap version:
#include <cstdio>      // fopen, fseek, ftell, perror
#include <sstream>     // stringstream
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // close
using namespace std;

void four(char* path) {
    int fd;
    char *map;
    char *FILEPATH = path;
    unsigned long FILESIZE;
    // find the file size
    FILE* fp = fopen(FILEPATH, "r");
    fseek(fp, 0, SEEK_END);
    FILESIZE = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    fclose(fp);
    fd = open(FILEPATH, O_RDONLY);
    map = (char *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
    // read the file char-by-char from the mmap
    char c;
    stringstream ss;
    for (unsigned long i = 0; i < FILESIZE; ++i) {
        c = map[i];
        if (c != '\n') {
            ss << c;
        }
        else {
            // c style tokenizing
            ss.str("");
        }
    }
    if (munmap(map, FILESIZE) == -1) perror("Error un-mmapping the file");
    close(fd);
}
I omitted much error checking in the interest of brevity.
Is my mmap implementation incorrect, and thus affecting performance? Perhaps mmap is not ideal for my application?
Thanks for any comments or help!
4 Answers
The real power of mmap is being able to freely seek in a file, use its contents directly as a pointer, and avoid the overhead of copying data from kernel cache memory to userspace. However, your code sample is not taking advantage of this.

In your loop, you scan the buffer one character at a time, appending to a stringstream. The stringstream doesn't know how long the string is, and so has to reallocate several times in the process. At this point you've killed off any performance increase from using mmap; even the standard getline implementation avoids multiple reallocations (by using a 128-byte on-stack buffer, in the GNU C++ implementation).

If you want to use mmap to its fullest power, use functions such as strnchr or memchr to find newlines; these make use of hand-rolled assembler and other optimizations to run faster than most open-coded search loops.
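A minimal sketch of that approach, assuming map and filesize come from mmap as in the question; the scan_lines name and the commented-out tokenize call are illustrative placeholders, not part of this answer:

#include <cstddef>   // std::size_t
#include <cstring>   // std::memchr

// Sketch: walk an mmap'd buffer line by line without copying characters out.
void scan_lines(const char* map, std::size_t filesize) {
    const char* pos = map;
    const char* end = map + filesize;
    while (pos < end) {
        // one optimized pass to find the next newline
        const char* nl = static_cast<const char*>(std::memchr(pos, '\n', end - pos));
        if (nl == nullptr) {
            // tokenize(pos, end - pos);   // hypothetical callback; final line has no '\n'
            break;
        }
        // tokenize(pos, nl - pos);        // the line is the byte range [pos, nl)
        pos = nl + 1;                      // continue after the newline
    }
}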
Whoever told you to use mmap does not know very much about modern machines.

The performance advantages of mmap are a total myth. In the words of Linus Torvalds:

The problem with mmap is that every time you touch a page in the mapped region for the first time, it traps into the kernel and actually maps the page into your address space, playing havoc with the TLB.

Try a simple benchmark reading a big file 8K at a time using read and then again with mmap. (Using the same 8K buffer over and over.) You will almost certainly find that read is actually faster.

Your problem was never with getting data out of the kernel; it was with how you handle the data after that. Minimize the work you are doing character-at-a-time; just scan to find the newline and then do a single operation on the block. Personally, I would go back to the read implementation, using (and re-using) a buffer that fits in the L1 cache (8K or so).

Or at least, I would try a simple read vs. mmap benchmark to see which is actually faster on your platform.

[Update]

I found a couple more sets of commentary from Mr. Torvalds:

http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0728.html
http://lkml.iu.edu/hypermail/linux/kernel/0004.0/0775.html

The summary:

In my experience, reading and processing a large file sequentially is one of the "many cases" where using (and re-using) a modest-sized buffer with read/write performs significantly better than mmap.
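A rough sketch of the read half of such a benchmark, assuming POSIX open/read; the 8K buffer is reused on every iteration, and the newline count is only a stand-in for whatever per-block work you actually do:

#include <cstdio>      // perror
#include <fcntl.h>     // open
#include <unistd.h>    // read, close

// Sketch: read a file 8K at a time into one reusable buffer and do a single
// cheap pass over each chunk (here, counting newlines).
long count_newlines_with_read(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    char buf[8192];                        // small enough to stay in L1 cache
    long newlines = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; ++i)    // one pass per chunk, no per-line copies
            if (buf[i] == '\n') ++newlines;
    }
    close(fd);
    return newlines;
}

Timing this against an equivalent pass over an mmap'd buffer gives the head-to-head comparison described above.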
You can use memchr to find line endings. It will be much faster than adding to a stringstream one character at a time.
You're using stringstreams to store the lines you identify. This is not comparable with the getline implementation: the stringstream itself adds overhead. As others suggested, you can store the beginning of the line as a char*, and maybe the length of the line (or a pointer to the end of the line). Note also that this is much more efficient because you don't do any per-character work (in your version you were appending each character to the stringstream). The body of the read loop would be something like the sketch below.
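A minimal sketch of such a loop, assuming map and filesize come from mmap as in the question; process_lines and the commented-out tokenize_line call are illustrative names, not code from this answer:

#include <cstddef>   // std::size_t

// Sketch: remember each line as a (start pointer, length) pair instead of
// copying its characters into a stringstream.
void process_lines(const char* map, std::size_t filesize) {
    std::size_t line_start = 0;                   // index of the current line's first byte
    for (std::size_t i = 0; i < filesize; ++i) {
        if (map[i] == '\n') {
            // the line is map[line_start .. i), length i - line_start
            // tokenize_line(map + line_start, i - line_start);
            line_start = i + 1;                   // next line begins after the '\n'
        }
    }
    if (line_start < filesize) {
        // tokenize_line(map + line_start, filesize - line_start);   // trailing line without '\n'
    }
}

Each line reaches the tokenizer as a pointer plus a length, so nothing is copied out of the mapping until the tokenizer itself needs to copy.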