Does it make sense to use a MemoryMappedFile to search through large text files?
I'm tasked with implementing a search function that will search through several large (couple MB) log files and return the lines that contain the keywords. Log files are constantly being added to the pool so the search has to be dynamic every time.
Would it make sense to create a MemoryMappedFile for each file and then iterate through each line, matching the keywords? If not, what would be a better way to go about it?
Any links to example code would be much appreciated.
2 Answers
Yes. A "couple of MB" is not very much; it fits easily within a 2 GB address space.
You'll want to use the constructor overload that takes a mapping size, because the file will grow over time. Also, I think you'll need to recreate the Accessor or Stream on each search, but I find MSDN a bit unclear here.
With a Stream, it's trivial to create a StreamReader, and read every line. The whole process is very likely I/O bound on reasonable hardware, so don't bother with CPU optimizations initially.
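A minimal sketch of that approach, assuming the mapping and stream are recreated on every search so newly appended lines are visible. The helper name `LogSearcher.Search` and its signature are illustrative, not from the original answer:

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;

static class LogSearcher
{
    // Returns all lines in the file at `path` that contain `keyword`.
    // The MemoryMappedFile and its view stream are created fresh per call,
    // so data appended to the log since the last search is picked up.
    public static List<string> Search(string path, string keyword)
    {
        var matches = new List<string>();

        // The `capacity` argument (here 0 = current file size) is the
        // "mapping size" mentioned above; a writable mapping could pass
        // a larger value to accommodate growth within one mapping.
        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var stream = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Contains(keyword))
                    matches.Add(line);
            }
        }
        return matches;
    }
}
```

Since the process is I/O bound, the simple line-by-line `Contains` check is unlikely to be the bottleneck.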
Why not just create a properly structured index object tree in memory, optimized for searching?
EDIT: Added after some comments...
Could be something like this:
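The code that originally followed is not preserved here. A hypothetical sketch of such an in-memory index, assuming a simple word-to-occurrences dictionary rebuilt as new log files arrive (the class and member names are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class LogIndex
{
    // Maps each word to the places it occurs: (file, line number, line text).
    private readonly Dictionary<string, List<(string File, int Line, string Text)>> _index
        = new Dictionary<string, List<(string File, int Line, string Text)>>(
            StringComparer.OrdinalIgnoreCase);

    // Index one log file; call again for each file added to the pool.
    public void AddFile(string path)
    {
        int lineNo = 0;
        foreach (var line in File.ReadLines(path))
        {
            lineNo++;
            foreach (var word in line.Split(' ', '\t'))
            {
                if (word.Length == 0) continue;
                if (!_index.TryGetValue(word, out var hits))
                    _index[word] = hits = new List<(string File, int Line, string Text)>();
                hits.Add((path, lineNo, line));
            }
        }
    }

    // Keyword lookup is now a single dictionary probe instead of a file scan.
    public IEnumerable<string> Search(string keyword)
    {
        if (_index.TryGetValue(keyword, out var hits))
            foreach (var hit in hits)
                yield return $"{hit.File}:{hit.Line}: {hit.Text}";
    }
}
```

The trade-off versus scanning with a memory-mapped file is that the index must be kept in sync as files grow, and it only matches whole words as tokenized here.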