Does it make sense to use MemoryMappedFile to search large text files?

Posted on 2024-11-24 23:43:28


I'm tasked with implementing a search function that will search through several large (a couple of MB each) log files and return the lines that contain the keywords. Log files are constantly being added to the pool, so every search has to run against the current set of files.

Would it make sense to create a MemoryMappedFile for each file and then iterate through each line, matching the keywords? If not, what would be a better way to go about it?

Any links to example code would be much appreciated.


甜中书 2024-12-01 23:43:28


Yes. A "couple of MB" is not very much; it easily fits within 2 GB (the usable address space of a 32-bit process).

You'll want to use the CreateFromFile overload that takes a capacity (mapping size), because the file will grow over time. Also, I think you'll need to recreate the view accessor or view stream (CreateViewAccessor / CreateViewStream) on each search, but I find MSDN a bit unclear here.

With a Stream, it's trivial to create a StreamReader and read every line. The whole process is very likely I/O-bound on reasonable hardware, so don't bother with CPU optimizations initially.
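The approach described above (map the file, wrap a view stream in a StreamReader, check each line) could look roughly like the sketch below. It assumes .NET Core 3.0 or later; LogSearch, SearchFile, and SearchDirectory are hypothetical names, keywords are matched as case-sensitive substrings, and opening the file with FileShare.ReadWrite is an assumption so that logs still held open by their writer can be read. One deliberate deviation: a capacity larger than the file size grows the file on disk, which is undesirable for a live log, so this sketch passes a capacity of 0 (the current file size) and re-maps the file on every search, which also follows the suggestion to recreate the view each time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;

static class LogSearch
{
    // Returns every line in the file at `path` that contains at least one keyword
    // (ordinal, case-sensitive substring match).
    public static List<string> SearchFile(string path, IReadOnlyCollection<string> keywords)
    {
        var matches = new List<string>();

        // FileShare.ReadWrite (an assumption) lets us read a log another process is still writing.
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        long length = fs.Length;            // snapshot of the file size for this search
        if (length == 0) return matches;    // an empty file cannot be mapped with capacity 0

        // Capacity 0 maps the file at its current size, so the file on disk is never grown.
        using var mmf = MemoryMappedFile.CreateFromFile(
            fs, null, 0, MemoryMappedFileAccess.Read, HandleInheritability.None, leaveOpen: true);

        // A fresh view stream on every search, limited to the snapshot length.
        using var view = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read);
        using var reader = new StreamReader(view);

        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.TrimEnd('\0');      // defensive: drop any zero padding past the data
            foreach (var keyword in keywords)
            {
                if (line.Contains(keyword, StringComparison.Ordinal))
                {
                    matches.Add(line);
                    break;                  // report each line once, even if several keywords match
                }
            }
        }
        return matches;
    }

    // Re-enumerates the directory on every call, so newly added *.log files are included.
    public static List<string> SearchDirectory(string directory, IReadOnlyCollection<string> keywords)
    {
        var matches = new List<string>();
        foreach (var path in Directory.EnumerateFiles(directory, "*.log"))
            matches.AddRange(SearchFile(path, keywords));
        return matches;
    }
}
```

Because keywords are matched as substrings, "Mon" also matches "Monday"; a Regex or a case-insensitive StringComparison can be swapped in if that is not the desired behavior.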

羁绊已千年 2024-12-01 23:43:28


Why not just create a properly structured index object tree in memory, optimized for searching?

EDIT: Added after some comments...

Could be something like this:

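using System.Collections.Generic;

class Index
{
    // Inverted index: each word maps to the files that contain it.
    public Dictionary<string, List<SourceFile>> FilesThatContainThisWord { get; set; }
        = new Dictionary<string, List<SourceFile>>();
    // ... other members
}


class SourceFile
{
    public string Path { get; set; }
    // ... other members
}


// Code to look up a term. TryGetValue avoids a KeyNotFoundException
// when no indexed file contains the word.
if (myIndex.FilesThatContainThisWord.TryGetValue("Monday", out var filesThatContainMonday))
{
    // filesThatContainMonday lists every file containing "Monday"
}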
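The index above maps a word to files, while the question asks for the matching lines. A line-level variant might look like the following minimal sketch, assuming whitespace-and-punctuation tokenization and case-insensitive matching; LineRef, LineIndex, AddFile, and Search are hypothetical names, not part of the answer. Each new log file is indexed once when it joins the pool, after which a lookup is a single dictionary access rather than a scan of every file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical posting entry: where a word occurs and the full text of that line.
record LineRef(string Path, int LineNumber, string Text);

class LineIndex
{
    // Case-insensitive keys (an assumption): "monday" and "Monday" share one bucket.
    private readonly Dictionary<string, List<LineRef>> _postings =
        new Dictionary<string, List<LineRef>>(StringComparer.OrdinalIgnoreCase);

    // Assumed token separators; adjust to the actual log format.
    private static readonly char[] Separators =
        { ' ', '\t', ',', '.', ';', ':', '[', ']', '(', ')', '"', '\'', '=', '/' };

    // Call once for each log file as it is added to the pool.
    public void AddFile(string path)
    {
        int lineNumber = 0;
        foreach (var line in File.ReadLines(path))
        {
            lineNumber++;
            var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            foreach (var word in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                if (!seen.Add(word)) continue;  // record each line at most once per word
                if (!_postings.TryGetValue(word, out var refs))
                    _postings[word] = refs = new List<LineRef>();
                refs.Add(new LineRef(path, lineNumber, line));
            }
        }
    }

    // Returns the lines that contain `word` as a whole token, or an empty list.
    public IReadOnlyList<LineRef> Search(string word)
    {
        if (_postings.TryGetValue(word, out var refs)) return refs;
        return Array.Empty<LineRef>();
    }
}
```

The trade-off is that the whole index lives in memory and only whole tokens match, so a substring query such as "Mon" finds nothing; a line-by-line scan handles that case.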