Given a FILE *, how can I efficiently find the offset of the first occurrence of "abc"?
How can I do this kind of job efficiently in C?
What I can think of is to first load the whole file into memory and then search through it.
But is there a more efficient way?
UPDATE
Loading the whole file into memory will be impossible if the file is extremely big.
Loading the whole file into memory is unnecessary and inefficient. Try something like this:
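A minimal sketch of such a character-at-a-time scan, hard-coded to "abc". The function name `find_abc` is an assumption for illustration, not taken from the answer, and the only check is an `assert` on the stream. The returned offset is counted from the position where reading starts, so it equals the file offset only if the stream is at the beginning.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the offset of the first "abc", counted from
 * the position where reading starts, or -1 if it does not occur. */
long find_abc(FILE *fp)
{
    assert(fp != NULL);

    long pos = 0;      /* number of bytes consumed so far */
    int matched = 0;   /* how many characters of "abc" are matched right now */
    int c;

    while ((c = getc(fp)) != EOF) {
        pos++;
        if (c == "abc"[matched]) {
            if (++matched == 3)
                return pos - 3;          /* offset of the 'a' */
        } else {
            /* Restart; a mismatching 'a' can itself begin a new match.
             * This shortcut is only valid because 'a' occurs nowhere
             * else in "abc". */
            matched = (c == 'a') ? 1 : 0;
        }
    }
    return -1;
}
```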
Obviously you would never actually use code like this. You should write a function that takes an arbitrary string to search for, but the algorithm is basically the same. The I/O is buffered by the C library, so you shouldn't have to worry about the efficiency of reading a single character at a time. I also didn't include any error checking.
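One way to generalize this to an arbitrary search string is sketched below. Simply resetting the match counter to zero on a mismatch is wrong for patterns with a repeated prefix (searching "aab" in "aaab" would miss the match), so this sketch swaps in the Knuth-Morris-Pratt failure table, which the answer does not mention. The name `find_in_stream` is hypothetical, and -1 is returned both when the string is absent and when allocation fails.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: offset of the first occurrence of needle, counted
 * from where reading starts, or -1 (not found or out of memory). */
long find_in_stream(FILE *fp, const char *needle)
{
    size_t n = strlen(needle);
    assert(fp != NULL && n > 0);

    /* fail[i] = length of the longest proper prefix of needle[0..i]
     * that is also a suffix of needle[0..i]. */
    size_t *fail = malloc(n * sizeof *fail);
    if (fail == NULL)
        return -1;
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < n; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        fail[i] = k;
    }

    long pos = 0;        /* bytes consumed so far */
    long result = -1;
    size_t matched = 0;  /* length of the current partial match */
    int c;

    while ((c = getc(fp)) != EOF) {
        pos++;
        /* On a mismatch, fall back to the longest shorter partial match. */
        while (matched > 0 && (unsigned char)needle[matched] != c)
            matched = fail[matched - 1];
        if ((unsigned char)needle[matched] == c)
            matched++;
        if (matched == n) {
            result = pos - (long)n;
            break;
        }
    }
    free(fail);
    return result;
}
```

Each byte is still read exactly once and never re-read, so this works on streams that cannot seek.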
You can read the file block by block and search for "abc" in each block. Algorithms such as Boyer-Moore search reduce the number of characters you have to check explicitly.
On Linux, you can use posix_fadvise to tell the kernel that you will be slurping the file.
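A rough sketch of the block-by-block approach, under some assumptions: the block size of 64 KiB and the name `find_in_blocks` are arbitrary choices, and the inner search is a plain `memcmp` scan, not Boyer-Moore. The one subtle point is that a match can straddle two blocks, so the last `n - 1` bytes of each block are carried over to the front of the next one. The `posix_fadvise` call is only a hint, so its return value is ignored, and it is guarded in case the platform does not provide it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 65536   /* assumed block size, tune as needed */

/* Hypothetical helper: offset of the first occurrence of needle, counted
 * from where reading starts, or -1. Uses a static buffer, so it is not
 * reentrant. */
long find_in_blocks(FILE *fp, const char *needle)
{
    size_t n = strlen(needle);
    assert(fp != NULL && n > 0 && n <= BLOCK_SIZE);

#ifdef POSIX_FADV_SEQUENTIAL
    /* Advisory only: announce sequential access to the whole file. */
    (void)posix_fadvise(fileno(fp), 0, 0, POSIX_FADV_SEQUENTIAL);
#endif

    static char buf[2 * BLOCK_SIZE];
    size_t keep = 0;   /* bytes carried over from the previous block */
    long base = 0;     /* stream offset corresponding to buf[0] */
    size_t got;

    while ((got = fread(buf + keep, 1, BLOCK_SIZE, fp)) > 0) {
        size_t len = keep + got;

        for (size_t i = 0; i + n <= len; i++)
            if (memcmp(buf + i, needle, n) == 0)
                return base + (long)i;

        /* Keep the last n-1 bytes so a match spanning the block
         * boundary is still seen on the next iteration. */
        size_t tail = (len < n - 1) ? len : n - 1;
        memmove(buf, buf + len - tail, tail);
        base += (long)(len - tail);
        keep = tail;
    }
    return -1;
}
```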
For string searches there are many interesting algorithms. For example in Boyer-Moore you would take advantage of the fact that the 3rd position must be 'c' if you are to match 'abc', and if it is not 'c' then a table would say how far to advance (for example if it's 'd' you can skip ahead 3 because the first 3 letters can't be interesting to you at all).
However, interesting string search methods are not going to matter at all versus the time spent reading the file. You should avoid reading it all in if you want to handle arbitrary files because the extra memory use is wasteful and will slow you down. But you can't avoid reading the entire file up to the point where you find your string.
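To make the skip table concrete, here is a small in-memory sketch. It implements the Horspool simplification of Boyer-Moore, which uses only a bad-character table keyed on the last byte of the current window, not the full algorithm. The name `horspool_find` is hypothetical. For "abc" the table gives a shift of 2 for 'a', 1 for 'b', and 3 for every other byte, including 'd'.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first occurrence of needle in
 * hay[0..hay_len), or -1 if there is none. */
long horspool_find(const char *hay, size_t hay_len, const char *needle)
{
    size_t n = strlen(needle);
    assert(hay != NULL && n > 0);

    /* skip[c] = how far the window may move when its last byte is c. */
    size_t skip[UCHAR_MAX + 1];
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        skip[c] = n;                       /* byte not in needle: jump past it */
    for (size_t i = 0; i + 1 < n; i++)
        skip[(unsigned char)needle[i]] = n - 1 - i;

    size_t pos = 0;
    while (pos + n <= hay_len) {
        if (memcmp(hay + pos, needle, n) == 0)
            return (long)pos;
        pos += skip[(unsigned char)hay[pos + n - 1]];
    }
    return -1;
}
```

On the input "xxdabc" the first window "xxd" ends in 'd', so the search jumps three bytes at once and lands directly on "abc".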
What OS are you using? On Linux (or another POSIX system) you can use a memory map (mmap) to map the file directly into your address space. It is often considered faster than explicit reads, since the data is not copied into a separate user buffer.
EDIT
mmap doesn't load the whole file into memory at once; pages are brought in on demand as they are accessed.
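A minimal sketch of the mmap approach on a POSIX system, assuming the stream refers to a regular file that fits in the address space. The name `find_with_mmap` is hypothetical, the scan is a plain `memcmp` loop, and -1 is returned both when the string is absent and when `fstat` or `mmap` fails. Any data written through the same stream must be flushed first, because the mapping sees the file and not the stdio buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: file offset of the first occurrence of needle,
 * or -1 (not found, empty file, or mapping failed). */
long find_with_mmap(FILE *fp, const char *needle)
{
    size_t n = strlen(needle);
    assert(fp != NULL && n > 0);

    int fd = fileno(fp);
    struct stat st;
    /* Empty files cannot be mapped, and a file shorter than the needle
     * cannot contain it. */
    if (fstat(fd, &st) != 0 || st.st_size <= 0 || (size_t)st.st_size < n)
        return -1;

    size_t len = (size_t)st.st_size;
    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED)
        return -1;

    long result = -1;
    for (size_t i = 0; i + n <= len; i++) {
        /* Pages are faulted in by the kernel as this loop touches them. */
        if (memcmp(data + i, needle, n) == 0) {
            result = (long)i;
            break;
        }
    }
    munmap((void *)data, len);
    return result;
}
```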