What is the appropriate way of dealing with large text files in Objective-C? Let's say I need to read each line separately and want to treat each line as an NSString. What is the most efficient way of doing this?
One solution is using the NSString method:
+ (id)stringWithContentsOfFile:(NSString *)path
encoding:(NSStringEncoding)enc
error:(NSError **)error
and then split the lines with a newline separator, and then iterate over the elements in the array. However, this seems fairly inefficient. Is there no easy way to treat the file as a stream, enumerating over each line, instead of just reading it all in at once? Kinda like Java's java.io.BufferedReader.
As others have answered, both NSInputStream and NSFileHandle are fine options, but it can also be done in a fairly compact way with NSData and memory mapping:
BRLineReader.h
BRLineReader.m
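For illustration, here is a minimal sketch of the memory-mapping idea in plain C (which compiles inside an Objective-C file). It is not the BRLineReader class; it assumes a POSIX system, and the names for_each_line_mmap, line_fn and add_line_length are made up for this example. The file is mapped read-only and lines are located with memchr, so nothing is copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback type: gets one line (NOT NUL-terminated) and its length. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Example callback: adds each line's length to the size_t that ctx points to. */
void add_line_length(const char *line, size_t len, void *ctx)
{
    (void)line;
    *(size_t *)ctx += len;
}

/* Map the file at path read-only and call fn once per line.
 * Returns the number of lines seen, or -1 on error. */
long for_each_line_mmap(const char *path, line_fn fn, void *ctx)
{
    assert(path != NULL && fn != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  /* mmap rejects length 0 */

    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (base == MAP_FAILED) return -1;

    long count = 0;
    const char *p = base;
    const char *end = base + size;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *stop = nl ? nl : end;          /* last line may lack '\n' */
        size_t len = (size_t)(stop - p);
        if (len > 0 && p[len - 1] == '\r') len--;  /* tolerate CRLF endings */
        fn(p, len, ctx);
        count++;
        p = nl ? nl + 1 : end;
    }
    munmap(base, size);
    return count;
}
```

The callback receives a pointer into the mapping, so it has to copy the bytes if it wants to keep them after the function returns.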
This answer is NOT ObjC but C.
Since ObjC is 'C' based, why not use fgets?
And yes, I'm sure ObjC has its own method - I'm just not proficient enough yet to know what it is :)
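A minimal sketch of the fgets route, assuming a fixed-size buffer is acceptable. The helper name read_line_fgets is made up for this example; lines longer than the buffer come back in several pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line into buf (capacity cap) and strip the trailing "\n" or "\r\n".
 * Returns 1 if a line was read, 0 at end-of-file or on a read error.
 * Assumption: a line longer than cap - 1 bytes is returned in several pieces. */
int read_line_fgets(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 1);
    if (fgets(buf, (int)cap, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';  /* cut at the first CR or LF, if any */
    return 1;
}
```

Call it in a loop, something like while (read_line_fgets(fp, buf, sizeof buf)) { ... }, and turn each buf into an NSString with [NSString stringWithUTF8String:buf] if the file is UTF-8.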
I see a lot of these answers rely on reading the whole text file into memory instead of taking it one chunk at a time. Here's my solution in nice modern Swift, using FileHandle to keep memory impact low:
Note that this preserves the carriage return at the end of the line, so depending on your needs you may want to adjust the code to remove it.
Usage: simply open a file handle to your target text file and call readLine with a suitable maximum length - 1024 is standard for plain text, but I left it open in case you know it will be shorter. Note that the command will not overflow the end of the file, so you may have to check manually that you've not reached it if you intend to parse the entire thing. Here's some sample code that shows how to open a file at myFileURL and read it line-by-line until the end.
From @Adam Rosenfield's answer, the format string of fscanf would be changed like below:
It will work with OS X, Linux, and Windows line endings.
Use a category or extension to make life a bit easier.
I found the response by @lukaswelte and the code from Dave DeLong very helpful. I was looking for a solution to this problem but needed to parse large files by \r\n, not just \n.
The code as written contains a bug if parsing by more than one character. I've changed the code as below.
.h file:
.m file:
I am adding this because all the other answers I tried fell short one way or another. The following method can handle large files, arbitrarily long lines, and empty lines. It has been tested with actual content and strips the newline character from the output.
Credit goes to @Adam Rosenfield and @sooop
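For reference, one way to meet those three requirements in plain C is to keep appending fgets chunks to a buffer that grows on demand. This is a sketch built on fgets rather than the fscanf approach credited above, and read_whole_line is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one complete line of any length from fp.
 * Returns a malloc'd, NUL-terminated string without the line terminator,
 * or NULL at end-of-file (or if memory runs out). The caller frees it. */
char *read_whole_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 128, len = 0;
    char *line = malloc(cap);
    if (line == NULL) return NULL;

    /* Keep appending chunks until one ends with '\n' or the input ends. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[--len] = '\0';                   /* strip LF */
            if (len > 0 && line[len - 1] == '\r')
                line[--len] = '\0';               /* strip the CR of a CRLF */
            return line;                          /* may be "" for an empty line */
        }
        if (cap - len < 2) {                      /* buffer is full: double it */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) { free(line); return NULL; }
            line = bigger;
            cap *= 2;
        }
    }
    if (len > 0) return line;   /* final line without a trailing newline */
    free(line);
    return NULL;                /* nothing left to read */
}
```

An empty line comes back as an empty string, so NULL can only mean end-of-file or an allocation failure.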
Here's a nice simple solution I use for smaller files:
Use this script, it works great:
This will work for the general case of reading a String from Text.
If you would like to read longer text (a large amount of text), then use one of the methods other people here have mentioned, such as buffering (reserving space for the text in memory).
Say you read a text file.
You want to get rid of the newlines.
There you have it.
That's a great question. I think @Diederik has a good answer, although it's unfortunate that Cocoa doesn't have a mechanism for exactly what you want to do.
NSInputStream allows you to read chunks of N bytes (very similar to java.io.BufferedReader), but you have to convert it to an NSString on your own, then scan for newlines (or whatever other delimiter) and save any remaining characters for the next read, or read more characters if a newline hasn't been read yet. (NSFileHandle lets you read an NSData which you can then convert to an NSString, but it's essentially the same process.)
Apple has a Stream Programming Guide that can help fill in the details, and this SO question may help as well if you're going to be dealing with uint8_t* buffers.
If you're going to be reading strings like this frequently (especially in different parts of your program) it would be a good idea to encapsulate this behavior in a class that can handle the details for you, or even subclass NSInputStream (it's designed to be subclassed) and add methods that allow you to read exactly what you want.
For the record, I think this would be a nice feature to add, and I'll be filing an enhancement request for something that makes this possible. :-)
Edit: Turns out this request already exists. There's a Radar dating from 2006 for this (rdar://4742914 for Apple-internal people).
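To make the "read a chunk, scan for a newline, carry the rest over" process concrete, here is a rough sketch in plain C. fread stands in for NSInputStream's read:maxLength:, and chunk_reader, chunk_reader_init and chunk_reader_next are made-up names. The chunk size is deliberately tiny so that lines span several reads. Only "\n" is treated as a delimiter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 16   /* tiny on purpose; a real reader would use a few KB */

/* Hypothetical reader state: bytes already read but not yet returned. */
typedef struct {
    FILE  *fp;
    char  *pending;   /* carried-over bytes */
    size_t len;       /* number of valid bytes in pending */
    size_t cap;       /* allocated size of pending */
} chunk_reader;

void chunk_reader_init(chunk_reader *r, FILE *fp)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->pending = NULL;
    r->len = 0;
    r->cap = 0;
}

/* Copy the first n pending bytes into a fresh NUL-terminated string. */
static char *copy_prefix(const chunk_reader *r, size_t n)
{
    char *line = malloc(n + 1);
    if (line == NULL) return NULL;
    if (n > 0) memcpy(line, r->pending, n);
    line[n] = '\0';
    return line;
}

/* Returns the next line as a malloc'd string (newline removed),
 * or NULL once the file is exhausted. The caller frees the result. */
char *chunk_reader_next(chunk_reader *r)
{
    assert(r != NULL);
    for (;;) {
        /* 1. Is there already a complete line among the carried-over bytes? */
        char *nl = r->len ? memchr(r->pending, '\n', r->len) : NULL;
        if (nl != NULL) {
            size_t n = (size_t)(nl - r->pending);
            char *line = copy_prefix(r, n);
            if (line == NULL) return NULL;
            r->len -= n + 1;                      /* drop line + newline */
            memmove(r->pending, nl + 1, r->len);  /* keep the remainder */
            return line;
        }
        /* 2. No newline yet: make room and append another chunk. */
        if (r->cap - r->len < CHUNK_SIZE) {
            size_t cap = (r->len + CHUNK_SIZE) * 2;
            char *bigger = realloc(r->pending, cap);
            if (bigger == NULL) return NULL;
            r->pending = bigger;
            r->cap = cap;
        }
        size_t got = fread(r->pending + r->len, 1, CHUNK_SIZE, r->fp);
        r->len += got;
        if (got == 0) {
            /* 3. End of input: flush a final unterminated line, if any. */
            if (r->len == 0) {
                free(r->pending);
                r->pending = NULL;
                r->cap = 0;
                return NULL;
            }
            char *line = copy_prefix(r, r->len);
            if (line != NULL) r->len = 0;
            return line;
        }
    }
}
```

The sketch rescans the pending bytes on every pass and shifts them after each line, which is fine for showing the bookkeeping but not tuned for speed.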
This should do the trick:
Use as follows:
This code reads non-newline characters from the file, up to 4095 at a time. If you have a line that is longer than 4095 characters, it keeps reading until it hits a newline or end-of-file.
Note: I have not tested this code. Please test it before using it.
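Going only by that description, a plain C sketch of the idea could look like the following. The name read_line_fscanf and the malloc-based return value are assumptions made for this example. It also consumes the newline with fgetc, so an empty line yields an empty string instead of stalling the loop.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line with fscanf, at most 4095 non-newline characters per call.
 * Returns a malloc'd string without the newline, or NULL at end-of-file. */
char *read_line_fscanf(FILE *fp)
{
    assert(fp != NULL);
    char chunk[4096];
    size_t len = 0;
    char *line = malloc(1);
    if (line == NULL) return NULL;
    line[0] = '\0';

    for (;;) {
        int got = 0;
        /* %4095[^\n] stores up to 4095 non-newline chars; %n records how many. */
        int matched = fscanf(fp, "%4095[^\n]%n", chunk, &got);
        if (matched == 1) {
            char *bigger = realloc(line, len + (size_t)got + 1);
            if (bigger == NULL) { free(line); return NULL; }
            line = bigger;
            memcpy(line + len, chunk, (size_t)got + 1);  /* includes the NUL */
            len += (size_t)got;
            if (got == 4095) continue;  /* chunk was full: the line may go on */
        }
        /* Either the line ended or nothing matched (empty line or EOF).
         * Consume the newline itself so the next call starts on a new line. */
        int c = fgetc(fp);
        if (c == EOF && len == 0) {     /* no characters and no newline: EOF */
            free(line);
            return NULL;
        }
        return line;
    }
}
```

Swapping the scanset for [^\r\n] and skipping a following "\r\n" pair would be one way to cope with Windows line endings, but that variant is not shown here.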
Mac OS X is Unix and Objective-C is a superset of C, so you can just use old-school fopen and fgets from <stdio.h>. It's guaranteed to work.
[NSString stringWithUTF8String:buf] will convert a C string to an NSString. There are also methods for creating strings in other encodings and for creating them without copying.
You can use NSInputStream, which has a basic implementation for file streams. You can read bytes into a buffer (the read:maxLength: method). You have to scan the buffer for newlines yourself.
The appropriate way to read text files in Cocoa/Objective-C is documented in Apple's String programming guide. The section for reading and writing files should be just what you're after. PS: What's a "line"? Two sections of a string separated by "\n"? Or "\r"? Or "\r\n"? Or maybe you're actually after paragraphs? The previously mentioned guide also includes a section on splitting a string into lines or paragraphs. (This section is called "Paragraphs and Line Breaks", and is linked to in the left-hand-side menu of the page I pointed to above. Unfortunately this site doesn't allow me to post more than one URL as I'm not a trustworthy user yet.)
To paraphrase Knuth: premature optimisation is the root of all evil. Don't simply assume that "reading the whole file into memory" is slow. Have you benchmarked it? Do you know that it actually reads the whole file into memory? Maybe it simply returns a proxy object and keeps reading behind the scenes as you consume the string? (Disclaimer: I have no idea if NSString actually does this. It conceivably could.) The point is: first go with the documented way of doing things. Then, if benchmarks show that this doesn't have the performance you desire, optimise.
A lot of these answers are long chunks of code or they read in the entire file. I like to use the C functions for this very task.
Note that fgetln returns the line with its trailing newline still attached (if there is one) and without a NUL terminator, so strip the newline yourself if you don't want it. Also, we add 1 to the length of str because we want to make space for the NUL terminator.
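fgetln is a BSD extension and is missing on some systems, such as Linux with glibc. Where it isn't available, POSIX getline does a similar job and NUL-terminates the buffer for you. Here is a minimal sketch using getline instead of fgetln; the helper name next_line is made up.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read the next line into *buf (getline grows it as needed) and strip the
 * trailing newline. Returns the line length, or -1 at end-of-file or error.
 * Start with *buf == NULL and *cap == 0; free(*buf) once, after the loop. */
ssize_t next_line(FILE *fp, char **buf, size_t *cap)
{
    assert(fp != NULL && buf != NULL && cap != NULL);
    ssize_t n = getline(buf, cap, fp);
    if (n < 0) return -1;
    if (n > 0 && (*buf)[n - 1] == '\n')
        (*buf)[--n] = '\0';       /* getline keeps the newline; remove it */
    return n;
}
```

The same buffer is reused for every line, so copy the contents (or build the NSString) before asking for the next one.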
Just like @porneL said, the C API is very handy.
Reading a file line by line (even an extremely big file) can be done with the following functions:
Or:
The class DDFileReader that enables this is the following:
Interface File (.h):
Implementation (.m)
The class was written by Dave DeLong.