Extracting 'linguistically significant' characters from RTF files
I have written a Mac app that cross-references various input text and RTF files to produce output files. Part of that app reads in these files, extracts the 'linguistically significant' characters from either TXT or RTF files, and passes them on for further processing.
I am using the following method for this. It works fine, but I am wondering whether I am either taking the long way around or doing something totally unnecessary.
    inputdatafile = [NSString stringWithContentsOfFile:fullpath encoding:NSASCIIStringEncoding error:&error];
    // test rtf wrapper code right here //
    inputdataNSData = [inputdatafile dataUsingEncoding:NSUTF8StringEncoding];
    wrapper = [[NSFileWrapper alloc] initRegularFileWithContents:inputdataNSData];
    rtfData = [[NSAttributedString alloc] initWithRTF:[wrapper regularFileContents] documentAttributes:nil];
    inputdatafilefromrtf = [rtfData string];
    if (inputdatafilefromrtf) {
        inputdatafile = [NSMutableString stringWithString:inputdatafilefromrtf];
    }
inputdatafile is loaded with the contents of a file. The program does not know what sort of text file it is, so it tries to parse it as RTF. If that succeeds, it extracts the plain-text contents for further processing. If not, it assumes the file is plain text and uses the contents as they are.
It is possible that this is working purely by accident and needs to be updated, or perhaps there are better ways to do this.
Any thoughts would be greatly appreciated.
Comments (1)
You could try identifying the type of file using RTF's magic number, as the Unix file command does (an RTF file begins with the characters {\rtf), or you could use a library like libenca.
Here's a general explanation of file parsing and magic numbers.
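As a rough illustration of the magic-number idea, here is a minimal sketch in plain C, which also compiles inside an Objective-C source file. The function name looks_like_rtf and its return convention are made up for this example and are not part of any library. It only checks whether the first five bytes are {\rtf, which is a heuristic and does not prove the rest of the file is valid RTF.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from any library): returns 1 if the file at
   `path` starts with the RTF signature "{\rtf", 0 if it does not, and
   -1 if the file cannot be opened. */
int looks_like_rtf(const char *path)
{
    static const char magic[] = "{\\rtf";       /* 5 bytes plus the NUL */
    const size_t magic_len = sizeof magic - 1;  /* exclude the NUL */
    char head[sizeof magic - 1];
    size_t got;
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }

    /* Read only as many leading bytes as the signature needs. */
    got = fread(head, 1, magic_len, fp);
    fclose(fp);

    /* Files shorter than the signature cannot be RTF. */
    return got == magic_len && memcmp(head, magic, magic_len) == 0;
}
```

In the app from the question, this could presumably be called as looks_like_rtf([fullpath fileSystemRepresentation]) before deciding whether to hand the file to NSAttributedString or to read it as plain text.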