Extracting the 'linguistically significant' characters from RTF files

Posted 2024-12-27 06:43:07 · 954 characters · 5 views · 0 comments

I have written a Mac app that cross-references various input text and RTF files to produce output files. Part of that app reads these files, extracts the 'linguistically significant' characters from either TXT or RTF files, and hands them off for further processing.

I am using the following method for this. It works fine, but I wonder whether I am taking the long way around or doing something totally unnecessary.

    inputdatafile = [NSString stringWithContentsOfFile:fullpath encoding:NSASCIIStringEncoding error:&error];

    // test rtf wrapper code right here //
    inputdataNSData = [inputdatafile dataUsingEncoding:NSUTF8StringEncoding];
    wrapper = [[NSFileWrapper alloc] initRegularFileWithContents:inputdataNSData];
    rtfData = [[NSAttributedString alloc]
            initWithRTF:[wrapper regularFileContents] documentAttributes:nil];
    inputdatafilefromrtf = [rtfData string];
    if (inputdatafilefromrtf) {
        inputdatafile = [NSMutableString stringWithString:inputdatafilefromrtf];
    }

inputdatafile is loaded with the contents of the file. The program does not know what kind of text file it has, so it first tries to parse the contents as RTF. If that succeeds, it extracts the plain text for further processing. If not, it assumes the file is plain text and uses the contents as read.

It is possible that this works purely by accident and needs updating, or that there is a better way to do this.
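For comparison, here is a minimal sketch of the same try-RTF-then-fall-back logic that reads the file's raw bytes directly, skipping the intermediate NSString and the NSFileWrapper. It is an illustration, not a tested replacement. The helper name PlainTextFromFile is hypothetical, ARC is assumed, and the UTF-8 and Latin-1 fallbacks are assumptions about what the plain-text inputs contain.

```objc
#import <Cocoa/Cocoa.h>

// Hypothetical helper (not from the original post): returns the plain text of
// the file at `path`, whether the file holds RTF or plain text.
// Returns nil (and sets *error, if provided) when the file cannot be read.
static NSString *PlainTextFromFile(NSString *path, NSError **error) {
    // Read the raw bytes so that no text encoding is guessed before RTF parsing.
    NSData *data = [NSData dataWithContentsOfFile:path options:0 error:error];
    if (data == nil) {
        return nil;
    }

    // initWithRTF:documentAttributes: is documented to return nil when the
    // data cannot be decoded as RTF, so nil is treated as "not RTF".
    NSAttributedString *rtf =
        [[NSAttributedString alloc] initWithRTF:data documentAttributes:NULL];
    if (rtf != nil) {
        return [rtf string];
    }

    // Assumption: plain-text input is usually UTF-8.
    NSString *text = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (text == nil) {
        // Assumption: fall back to Latin-1, which should accept any byte sequence.
        text = [[NSString alloc] initWithData:data
                                     encoding:NSISOLatin1StringEncoding];
    }
    return text;
}
```

Called as `NSString *text = PlainTextFromFile(fullpath, &error);`, this would take the place of the snippet above. Reading bytes first avoids the ASCII-to-UTF-8 round trip, which may alter or fail on files containing non-ASCII characters.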

Any thoughts would be greatly appreciated.
