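How to search for text in a PDF document using Quartz
I'm using Quartz to display a PDF. I need to get the indexes of the pages that contain my search text. Can anyone help? Thanks.
Solution: Here is a code sample that extracts the text from a page and checks it for the search string.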
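// PDFSearcher.h
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

@interface PDFSearcher : NSObject {
    CGPDFOperatorTableRef table;   // maps PDF text operators to C callbacks
    NSMutableString *currentData;  // text accumulated from the page being scanned
}
@property (nonatomic, retain) NSMutableString *currentData;
-(id)init;
-(BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString;
@end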

// PDFSearcher.m
#import "PDFSearcher.h"

@implementation PDFSearcher
@synthesize currentData;

// Callback for the "TJ" operator: its operand is an array that mixes strings
// with numeric position adjustments, in no fixed order.
void arrayCallback(CGPDFScannerRef inScanner, void *userInfo)
{
    PDFSearcher *searcher = (PDFSearcher *)userInfo;
    CGPDFArrayRef array;
    if(!CGPDFScannerPopArray(inScanner, &array))
        return;
    size_t count = CGPDFArrayGetCount(array);
    // Visit every element; CGPDFArrayGetString returns false for the numbers.
    for(size_t n = 0; n < count; n++)
    {
        CGPDFStringRef string;
        if(CGPDFArrayGetString(array, n, &string))
        {
            NSString *data = (NSString *)CGPDFStringCopyTextString(string);
            if(data)
            {
                [searcher.currentData appendString:data];
                [data release];
            }
        }
    }
}

// Callback for the "Tj" operator: its operand is a single string.
void stringCallback(CGPDFScannerRef inScanner, void *userInfo)
{
    PDFSearcher *searcher = (PDFSearcher *)userInfo;
    CGPDFStringRef string;
    if(CGPDFScannerPopString(inScanner, &string))
    {
        NSString *data = (NSString *)CGPDFStringCopyTextString(string);
        if(data)
        {
            [searcher.currentData appendString:data];
            [data release];
        }
    }
}

-(id)init
{
    if((self = [super init]))
    {
        // Register callbacks for the two text-showing operators handled here.
        table = CGPDFOperatorTableCreate();
        CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
        CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
    }
    return self;
}
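
// Scans one page and reports whether its extracted text contains
// inSearchString, ignoring case.
-(BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString
{
    [self setCurrentData:[NSMutableString string]];
    CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(inPage);
    CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, self);
    // The return value is ignored: whatever text was collected is searched.
    CGPDFScannerScan(scanner);
    CGPDFScannerRelease(scanner);
    CGPDFContentStreamRelease(contentStream);
    //NSLog(@"%lu, %@", (unsigned long)[self.currentData length], self.currentData);
    return [self.currentData rangeOfString:inSearchString
                                   options:NSCaseInsensitiveSearch].location != NSNotFound;
}

// Manual retain/release code: free the operator table and the text buffer.
-(void)dealloc
{
    CGPDFOperatorTableRelease(table);
    [currentData release];
    [super dealloc];
}
@end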
Comments (4)
Use CGPDFDocument, CGPDFPage, and CGPDFScanner to scan each page and parse its contents into an NSString.
Then use the NSString search methods to look for the text on that page. If it is found, store the page number in an array. Repeat the scan and search in a for loop over every page in the PDF.
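The loop described here could look roughly like the sketch below. It assumes the PDFSearcher class from the question is available; the helper name PageNumbersContainingString is made up for illustration and is not part of any Apple API. Core Graphics numbers pages from 1, so subtract 1 if you need zero-based indexes.

```objective-c
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import "PDFSearcher.h" // assumption: the class posted in the question

// Hypothetical helper: returns the 1-based page numbers (as NSNumber objects)
// of the pages whose extracted text contains searchString.
// Returns an empty array for a NULL document or an empty search string.
static NSArray *PageNumbersContainingString(CGPDFDocumentRef document,
                                            PDFSearcher *searcher,
                                            NSString *searchString)
{
    NSMutableArray *matches = [NSMutableArray array];
    if (document == NULL || searchString.length == 0)
        return matches;

    size_t pageCount = CGPDFDocumentGetNumberOfPages(document);
    // CGPDFDocumentGetPage expects page numbers starting at 1.
    for (size_t pageNumber = 1; pageNumber <= pageCount; pageNumber++)
    {
        CGPDFPageRef page = CGPDFDocumentGetPage(document, pageNumber);
        if (page != NULL && [searcher page:page containsString:searchString])
            [matches addObject:[NSNumber numberWithUnsignedLong:(unsigned long)pageNumber]];
    }
    return matches;
}
```

The document itself would typically be opened with CGPDFDocumentCreateWithURL and released with CGPDFDocumentRelease once the search is done. The searcher is passed in rather than created inside the helper so the sketch does not commit to either manual retain/release or ARC.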
http://www.random-ideas.net/posts/42%22
Check out the link above; it works.
Quartz itself has nothing built in for this. Quartz is for graphics display; it does not need to know or care about searching a PDF for string matches. You will have to use the Core Graphics PDF parsing functions to pull out the text, search for the string yourself, and then record the page it occurs on.
If you use PDFDocument instead of CGPDFDocument, that API has text search operations, such as findString:withOptions:.
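As a rough sketch of that route, the helper below collects the zero-based indexes of every page on which a match occurs. The name PageIndexesMatchingString is hypothetical. The import assumes an SDK that ships the standalone PDFKit header; on older macOS SDKs PDFKit is reached through Quartz/Quartz.h instead.

```objective-c
#import <Foundation/Foundation.h>
#import <PDFKit/PDFKit.h>

// Hypothetical helper: returns the zero-based indexes of the pages on which
// searchString occurs, ignoring case.
// Returns an empty index set for a nil document or an empty search string.
static NSIndexSet *PageIndexesMatchingString(PDFDocument *document,
                                             NSString *searchString)
{
    NSMutableIndexSet *indexes = [NSMutableIndexSet indexSet];
    if (document == nil || searchString.length == 0)
        return indexes;

    NSArray *selections = [document findString:searchString
                                   withOptions:NSCaseInsensitiveSearch];
    for (PDFSelection *selection in selections)
    {
        // A selection lists every page it touches, so loop over all of them.
        for (PDFPage *page in selection.pages)
        {
            NSUInteger index = [document indexForPage:page];
            if (index != NSNotFound)
                [indexes addIndex:index];
        }
    }
    return indexes;
}
```

An index set is used here because it keeps the result sorted and free of duplicates when a page contains several matches.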