How to search for text in a PDF document using Quartz

Posted on 2024-10-12 19:04:26

I'm using Quartz to display a PDF. I need to get the indexes of the pages where my search text occurs. Can anyone help me? Thanks.

Solution: here is a code sample that extracts the text from a page and checks whether it contains the search string.

// PDFSearcher.h
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>   // CGPDF* types

@interface PDFSearcher : NSObject {
    CGPDFOperatorTableRef table;
    NSMutableString *currentData;
}

@property (nonatomic, retain) NSMutableString *currentData;
-(id)init;
-(BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString;

@end

// PDFSearcher.m (manual reference counting; compile with -fno-objc-arc)
#import "PDFSearcher.h"

// Appends one string operand to the text collected for the current page.
// Note: CGPDFStringCopyTextString does not apply the font's encoding, so text
// shown with embedded CID fonts may not come out as readable characters.
static void appendPDFString(PDFSearcher *searcher, CGPDFStringRef string)
{
    CFStringRef text = CGPDFStringCopyTextString(string);
    if (text == NULL)
        return;
    [searcher.currentData appendString:(NSString *)text];
    CFRelease(text);
}

// "TJ" operand: an array mixing strings with kerning numbers, e.g. [(Hel) -20 (lo)].
static void arrayCallback(CGPDFScannerRef inScanner, void *userInfo)
{
    PDFSearcher *searcher = (PDFSearcher *)userInfo;
    CGPDFArrayRef array;
    if (!CGPDFScannerPopArray(inScanner, &array))
        return;

    size_t count = CGPDFArrayGetCount(array);
    // Visit every element: strings and numbers do not always strictly alternate.
    for (size_t n = 0; n < count; n++)
    {
        CGPDFStringRef string;
        if (CGPDFArrayGetString(array, n, &string))   // numbers fail this check and are skipped
            appendPDFString(searcher, string);
    }
}

// "Tj", "'" and "\"" operands: the string to show is on top of the operand stack.
static void stringCallback(CGPDFScannerRef inScanner, void *userInfo)
{
    PDFSearcher *searcher = (PDFSearcher *)userInfo;
    CGPDFStringRef string;
    if (CGPDFScannerPopString(inScanner, &string))
        appendPDFString(searcher, string);
}

@implementation PDFSearcher
@synthesize currentData;

-(id)init
{
    if ((self = [super init]))
    {
        table = CGPDFOperatorTableCreate();
        CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
        CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
        CGPDFOperatorTableSetCallback(table, "'", stringCallback);
        CGPDFOperatorTableSetCallback(table, "\"", stringCallback);
    }
    return self;
}

-(void)dealloc
{
    CGPDFOperatorTableRelease(table);   // created in -init, so released here
    [currentData release];
    [super dealloc];
}

-(BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString
{
    [self setCurrentData:[NSMutableString string]];
    CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(inPage);
    CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, self);
    CGPDFScannerScan(scanner);
    CGPDFScannerRelease(scanner);
    CGPDFContentStreamRelease(contentStream);
    // Case-insensitive search for the string in the page's collected text.
    return [self.currentData rangeOfString:inSearchString
                                   options:NSCaseInsensitiveSearch].location != NSNotFound;
}
@end

Answers (4)

与往事干杯 2024-10-19 19:04:27

Use CGPDFDocument, CGPDFPage and CGPDFScanner to scan each page's content stream and collect its text into an NSString.
Then use an NSString search method such as rangeOfString:options: to look for the text on that page. If it is found, store the corresponding page number in an array. Repeat this scanning and searching in a for loop over every page in the PDF.
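A minimal sketch of the page loop this answer describes, in the question's Objective-C. The helper name `PageNumbersMatching` is hypothetical; the per-page check is passed in as a block so it can wrap the question's `-page:containsString:` or any other test. The loop starts at 1 because `CGPDFDocumentGetPage` uses 1-based page numbers.

```objc
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

// Hypothetical helper: walks every page of `document` and returns the
// 1-based numbers (as NSNumbers) of the pages for which `pageMatches` returns YES.
NSArray *PageNumbersMatching(CGPDFDocumentRef document,
                             BOOL (^pageMatches)(CGPDFPageRef page))
{
    NSMutableArray *result = [NSMutableArray array];
    size_t pageCount = CGPDFDocumentGetNumberOfPages(document);
    for (size_t n = 1; n <= pageCount; n++)   // CGPDF page numbers start at 1
    {
        // The page is owned by the document ("Get" rule), so it is not released here.
        CGPDFPageRef page = CGPDFDocumentGetPage(document, n);
        if (page != NULL && pageMatches(page))
            [result addObject:[NSNumber numberWithUnsignedLong:n]];
    }
    return result;
}
```

With the question's class, the call might be `PageNumbersMatching(document, ^BOOL(CGPDFPageRef page) { return [searcher page:page containsString:@"some text"]; })`, which yields the page numbers that contain the text. Subtract 1 if you need 0-based indexes.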

浴红衣 2024-10-19 19:04:27

http://www.random-ideas.net/posts/42%22

Check out the link above; it works.

风透绣罗衣 2024-10-19 19:04:27

There's nothing built into Quartz to do this. Quartz is for drawing graphics; it doesn't need to know, or care about, searching a PDF for string matches. You will have to use the Core Graphics PDF parsing functions (CGPDFScanner and related types) to pull out the text, search it for the string yourself, and then record the page it occurs on.

万水千山粽是情ミ 2024-10-19 19:04:27

If you use PDFDocument (from PDFKit) instead of CGPDFDocument, that API has built-in text search methods, such as findString:withOptions:.
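A sketch of that approach, assuming PDFKit is available (it is on macOS, and on iOS since iOS 11). The function name `PageIndexesContainingString` is hypothetical; `findString:withOptions:`, PDFSelection's `pages` property and `indexForPage:` are PDFKit API. Note that PDFKit page indexes are 0-based, unlike the 1-based numbers used by `CGPDFDocumentGetPage`.

```objc
#import <Foundation/Foundation.h>
#import <PDFKit/PDFKit.h>

// Hypothetical helper: returns the sorted, de-duplicated 0-based indexes of the
// pages containing `searchString`, using PDFKit's built-in search instead of a
// hand-written CGPDFScanner.
NSIndexSet *PageIndexesContainingString(PDFDocument *document, NSString *searchString)
{
    NSMutableIndexSet *indexes = [NSMutableIndexSet indexSet];
    NSArray *selections = [document findString:searchString
                                   withOptions:NSCaseInsensitiveSearch];
    for (PDFSelection *selection in selections)
    {
        // A match can cross a page break, so one selection may cover several pages.
        for (PDFPage *page in selection.pages)
            [indexes addIndex:[document indexForPage:page]];
    }
    return indexes;
}
```

Collecting into an NSIndexSet means a page with several matches is reported only once, which is usually what "the indexes of pages where my search text exists" calls for. PDFKit also decodes font encodings for you, which avoids the limitation of CGPDFStringCopyTextString.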
