Adobe's PDF reference is online here: http://www.adobe.com/devnet/pdf/pdf_reference.html
I'm not sure about the best way to read PDFs from VBA directly, but if you can call an external Java or C# program, then I would recommend using iText for basic text extraction.
EDIT: I should mention that Adobe's PDF reference is an 800-page beast. It's good for looking up answers to particular questions (e.g., how the widths of embedded TrueType fonts are stored), but it's probably not a good place to start. For getting started on the format, reading through the iText book is what helped me.
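To give a small taste of the format the reference describes: a PDF file starts with a header line such as `%PDF-1.4`. The sketch below reads that version using only the Java standard library. It is just an illustration, not a way to extract text, and the class name `PdfHeader` and method `version` are made up for this example. It also assumes the header sits at the very first byte of the file.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of iText or any library.
final class PdfHeader {

    // Returns the version from a leading "%PDF-x.y" header, or null if the
    // data does not start with one. Assumes the header is at offset 0.
    static String version(byte[] data) {
        int len = Math.min(data.length, 16);
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            return null;
        }
        int end = 5; // first character after "%PDF-"
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return end > 5 ? head.substring(5, end) : null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PdfHeader <file.pdf>");
            return;
        }
        // Reads the whole file for simplicity; only the first bytes are inspected.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        String v = version(data);
        System.out.println(v == null ? "no PDF header found" : "PDF version " + v);
    }
}
```

Everything past that header (objects, cross-reference table, font dictionaries) is where the reference and the iText book come in.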
The iText book contains lots of worked examples for general PDF tasks and plenty of background info to help you understand PDF files. It pays for itself very quickly!