PDFBox does not recognize a link
I'm using Apache PDFBox to scan through a PDF in search of links to a certain file.
I've got about a thousand PDFs to scan, and most of the links (in fact, all but one as far as I can see now) are found.
However, there is one particular link in a PDF that PDFBox simply ignores. If I open the PDF with Foxit and check the link's properties, it looks exactly like all the other links (that do get found).
Here's the code I use to iterate through the links:
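import java.util.List;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

// 'pages' is assumed to hold the document's pages, e.g. the raw List
// returned by document.getDocumentCatalog().getAllPages() in PDFBox 1.x
for( Object p : pages ) {
    PDPage page = (PDPage)p;
    // Annotations referenced from the page's /Annots entry
    List<?> annotations = page.getAnnotations();
    for( Object a : annotations ) {
        PDAnnotation annotation = (PDAnnotation)a;
        // Keep only link annotations (/Subtype /Link)
        if( annotation instanceof PDAnnotationLink ) {
            PDAnnotationLink link = (PDAnnotationLink)annotation;
            /* Do stuff with the link */
        }
    }
}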
In the affected PDF, page.getAnnotations() does return an empty list.
Is there any other type of link besides the annotations that I should be aware of?
Comment:
I took a look at the annot dictionary. It looks like this:
I can't see anything wrong with it. It is also referenced correctly from the Annots entry in the page. Sorry I cannot be of more help.
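The comment above checked the annotation dictionary by hand. One crude way to repeat that check across a batch of files, independently of PDFBox's parser, is to scan each PDF's raw bytes for link annotations and /Annots entries. This is a sketch under stated assumptions: it only sees objects stored uncompressed. In files that use object streams (/Type /ObjStm, PDF 1.5 and later) those objects are Flate-compressed, so a zero count there proves nothing. Incremental updates can also leave several revisions of the same object in the file, so counts may be inflated. RawLinkScan and its methods are hypothetical names for illustration, not PDFBox API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: a crude scan of a PDF's raw bytes
// for link annotations and /Annots entries stored as uncompressed objects.
final class RawLinkScan {

    // "/Subtype /Link" (whitespace between the two names is optional)
    private static final Pattern LINK_SUBTYPE =
            Pattern.compile("/Subtype\\s*/Link\\b");
    // "/Annots [ ... ]" or an indirect reference such as "/Annots 12 0 R"
    private static final Pattern ANNOTS_ENTRY =
            Pattern.compile("/Annots\\s*(?:\\[|\\d+\\s+\\d+\\s+R\\b)");
    // Object streams hold Flate-compressed objects that this scan cannot see
    private static final Pattern OBJECT_STREAM =
            Pattern.compile("/Type\\s*/ObjStm\\b");

    private RawLinkScan() {}

    // ISO-8859-1 maps each byte to exactly one char, so binary data survives decoding
    private static String latin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Counts non-overlapping matches of the pattern in the raw file content
    private static int count(Pattern pattern, byte[] pdf) {
        Matcher m = pattern.matcher(latin1(pdf));
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    static int countLinkAnnotations(byte[] pdf) {
        return count(LINK_SUBTYPE, pdf);
    }

    static int countAnnotsEntries(byte[] pdf) {
        return count(ANNOTS_ENTRY, pdf);
    }

    static boolean usesObjectStreams(byte[] pdf) {
        return count(OBJECT_STREAM, pdf) > 0;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Uncompressed /Subtype /Link objects: " + countLinkAnnotations(pdf));
        System.out.println("Uncompressed /Annots entries: " + countAnnotsEntries(pdf));
        if (usesObjectStreams(pdf)) {
            // A zero count above is inconclusive in this case
            System.out.println("File uses object streams; compressed objects were not scanned.");
        }
    }
}
```

It can be run on a single file, for example with java RawLinkScan.java affected.pdf on Java 11 or later. If the scan reports link annotations in a file where getAnnotations() finds none, that is consistent with the comment's finding that the annotation is present and referenced, and points toward how the file is parsed rather than toward a different kind of link.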