iText and PDFBox do not detect all form fields present in a PDF
I am using the following Java code to count the form fields in a PDF with both iText and PDFBox. The attached PDF has 11 fields, but the fields on page 1 are not detected, and both libraries report a size of 2.
// iText: number of fields referenced from the AcroForm
PdfDocument doc = new PdfDocument(new PdfReader(file));
PdfAcroForm form = PdfAcroForm.getAcroForm(doc, true);
System.out.println("form fields size from Itext:" + form.getFormFields().size());
// PDFBox: number of top-level fields referenced from the AcroForm
PDDocument document = PDDocument.load(file);
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
List<PDField> fields = acroForm.getFields();
System.out.println("form fields size from PDFBOX:" + fields.size());
Comments (1)
The form information in your PDF is inconsistent.
The global AcroForm form definition in your PDF contains only 2 fields, Text Field 6 and Text Field 7, which happen to be the two fields on page two. Page one references ten form field widgets in its Annots array, each of them merged with a form field object. These fields are not referenced from the AcroForm form definition in your PDF. Thus, they are not part of the form of the PDF but merely arbitrary annotations, which is why both iText and PDFBox report only 2 fields.
To fix the issue, reference the form fields of the widget annotations of page one from the AcroForm form definition, i.e. add them to its Fields array.
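The repair described above can be sketched roughly as follows. This is only a simplified in-memory model written in plain Java, not iText or PDFBox code: the class `AcroFormRepairSketch`, the type `PdfObj`, the methods `rootField` and `addMissingFields`, and the page-one field names are all hypothetical and exist only for illustration. The idea is to visit every widget in each page's Annots array, follow its Parent links up to the top-level field, and add that field to the AcroForm field list if it is not already referenced there.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified in-memory model of the objects involved; NOT the iText or PDFBox API.
class AcroFormRepairSketch {

    /** Stand-in for a PDF dictionary that is a field, a widget, or both merged. */
    static final class PdfObj {
        final String name;
        final PdfObj parent; // models the /Parent entry; null for a root field

        PdfObj(String name, PdfObj parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Follows /Parent links from a widget up to its top-level field. */
    static PdfObj rootField(PdfObj widget) {
        PdfObj current = widget;
        while (current.parent != null) {
            current = current.parent;
        }
        return current;
    }

    /**
     * Adds the root field of every page widget to the AcroForm field list
     * unless that same object is already listed. Returns how many were added.
     */
    static int addMissingFields(List<PdfObj> acroFormFields, List<List<PdfObj>> pageAnnots) {
        int added = 0;
        for (List<PdfObj> annots : pageAnnots) {
            for (PdfObj widget : annots) {
                PdfObj root = rootField(widget);
                // PdfObj does not override equals, so contains() compares identity.
                if (!acroFormFields.contains(root)) {
                    acroFormFields.add(root);
                    added++;
                }
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Page two: the two fields the AcroForm already references.
        PdfObj field6 = new PdfObj("Text Field 6", null);
        PdfObj field7 = new PdfObj("Text Field 7", null);
        List<PdfObj> acroFormFields = new ArrayList<>(List.of(field6, field7));

        // Page one: ten merged field/widget objects (hypothetical names).
        List<PdfObj> pageOne = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            pageOne.add(new PdfObj("Page 1 Field " + i, null));
        }
        List<List<PdfObj>> pageAnnots = List.of(pageOne, List.of(field6, field7));

        System.out.println("fields before repair: " + acroFormFields.size());
        int added = addMissingFields(acroFormFields, pageAnnots);
        System.out.println("fields added: " + added);
        System.out.println("fields after repair: " + acroFormFields.size());
    }
}
```

With the counts given in the answer (2 referenced fields, 10 unreferenced widgets on page one), the model goes from 2 to 12 listed fields. In a real PDF library the same loop would operate on the page annotations and the AcroForm's Fields array, and the document would need to be saved afterwards.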