Getting no fields when parsing a PDF file
I am trying to parse a PDF file. I want to get all the checkbox values in a list or dictionary, but I am getting this error:

```
return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())
AttributeError: 'NoneType' object has no attribute 'items'
```
This is the code I am trying:
```python
from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader


def _getFields(obj, tree=None, retval=None, fileobj=None):
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name',
                       '/TU': 'Alternate Field Name', '/TM': 'Mapping Name',
                       '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval


def get_form_fields(infile):
    infile = PdfFileReader(open(infile, 'rb'))
    fields = _getFields(infile)
    return OrderedDict((k, v.get('/V', '')) for k, v in fields.items())


if __name__ == '__main__':
    from pprint import pprint

    pdf_file_name = 'Guild.pdf'
    pprint(get_form_fields(pdf_file_name))
```
Comments (2)
After tracing through your code, it seems that at the line `catalog = obj.trailer["/Root"]`, `catalog` stores the value `{'/Metadata': IndirectObject(16, 0), '/Pages': IndirectObject(1, 0), '/Type': '/Catalog'}`. That means `/AcroForm` is not a key in the dictionary (this PDF's catalog has no interactive form dictionary), so your function returns `None`.
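A quick way to confirm this on your own file is to look at the catalog's keys before trying to read any fields. The sketch below uses a hypothetical helper, `describe_catalog`, which is not part of PyPDF2. It only assumes the catalog is dict-like, which holds for the PyPDF2 1.x `DictionaryObject` that `obj.trailer["/Root"]` returns. It is demonstrated on a plain dict mirroring the catalog shown above.

```python
def describe_catalog(catalog):
    """Return (has_form, keys) for a PDF document catalog.

    `catalog` is any dict-like mapping, e.g. the object returned by
    reader.trailer["/Root"] in PyPDF2 1.x (assumed; a dict subclass there).
    """
    keys = sorted(catalog.keys())
    # Interactive form fields (checkboxes, text boxes, ...) hang off /AcroForm.
    return "/AcroForm" in catalog, keys


# Plain-dict stand-in for the catalog from the question. The IndirectObject
# values are replaced by strings because only the keys matter for this check.
catalog = {"/Metadata": "16 0 R", "/Pages": "1 0 R", "/Type": "/Catalog"}
has_form, keys = describe_catalog(catalog)
print(has_form)  # False
print(keys)      # ['/Metadata', '/Pages', '/Type']
```

Against the real file, under the same PyPDF2 1.x assumption, you would pass `PdfFileReader(open('Guild.pdf', 'rb')).trailer["/Root"]` as `catalog`. If `has_form` is `False`, `_getFields` has no AcroForm tree to walk and will always return `None` for that file.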
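Your `_getFields` explicitly returns `None` from the `else` branch of the first `if` block, i.e. whenever `/AcroForm` is missing from the catalog. `get_form_fields` then calls `fields.items()` on that `None`, so that is where your `AttributeError: 'NoneType' object has no attribute 'items'` comes from.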
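One way to avoid the crash is to handle the `None` case before calling `.items()`. The helper below (`field_values` is a hypothetical name, not a PyPDF2 function) is a minimal sketch that takes whatever `_getFields` returned and builds the same name-to-`/V` mapping as `get_form_fields`, returning an empty `OrderedDict` when the PDF has no form.

```python
from collections import OrderedDict


def field_values(fields):
    """Map field name -> '/V' value, tolerating a PDF without a form.

    `fields` is what _getFields returned: a mapping of field name to field
    dictionary, or None when the catalog has no /AcroForm entry.
    """
    if fields is None:
        # No interactive form in this PDF, so there are no field values to read.
        return OrderedDict()
    # Same expression as get_form_fields; missing '/V' becomes ''.
    return OrderedDict((name, field.get('/V', '')) for name, field in fields.items())


print(field_values(None))  # OrderedDict()
sample = OrderedDict([('Agree', {'/FT': '/Btn', '/V': '/Yes'}),
                      ('Name', {'/FT': '/Tx'})])
print(field_values(sample))
```

In `get_form_fields`, the last line would then become `return field_values(fields)`. Returning an empty mapping keeps callers simple; if you would rather be told explicitly that the file has no form, raising a `ValueError` in the `None` branch is an equally reasonable choice. Either way, an empty result here means the PDF has no AcroForm, not that every checkbox is unchecked.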