Parsing PDF/DOC bank statements with PHP
I am working on an accounting application. The user will upload a bank statement as a PDF or DOC file. I need to read and parse the document, then insert the amount, cheque number, and so on into the database according to my database structure.
How can I achieve this?
Comments (2)
PDF is made for presentation, not for working with the data inside.
You might have some luck with pdftotext or catdoc.
I've been working on this same issue for over two weeks now, and I have to say it is quite a task. I have had some success finding a PHP class to extract the text, but it does not work on every version of the PDF format; it's hit and miss. Writing one yourself will take a while, because you have to figure out the encoding and compression issues. Right now I'm actually looking at some Python libraries. It's just too time-consuming for me to write one from scratch for now.
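For the pdftotext route suggested in the first comment, a minimal sketch might look like the following. It assumes the pdftotext binary (from poppler-utils or Xpdf) is installed and on the PATH. The function names pdfToText and parseStatementLines, and the line layout the regular expression expects (date, optional "CHQ <number>", description, amount), are hypothetical; every bank formats its statements differently, so the pattern would need adapting per bank.

```php
<?php
declare(strict_types=1);

/**
 * Run pdftotext and return the extracted text.
 * "-layout" keeps columns roughly aligned; the trailing "-" writes to stdout.
 * Assumes the pdftotext binary is installed and on the PATH.
 */
function pdfToText(string $pdfPath): string
{
    $cmd = 'pdftotext -layout ' . escapeshellarg($pdfPath) . ' -';
    $output = shell_exec($cmd);
    if (!is_string($output)) {
        throw new RuntimeException("pdftotext failed for $pdfPath");
    }
    return $output;
}

/**
 * Parse transaction lines from extracted text.
 * HYPOTHETICAL layout: DD/MM/YYYY, optional "CHQ <number>", description, amount.
 * Lines that do not match (headers, footers, balances) are skipped.
 * Amounts are returned as strings without thousands separators,
 * to avoid floating-point rounding on money values.
 */
function parseStatementLines(string $text): array
{
    $pattern = '/^\s*(\d{2}\/\d{2}\/\d{4})\s+(?:CHQ\s+(\d+)\s+)?(.+?)\s+(-?[\d,]+\.\d{2})\s*$/';
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match($pattern, $line, $m)) {
            $rows[] = [
                'date'        => $m[1],
                'cheque_no'   => $m[2] !== '' ? $m[2] : null,
                'description' => trim($m[3]),
                'amount'      => str_replace(',', '', $m[4]),
            ];
        }
    }
    return $rows;
}

// Example usage (the path is a placeholder):
// $rows = parseStatementLines(pdfToText('/path/to/statement.pdf'));
```

The parsed rows could then be written to the database with prepared statements. As both comments warn, this only works when the PDF contains real text in a predictable layout; scanned statements or unusual encodings will come out empty or garbled.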