Programmatically read, highlight, and save a PDF
I'd like to write a small script (which will run on a headless Linux server) that reads a PDF, highlights any text matching one of the strings in an array I pass in, then saves the modified PDF. I imagine I'll end up using something like the Python bindings to Poppler, but unfortunately there's next to zero documentation and I have next to zero experience in Python.
If anyone could point me to a tutorial, example, or some helpful documentation to get me started it would be greatly appreciated!
Comments (3)
Yes, it is possible with a combination of pdfminer (pip install pdfminer.six) and PyPDF2. First, find the coordinates (e.g. like this). Then highlight it:
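The code that originally followed this answer did not survive on this page. As a rough sketch only (not the answerer's code), a highlight can be written as a /Highlight annotation dictionary attached to the page. This assumes PyPDF2 2.0 or later (PdfReader, PdfWriter, add_page), and that you already have bounding boxes as (x0, y0, x1, y1) tuples in PDF points. The names quad_points, highlight_pdf and boxes_by_page are made up for illustration, and the writer's _add_object helper is a private, version-dependent method.

```python
def quad_points(bbox):
    """Return the 8 /QuadPoints numbers for one rectangle.

    bbox is (x0, y0, x1, y1) in PDF points, origin at the bottom-left.
    Order used here: top-left, top-right, bottom-left, bottom-right,
    which is the order common viewers appear to expect in practice.
    """
    x0, y0, x1, y1 = bbox
    return [x0, y1, x1, y1, x0, y0, x1, y0]


def highlight_pdf(src, dst, boxes_by_page, color=(1.0, 1.0, 0.0)):
    """Copy src to dst, adding a highlight for every box.

    boxes_by_page (hypothetical shape): {page_index: [bbox, ...]}.
    """
    # Imported lazily so quad_points() works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter
    from PyPDF2.generic import (ArrayObject, DictionaryObject, FloatObject,
                                NameObject, NumberObject)

    reader = PdfReader(src)
    writer = PdfWriter()
    # Assumption: private helper whose name differs between versions.
    add_object = getattr(writer, "_add_object", None) or writer._addObject

    for index, page in enumerate(reader.pages):
        writer.add_page(page)
        target = writer.pages[index]  # the page as held by the writer
        for bbox in boxes_by_page.get(index, []):
            annot = DictionaryObject()
            annot.update({
                NameObject("/Type"): NameObject("/Annot"),
                NameObject("/Subtype"): NameObject("/Highlight"),
                NameObject("/F"): NumberObject(4),  # print flag
                NameObject("/C"): ArrayObject(FloatObject(c) for c in color),
                NameObject("/Rect"): ArrayObject(FloatObject(v) for v in bbox),
                NameObject("/QuadPoints"): ArrayObject(
                    FloatObject(v) for v in quad_points(bbox)),
            })
            ref = add_object(annot)
            if "/Annots" in target:
                target["/Annots"].append(ref)
            else:
                target[NameObject("/Annots")] = ArrayObject([ref])

    with open(dst, "wb") as handle:
        writer.write(handle)
```

Calling highlight_pdf("in.pdf", "out.pdf", {0: [(72, 700, 200, 712)]}) would, under those assumptions, put one yellow highlight on the first page. Nothing here needs a display, so it should be fine on a headless server.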
Have you tried looking at PDFMiner? It sounds like it does what you want.
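For what it's worth, here is a minimal sketch of how the search side might look with pdfminer.six, assuming its extract_pages helper and layout classes. It matches whole text lines rather than exact words, so the boxes it returns cover the full line containing a term. The function names contains_any and find_line_boxes are invented for this example.

```python
def contains_any(text, terms):
    """True if text contains any non-empty search term, ignoring case."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms if term)


def find_line_boxes(path, terms):
    """Return {page_index: [(x0, y0, x1, y1), ...]} for matching lines.

    Coordinates are in PDF points with the origin at the bottom-left,
    as reported by pdfminer's layout analysis.
    """
    # Imported lazily so contains_any() works without pdfminer installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextBox, LTTextLine

    boxes = {}
    for index, layout in enumerate(extract_pages(path)):
        for element in layout:
            if not isinstance(element, LTTextBox):
                continue  # skip figures, curves, images, etc.
            for line in element:
                if isinstance(line, LTTextLine) and contains_any(
                        line.get_text(), terms):
                    boxes.setdefault(index, []).append(line.bbox)
    return boxes
```

Narrowing a box down to the matched word itself would mean walking the individual LTChar objects inside each line, which is left out here to keep the sketch short.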
PDFlib has Python bindings and supports these operations. You will want PDFlib with PDI if you need to open an existing PDF (http://www.pdflib.com/products/pdflib-family/pdflib-pdi/), plus TET.
Unfortunately, it is a commercial product. I have used this library in production in the past and it works great. The bindings are fully functional but not very Pythonic. I have seen some attempts to make them more Pythonic (https://github.com/alexhayes/pythonic-pdflib). You will want to use open_pdi_document().
It sounds like you will want to do search highlighting of some sort:
http://www.pdflib.com/tet-cookbook/tet-and-pdflib/highlight-search-terms/