Automatically extract highlighted content from a PDF as images
I have a PDF file in which some text and images are highlighted using the Highlight Text (U) tool. Is there a way to automatically extract all the highlighted content as separate images and save them to a folder? I don't need readable text. I just want all the highlighted content as images. Thanks.
Comments (2)
You would need to use a PDF library to iterate through all the annotation objects and their properties to find the ones that are highlight annotations. Once you have found a highlight annotation, you can extract its position and size (bounding box).
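As a rough illustration of the bounding-box step: highlight annotations in a PDF normally carry a `/QuadPoints` array (eight numbers per highlighted quadrilateral) alongside the overall `/Rect`. The sketch below assumes you have already read that array from your PDF library as a flat list of numbers; the function names `quads_to_boxes` and `union_box` are hypothetical helpers, not part of any library. It takes the min/max of each quad's corners, so it does not depend on the corner order a particular PDF writer used.

```python
def quads_to_boxes(quad_points):
    """Turn a flat /QuadPoints list (8 numbers per quad) into a list of
    axis-aligned boxes (x0, y0, x1, y1) in PDF points."""
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    boxes = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]      # x coordinates of the 4 corners
        ys = quad_points[i + 1:i + 8:2]  # y coordinates of the 4 corners
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def union_box(boxes):
    """Smallest box that encloses every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# Example: a highlight spanning two lines of text (made-up coordinates).
quads = [72, 720, 216, 720, 72, 700, 216, 700,
         72, 698, 150, 698, 72, 678, 150, 678]
per_line = quads_to_boxes(quads)
whole = union_box(per_line)
```

Using the per-quad boxes gives one image per highlighted line; using the union gives one image per annotation.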
Once you have a list of the annotation bounding boxes, you will need to render the PDF file to an image format such as PNG, JPEG, or TIFF so that you can clip out the rendered region under each annotation. You could use GDI+ or something like LibTIFF for this.
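One detail that is easy to get wrong when clipping: PDF coordinates are in points (1/72 inch) with the origin at the bottom-left of the page, while rendered bitmaps use pixels with the origin at the top-left. Here is a minimal sketch of that conversion. It assumes an unrotated page whose page box starts at (0, 0); the function name `pdf_rect_to_pixel_box` is hypothetical and not taken from any library.

```python
import math


def pdf_rect_to_pixel_box(rect, page_height_pt, dpi=150):
    """Convert a PDF rectangle (x0, y0, x1, y1) in points, origin bottom-left,
    to a pixel crop box (left, top, right, bottom), origin top-left.

    Assumes the page is not rotated and its page box starts at (0, 0).
    """
    x0, y0, x1, y1 = rect
    scale = dpi / 72.0  # pixels per PDF point
    # Round outward so the crop never cuts into the highlighted area.
    left = math.floor(min(x0, x1) * scale)
    right = math.ceil(max(x0, x1) * scale)
    # Flip the y axis: the top edge in pixels comes from the larger PDF y.
    top = math.floor((page_height_pt - max(y0, y1)) * scale)
    bottom = math.ceil((page_height_pt - min(y0, y1)) * scale)
    return (left, top, right, bottom)


# Example: a highlight on a US Letter page (792 pt tall) rendered at 144 DPI.
box = pdf_rect_to_pixel_box((72, 648, 216, 720), page_height_pt=792, dpi=144)
```

The resulting tuple uses the (left, upper, right, lower) order that image libraries such as Pillow expect for a crop box, so each highlight can be cropped from the rendered page and saved as its own file.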
There are various PDF libraries that could do this, including:
http://www.quickpdflibrary.com (I consult for QuickPDF) or
http://www.itextpdf.com
Here is a C# function based on Quick PDF Library that does what you need.
Do you want each piece of text as a separate highlight, or all the highlights on a separate pane?