Extracting layers from a PDF file to HTML
I have a PDF file that contains layers.
For example, some pages contain graphs, and clicking on one displays additional data (a layer) on top of that graph.
Now I need to extract all these layers from the PDF file. To be precise, I need ALL the data from the PDF file, including the layers. The PDF file contains JavaScript that shows/hides the layers when appropriate.
What is the best approach? Is there any tool that actually does what I need? Or should I write something myself? (If that is possible, of course.)
Edit:
Here you can download the PDF file: http://www.2shared.com/document/IutUfDfr/OR_erasmus.html
The password for viewing is: erasmus
Comments (2)
I do not know of any tools per se, but if you cannot find one, you might do the following:
Now you will have a set of PDF files without layers (optional content), and there are plenty of tools that can render those to HTML etc.
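A minimal sketch of one way to produce such per-layer variants, assuming the catalog's /OCProperties dictionary has already been loaded into plain Python dicts by some PDF library (OCGs are identified by name here rather than by object reference, which is a simplification). For each optional content group (OCG) it builds a copy of the default viewing configuration /D in which only that group is ON. Writing each configuration back into a copy of the file, and then rendering it, is left to whatever library or tool you use and is not shown; most renderers apply /D, but check yours.

```python
from copy import deepcopy


def single_layer_configs(oc_properties: dict) -> dict:
    """Return one modified copy of /OCProperties per OCG.

    In each copy the default viewing configuration (/D) shows only that OCG.
    Assumption: OCGs are identified by name here; in a real file they are
    indirect object references, and /OCProperties comes from a PDF library.
    """
    all_ocgs = list(oc_properties["/OCGs"])
    variants = {}
    for ocg in all_ocgs:
        props = deepcopy(oc_properties)  # leave the input untouched
        default = props.setdefault("/D", {})
        default["/BaseState"] = "/OFF"  # every group starts hidden...
        default["/ON"] = [ocg]  # ...except the one this variant is for
        default["/OFF"] = [o for o in all_ocgs if o != ocg]
        # Note: document JavaScript or /AS auto-state entries may still
        # change layer states when a viewer opens the file.
        variants[ocg] = props
    return variants


if __name__ == "__main__":
    # Hypothetical layer names, for illustration only.
    props = {"/OCGs": ["Graph data", "Annotations"], "/D": {"/ON": ["Graph data"]}}
    for name, cfg in single_layer_configs(props).items():
        print(name, cfg["/D"]["/ON"], cfg["/D"]["/OFF"])
```

Copying the whole dictionary per variant keeps each configuration independent, so the variants can be saved in any order.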
Note: optional content and the layer switches in the PDF viewer usually map 1:1, but the standard supports a full n:m mapping. To keep things simple, I would concentrate on the real optional content blocks that can be turned on/off.
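To illustrate the n:m case: a piece of content can depend on several OCGs at once through an optional content membership dictionary (OCMD), whose /P policy (/AllOn, /AnyOn, /AnyOff or /AllOff, default /AnyOn) decides visibility from the groups' current states. The sketch below evaluates only that policy; it ignores the /VE visibility expressions added in PDF 1.6, which take precedence over /OCGs and /P when present, and again represents OCGs by name as a simplification.

```python
from typing import Mapping, Sequence

# Visibility policies allowed in an OCMD's /P entry (default: /AnyOn).
POLICIES = ("/AllOn", "/AnyOn", "/AnyOff", "/AllOff")


def ocmd_visible(ocgs: Sequence[str], states: Mapping[str, bool],
                 policy: str = "/AnyOn") -> bool:
    """Decide whether content governed by an OCMD is visible.

    ocgs:   the groups listed in the OCMD's /OCGs entry (by name, an assumption)
    states: current ON (True) / OFF (False) state of each group
    policy: the OCMD's /P value
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown visibility policy: {policy}")
    if not ocgs:
        # An empty /OCGs array means the OCMD has no effect: content stays visible.
        return True
    on = [states[g] for g in ocgs]
    if policy == "/AllOn":
        return all(on)
    if policy == "/AnyOn":
        return any(on)
    if policy == "/AnyOff":
        return not all(on)
    return not any(on)  # /AllOff


if __name__ == "__main__":
    # Hypothetical layer names and states, for illustration only.
    states = {"Graph data": True, "Annotations": False}
    print(ocmd_visible(["Graph data", "Annotations"], states, "/AllOn"))
```

Because one group can appear in many OCMDs and one OCMD can list many groups, toggling a single layer switch may affect several content blocks, which is why sticking to plain on/off content blocks keeps the extraction simpler.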
You can use this tool to extract images and text even from locked PDFs:
http://download.cnet.com/Able2Extract/3000-2079_4-10249654.html
I use it myself sometimes, and it can convert to HTML.