Unable to parse a PDF with Jpedal
I'm facing a problem while parsing a PDF with Jpedal.
When I read the wordlist from Jpedal, the words in it come back as garbled characters. The same thing happens when I use OCR, and when I copy the text from the PDF and paste it into Word or a simple text editor. As far as I understand, this PDF was generated by the Quartz PDF context on Mac OS X 10.6.4, apparently to reduce the file size, yet it displays without problems in PDF viewers. I searched for a Java API that supports decoding this kind of PDF, but without success. I'm looking for any application or Java API that I can use to decode it; it must be usable on a Linux machine.
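Since the symptom is a wordlist full of garbled characters, a quick automatic check can help decide which files need special handling. The following is only a minimal sketch using the Java standard library: the class name GarbledTextCheck, its methods, and the threshold are hypothetical and are not part of Jpedal. It assumes the garbling shows up as replacement characters, private-use code points, unassigned code points, or stray control characters; it will not catch text that was mapped to the wrong but otherwise ordinary letters.

```java
import java.util.List;

final class GarbledTextCheck {

    // Fraction of code points that look like extraction debris: U+FFFD,
    // private-use or unassigned code points, and control characters that
    // are not ordinary whitespace. Returns 0.0 for empty input.
    static double suspiciousRatio(String text) {
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            total++;
            int type = Character.getType(cp);
            boolean strayControl =
                    type == Character.CONTROL && !Character.isWhitespace(cp);
            if (cp == 0xFFFD
                    || type == Character.PRIVATE_USE
                    || type == Character.UNASSIGNED
                    || strayControl) {
                suspicious++;
            }
        }
        return total == 0 ? 0.0 : (double) suspicious / total;
    }

    // Hypothetical helper: treats a list of extracted words as garbled when
    // the suspicious ratio over all words exceeds the given threshold.
    static boolean looksGarbled(List<String> words, double threshold) {
        return suspiciousRatio(String.join("", words)) > threshold;
    }

    public static void main(String[] args) {
        // Sample input made up for illustration, not taken from a real PDF.
        List<String> words = List.of("\uFFFD\uE000\uE001", "ok");
        System.out.println(looksGarbled(words, 0.3));
    }
}
```

The words extracted by whatever library is in use would be passed to looksGarbled; a threshold of around 0.3 is an arbitrary starting point and would need tuning against real files.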
Comments (1)
Hi everybody,
I'm posting a possible solution to the problem. Here is a link describing how Quartz parses PDFs. It still has to be implemented in code, because so far I haven't found any ready-made API for it. I believe Stack Overflow is all about taking the initiative and answering questions that have not been asked or answered before.
Regards,
Rituraj