Accessing font files within a PDF
We are currently working with a selection of publishers to generate online books from their PDFs. Our legacy app uses Flex, so we are converting the PDFs to SWF files using PDF2SWF from SWFTools.
The problem we are having is that our Flex reader does not highlight the text within the SWF document when the user performs a search. After a quick investigation we found that, when extracting text, we need to embed the fonts used by the PDF document:
http://wiki.swftools.org/wiki/How_do_I_highlight_text_in_the_SWF%3F
pdf2swf -F $YOUR_FONTS_DIR$ -f input.pdf -o output.swf
As you can see from the command above, we need a path to a font directory containing the fonts found within that PDF.
Since we will be converting a large number of PDFs, is it possible to access the font files directly through the PDF rather than storing a lot of fonts within our app?
Additional Information
Our app is written in Java.
We are currently using PDFBox and Ghostscript within the app, so a solution that uses these libraries would be preferred, but we are open to all ideas.
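Since the app is written in Java, the conversion step could be driven from it roughly as follows. This is only a minimal sketch: the class and method names (`Pdf2SwfCommand`, `build`, `run`) are hypothetical, the arguments simply mirror the command quoted above in the same order, and it assumes `pdf2swf` is available on the PATH.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SWFTools or any library.
class Pdf2SwfCommand {

    // Builds the argument list in the same order as the command in the question:
    // pdf2swf -F <fontsDir> -f <input> -o <output>
    static List<String> build(Path fontsDir, Path inputPdf, Path outputSwf) {
        return Arrays.asList(
                "pdf2swf",
                "-F", fontsDir.toString(),
                "-f", inputPdf.toString(),
                "-o", outputSwf.toString());
    }

    // Runs the command and returns its exit code.
    // Assumption: the pdf2swf executable can be found on the PATH.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the tool's output to this process's console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        List<String> command = build(
                Paths.get("fonts"), Paths.get("input.pdf"), Paths.get("output.swf"));
        // Only print the command here; call run(command) to actually execute it.
        System.out.println(String.join(" ", command));
    }
}
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a font directory or file name contains spaces.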
Comments (1)
PDF files don't contain font 'files'; they may not even contain any fonts at all, though this is rare. The embedded font data can be in a bewildering variety of formats.
Will your application be able to read all these font formats? If you want to use them, you must use the fonts embedded in the PDF file, because these are very often subset fonts supplied with a custom Encoding. That means that even if you have the original font, you can't use it, because the Encoding will not be correct.
Of course, it may be that these PDF files are all created in a consistent way and do not use embedded fonts, but I have my doubts.
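To get a feel for how often the subset fonts mentioned above occur in a given set of PDFs, one option is to look at the font names. The PDF specification requires that a subset font's name begin with a tag of exactly six uppercase letters followed by a plus sign, for example `ABCDEF+Helvetica`. The sketch below checks a name against that convention. The class and method names are hypothetical, and it assumes the font names have already been read from the PDF by some other means, such as PDFBox.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it only inspects a font name string, not a PDF file.
class SubsetFontName {

    // Six uppercase ASCII letters, a plus sign, then the underlying font name.
    private static final Pattern SUBSET_TAG = Pattern.compile("^([A-Z]{6})\\+(.+)$");

    // True if the name carries a subset tag such as "ABCDEF+".
    static boolean isSubset(String fontName) {
        return fontName != null && SUBSET_TAG.matcher(fontName).matches();
    }

    // Returns the name without the subset tag, or the name unchanged if it has none.
    static String withoutTag(String fontName) {
        Matcher matcher = SUBSET_TAG.matcher(fontName);
        return matcher.matches() ? matcher.group(2) : fontName;
    }

    public static void main(String[] args) {
        String[] names = {"ABCDEF+Helvetica", "Times-Roman"};
        for (String name : names) {
            System.out.println(name + " subset=" + isSubset(name)
                    + " base=" + withoutTag(name));
        }
    }
}
```

A name without the tag does not prove that the full font is embedded. It only shows that the font was not marked as a subset.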