Apache FOP, Unicode: text is not searchable
I've got a problem with some legacy Java code that renders PDF files.
We're using Apache FOP:
Implementation-Title: Fop
Implementation-Version: 0.20.5
Implementation-Vendor: Apache Software Foundation (http://xml.apache.org/fop/)
With options set to:
<configuration>
  <fonts>
    <font metrics-file="arialuni.xml"
          embed-file="ARIALUNI.TTF" kerning="yes">
      <font-triplet name="arialuni" style="normal" weight="normal"/>
    </font>
  </fonts>
</configuration>
The .pdf is rendered correctly, but there is one big problem: I can't search for text in the file, and if I copy and paste text from it, I get a lot of box symbols (□).
As far as I understand, arialuni.ttf (the Unicode version of Arial, I suppose) is causing the trouble. Are there any known solutions? Is it possible to fix this through the font configuration?
Thanks in advance.
PS: I'm not allowed to switch to another PDF rendering library or to upgrade the existing one.
Edit #1
Thank you all for your answers. We'll probably drop Unicode support for now and upgrade to version 1.0 later.
Comments (2)
The best solution is to slap your boss and get him to approve some time for an upgrade to Apache FOP 1.0 or later. Seriously.
The only alternative is to pass "-enc ansi" as a parameter to the "TTFReader" application when you generate the XML font metrics file. That will cause FOP 0.20.5 to use WinAnsi encoding instead of CID encoding. The downside: you'll be restricted to the 8-bit WinAnsi encoding, so you don't get the whole Unicode set.
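To make the WinAnsi trade-off concrete: the metrics file would be regenerated with something along the lines of `java org.apache.fop.fonts.apps.TTFReader -enc ansi ARIALUNI.TTF arialuni.xml` (classpath omitted, and it depends on your installation). Before committing to that, it may help to check whether the text you actually render fits into the 8-bit set. The sketch below is only a rough pre-check and not part of FOP: the class name `WinAnsiCheck` and its methods are hypothetical, and it assumes Java's `windows-1252` charset is a close enough stand-in for PDF WinAnsi encoding (the two are similar but not guaranteed to be identical).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Rough pre-check: can this text survive a WinAnsi-encoded font? (hypothetical helper) */
final class WinAnsiCheck {

    // Assumption: windows-1252 is used as an approximation of PDF WinAnsi encoding.
    private static final Charset WIN_ANSI = Charset.forName("windows-1252");

    /** Returns true if every character of the text can be encoded in windows-1252. */
    static boolean fitsWinAnsi(String text) {
        CharsetEncoder encoder = WIN_ANSI.newEncoder();
        return encoder.canEncode(text);
    }

    /** Returns the index of the first character outside windows-1252, or -1 if there is none. */
    static int firstUnsupportedIndex(String text) {
        CharsetEncoder encoder = WIN_ANSI.newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Samples use Unicode escapes so the source file encoding does not matter.
        String[] samples = {
            "R\u00e9sum\u00e9 for 10 \u20ac",          // Western European text with a euro sign
            "\u041f\u0440\u0438\u0432\u0435\u0442"     // Cyrillic text
        };
        for (String sample : samples) {
            System.out.println(sample + " -> fits WinAnsi: " + fitsWinAnsi(sample));
        }
    }
}
```

If a check like this reports characters outside the 8-bit set in your real documents, the "-enc ansi" workaround will not cover them, and the CID-encoded font (with its search and copy-paste problem in 0.20.5) remains the only way to render them.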
If you can't use "arial.ttf", then you're almost certainly doomed. There's a bug in the way that version of FOP is embedding...
HEY! It's entirely possible that your font isn't embedded at all:
From the Apache FOP fonts page: