Apache FOP, Unicode: text not searchable

Posted 2024-10-29 06:14:15

I've got a problem with some legacy Java code that renders PDF files.

We're using Apache FOP:

Implementation-Title: Fop
Implementation-Version: 0.20.5 
Implementation-Vendor: Apache Software Foundation (http://xml.apache.org/fop/)

With options set to:

<configuration>
  <fonts>
    <font metrics-file="arialuni.xml"
          embed-file="ARIALUNI.TTF" kerning="yes">
      <font-triplet name="arialuni" style="normal" weight="normal"/>
    </font>
  </fonts>
</configuration>

The PDF renders correctly, but there is one big problem: I can't search for text in the file, and if I copy and paste text from it, I get a lot of box symbols (□).

As far as I understand, arialuni.ttf (the Unicode version of Arial, I suppose) is causing the trouble. Are there any known solutions? Is it possible to fix this through font configuration?

Thanks in advance.

PS: I'm not allowed to switch to another PDF rendering library or to upgrade the existing one.

Edit #1

Thank you all for your answers. We'll probably drop Unicode support for now and upgrade to version 1.0 later.

み青杉依旧 2024-11-05 06:14:15

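The best solution is to give your boss a slap so that he approves some time to upgrade to Apache FOP 1.0 or later. Seriously.

The only alternative is to try passing "-enc ansi" as a parameter to the "TTFReader" application when generating the XML font metrics file. This will cause FOP 0.20.5 to use WinAnsi encoding instead of CID encoding. Downside: you'll be restricted to the WinAnsi 8-bit encoding. You won't get the whole Unicode set.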
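
To get a feel for what the WinAnsi restriction would cost before regenerating the metrics file, one option is to check the document text against an 8-bit Windows code page. The sketch below is only an approximation: it assumes Java's windows-1252 charset is a close enough stand-in for PDF WinAnsi encoding, and the class and method names (WinAnsiCheck, fitsWinAnsi, firstUnsupported) are hypothetical helpers, not part of FOP.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class WinAnsiCheck {
    // Assumption: windows-1252 approximates the WinAnsi encoding used in PDF.
    private static final Charset WIN_ANSI_APPROX = Charset.forName("windows-1252");

    /** Returns true if every character of the text can be encoded in windows-1252. */
    public static boolean fitsWinAnsi(String text) {
        CharsetEncoder encoder = WIN_ANSI_APPROX.newEncoder();
        return encoder.canEncode(text);
    }

    /** Returns the first code point that windows-1252 cannot encode, or -1 if there is none. */
    public static int firstUnsupported(String text) {
        CharsetEncoder encoder = WIN_ANSI_APPROX.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String single = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(single)) {
                return codePoint;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Western European text with a euro sign
        String western = "Gr\u00fc\u00dfe \u20ac";
        // Cyrillic text, outside the 8-bit range
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442";

        System.out.println("western fits: " + fitsWinAnsi(western));
        System.out.println("cyrillic fits: " + fitsWinAnsi(cyrillic));
        System.out.println("first unsupported code point: U+"
                + Integer.toHexString(firstUnsupported(cyrillic)).toUpperCase());
    }
}
```

If the text a report can contain always passes such a check, the "-enc ansi" route is probably workable; if it does not, the missing characters cannot be represented with that encoding at all.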

初心 2024-11-05 06:14:15

PS: I'm not allowed to switch to any other pdf-rendering library, or upgrade an existing one.

If you can't use "arial.ttf", then you're almost certainly doomed. There's a bug in the way that version of FOP is embedding...

HEY! It's entirely possible that your font isn't embedded at all:

From the Apache FOP fonts page:

<fonts>
  <!-- register a particular font -->
  <font metrics-url="file:///C:/myfonts/FTL_____.xml" kerning="yes"
      embed-url="file:///C:/myfonts/FTL_____.pfb">
     <font-triplet name="FrutigerLight" style="normal" weight="normal"/>
  </font>
</fonts>

Sadly, I don't see any mention of specifying a particular encoding anywhere in those docs.
You're using "embed-file" rather than "embed-url" with a "file:///" URL. I suspect that's your problem.
