Splitting a PDF with PDFBox, but losing fonts

Posted 2024-12-07 20:16:43


I wrote some code in Java using the PDFBox API that splits a PDF document into its individual pages, looks through the pages for a specific string, and then makes a new PDF from the page with the string on it. My problem is that when the new page is saved, I lose my font. I made a quick Word document to test it, and the default font was Calibri. When I run the program, I get an error box that reads "Cannot extract the embedded font...", and the font is replaced with some other default.

I have seen a lot of example code that shows how to change the font when you are inputting text to be placed in the PDF, but nothing that sets the font for the PDF itself.

If anyone is familiar with a way to do this (or can find documentation or examples), I would greatly appreciate it!

Edit: I forgot to include some sample code:

if (pageContent.contains(findThis)) {
    PDPage pageToRip = pages.get(i);
    // >> set the font of pageToRip here (the step I don't know how to do)
    res.importPage(pageToRip); // res is the new PDDocument that will be saved
}

I don't know if that helps any, but I figured I'd include it.

Also, this is what the change looks like when a PDF written in Calibri is split:

[Screenshot: left, Calibri; right, the font it changes to]

Note: This might be a non-issue; it depends on the fonts used in the files that will need to be processed. I tried some fonts besides Calibri, and they worked fine.
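The question never pins down the cause, but one plausible one (an assumption, not something the poster confirmed) is that PDFBox's `importPage` does not deep-copy a page's resources. The new page keeps pointing at the source document's font streams, so closing the source before the new file is saved, or dropping resources the page inherits from its parent in the page tree, can leave the fonts broken. Below is a minimal sketch against the PDFBox 2.x API; the names `PageExtractor` and `extractPagesContaining` are hypothetical. It keeps the source open until the save and copies the page's resources explicitly.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripper;

/** Copies every page of a PDF that contains a search string into a new PDF (PDFBox 2.x assumed). */
public class PageExtractor {

    /** Writes the pages of input whose text contains needle to output; returns the number copied. */
    public static int extractPagesContaining(File input, File output, String needle) throws IOException {
        // Both documents share one try-with-resources block, so the source is still
        // open when out.save(...) runs: imported pages keep referring to the
        // source's font streams until the destination has been written.
        try (PDDocument src = PDDocument.load(input);
             PDDocument out = new PDDocument()) {
            PDFTextStripper stripper = new PDFTextStripper();
            int copied = 0;
            for (int i = 0; i < src.getNumberOfPages(); i++) {
                // PDFTextStripper page numbers are 1-based; getPage(...) is 0-based.
                stripper.setStartPage(i + 1);
                stripper.setEndPage(i + 1);
                if (!stripper.getText(src).contains(needle)) {
                    continue;
                }
                PDPage page = src.getPage(i);
                PDPage imported = out.importPage(page);
                // getResources() also resolves resources inherited from the page tree,
                // which importPage does not copy onto the new page by itself.
                imported.setResources(page.getResources());
                copied++;
            }
            out.save(output);
            return copied;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java PageExtractor in.pdf out.pdf "search text"
        int n = extractPagesContaining(new File(args[0]), new File(args[1]), args[2]);
        System.out.println("Copied " + n + " page(s)");
    }
}
```

On PDFBox 3.x, `PDDocument.load` was replaced by `Loader.loadPDF`, so the loading line would need to change there.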


Comments (1)

冷夜 2024-12-14 20:16:43


From "How to extract fonts from a PDF":

You actually cannot extract a font from a PDF, not even if the font is fully embedded. There are two reasons why this is not feasible:

• Most fonts are copyrighted, making it illegal to use an extractor.

• When a font is embedded in a PDF, not all of the font data are included. The font outline data are obviously included, as are the font width tables. Other information, such as data about ligatures, is irrelevant within a PDF, so those data are not included. I am not aware of any font extraction tools, but if you come across one, the above reasons should make it clear that these utilities are to be avoided.
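The point that an embedded font is usually incomplete can be observed directly. PDF writers commonly embed only a subset of the glyphs and, following the PDF specification's convention, mark this by prefixing the font name with six uppercase letters and a plus sign (for example `ABCDEF+Calibri`). Below is a minimal sketch, assuming PDFBox 2.x (`FontReport` and `describeFonts` are hypothetical names), that lists each page's fonts with their embedded and subset status. Running it on both the original and the split file is one way to check whether the font survived the split. It only inspects each page's own resource dictionary, not fonts used inside form XObjects or annotations.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

/** Lists the fonts in each page's resources (PDFBox 2.x assumed). */
public class FontReport {

    /** True if the name starts with a subset tag: six uppercase letters followed by '+'. */
    static boolean isSubsetName(String name) {
        return name != null && name.matches("[A-Z]{6}\\+.*");
    }

    /** One line per font per page: name, whether it is embedded, whether it is a subset. */
    public static List<String> describeFonts(File pdf) throws IOException {
        List<String> lines = new ArrayList<>();
        try (PDDocument doc = PDDocument.load(pdf)) {
            for (int i = 0; i < doc.getNumberOfPages(); i++) {
                // getResources() includes resources inherited from the page tree.
                PDResources res = doc.getPage(i).getResources();
                if (res == null) {
                    continue;
                }
                for (COSName key : res.getFontNames()) {
                    PDFont font = res.getFont(key);
                    if (font == null) {
                        continue;
                    }
                    lines.add(String.format("page %d: %s embedded=%b subset=%b",
                            i + 1, font.getName(), font.isEmbedded(), isSubsetName(font.getName())));
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FontReport split.pdf
        for (String line : describeFonts(new File(args[0]))) {
            System.out.println(line);
        }
    }
}
```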
