Extracting fonts embedded in a PDF to external ttf files using a utility or script

Posted on 2024-08-14 16:48:10

Is it possible to extract fonts that are embedded in a PDF file to external ttf files using some utility or script?

  1. If the fonts used in a PDF file (embedded or not) are present on the system: using the pdf2swf and swfextract tools from swftools, I can determine the names of the fonts used in the PDF file. I can then compile the corresponding system font(s) at run-time and load them into my AIR application.

  2. BUT if the fonts used in the PDF are absent from the system, there are two possibilities:

    2.1. If they are absent from the PDF file as well (not embedded), we can only use a similar system font based on the font name.

    2.2. If they are embedded in the PDF file, I want to know whether it is possible at all to extract them to external ttf files, so that I can compile each of them to a separate swf file at run-time.

Comments (5)

罪歌 2024-08-21 16:48:10

I know it's been a while since you asked this, but I figured I might be able to help.

I don't know if there is any utility that will allow you to extract the font files, but you can do it manually.

A PDF file is made up of numbered objects, and much of that structure is readable in a text editor (the stream contents are binary, and some PDFs keep their objects in compressed object streams, in which case this manual search won't work). Open the file and look for the fonts.

The fonts are specified in FontDescriptor objects, e.g.:

<</Type/FontDescriptor/FontName/ABCDEE+Algerian ... /FontFile2 24 0 R>>

This says that the font program for the font named Algerian is stored in object 24 (the ABCDEE+ prefix marks an embedded subset, and /FontFile2 means TrueType data). Search the document for the line "24 0 obj". The dictionary that follows gives the properties of the stream, including its /Length, and the font data itself starts after the "stream" keyword.
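The manual search described above can also be scripted. The following is a minimal Python sketch, not a full PDF parser: it assumes the FontDescriptor dictionaries are stored uncompressed and contain no nested dictionaries, and the function name `list_embedded_fonts` is made up for illustration.

```python
import re

# Matches a FontDescriptor dictionary in the raw file bytes.
# Assumption: the dictionary is uncompressed and has no nested "<<...>>".
FONT_DESCRIPTOR = re.compile(
    rb"<<(?:(?!>>).)*?/Type\s*/FontDescriptor(?:(?!>>).)*?>>", re.DOTALL
)
FONT_NAME = re.compile(rb"/FontName\s*/([^\s/<>\[\]()]+)")
# FontFile = Type 1, FontFile2 = TrueType, FontFile3 = CFF and others.
FONT_FILE = re.compile(rb"/(FontFile[23]?)\s+(\d+)\s+(\d+)\s+R")


def list_embedded_fonts(pdf_bytes):
    """Return (font name, FontFile key, object number) for each embedded font."""
    found = []
    for match in FONT_DESCRIPTOR.finditer(pdf_bytes):
        body = match.group(0)
        name = FONT_NAME.search(body)
        font_file = FONT_FILE.search(body)
        # Descriptors without a FontFile entry describe fonts that are not embedded.
        if name and font_file:
            found.append((
                name.group(1).decode("latin-1"),
                font_file.group(1).decode("ascii"),
                int(font_file.group(2)),
            ))
    return found
```

Calling it on the bytes of a PDF (for example `list_embedded_fonts(open("whatever.pdf", "rb").read())`) gives the object numbers to look up. If it returns nothing for a PDF that clearly has embedded fonts, the objects are probably inside compressed object streams and a real PDF library is needed.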

This stream contains the ttf file, usually compressed with FlateDecode (check the /Filter entry in the stream's dictionary). To decompress it you can use this method:

  using System.IO;
  using System.IO.Compression;

  // Inflates a FlateDecode (zlib-wrapped) stream body taken from a PDF object.
  private static byte[] DecodeFlateDecodeData(byte[] data)
  {
     using (var compressedDataStream = new MemoryStream(data))
     using (var outputStream = new MemoryStream())
     {
        // Skip the two-byte zlib header; DeflateStream expects raw deflate data.
        compressedDataStream.ReadByte();
        compressedDataStream.ReadByte();

        using (var deflateStream = new DeflateStream(compressedDataStream, CompressionMode.Decompress, true))
        {
           var decompressedBuffer = new byte[1024];
           int read;
           while ((read = deflateStream.Read(decompressedBuffer, 0, decompressedBuffer.Length)) != 0)
           {
              outputStream.Write(decompressedBuffer, 0, read);
           }
        }

        // ToArray returns a copy of everything written to the stream.
        return outputStream.ToArray();
     }
  }
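For anyone who would rather script the whole lookup than cut the stream out by hand, here is a rough Python equivalent using only the standard library. It is a sketch under assumptions: the object is stored uncompressed in the file body, it really is a stream object using FlateDecode, and the bytes "endstream" do not happen to occur inside the compressed data. The function name `extract_font_stream` is hypothetical.

```python
import re
import zlib


def extract_font_stream(pdf_bytes, obj_num, gen_num=0):
    """Return the inflated stream body of object `obj_num`, or None if not found."""
    # The lookbehind keeps "24 0 obj" from matching inside "124 0 obj".
    header = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b" % (obj_num, gen_num))
    start = header.search(pdf_bytes)
    if start is None:
        return None
    # The stream body starts after "stream" plus its end-of-line marker.
    begin = re.compile(rb"stream\r?\n").search(pdf_bytes, start.end())
    end = pdf_bytes.find(b"endstream", begin.end()) if begin else -1
    if begin is None or end == -1:
        return None
    raw = pdf_bytes[begin.end():end]
    # A decompress object ignores the end-of-line bytes left before "endstream".
    return zlib.decompressobj().decompress(raw)
```

For the example above, `extract_font_stream(pdf_bytes, 24)` would return the font bytes, which can then be written to a .ttf file opened in binary mode. Unlike DeflateStream, zlib handles the two-byte header itself, so nothing needs to be skipped.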

I hope this helps you... or helps somebody else

有木有妳兜一样 2024-08-21 16:48:10

It's a late answer, but I found a way to do this using freely available Windows programs. It doesn't require scripting, compiling, or cygwin. It's a few steps, but not as bad as it looks.

  1. Install mupdf
    link - http://mupdf.googlecode.com/files/mupdf-0.8.15-windows.zip
    and copy your pdf to mupdf's installation folder. Let's say it's called whatever.pdf.

  2. Open a dos/command prompt. Navigate to your mupdf install folder.
    example: cd C:\Program Files\mupdf
    ...If that goes smoothly, your prompt should now look like this: C:\Program Files\mupdf>
    Now type the following command:
    pdfextract whatever.pdf

Afterwards, within the mupdf program folder, you'll have one or more font files. They'll have names like ABCDEF+Fontname-12.cff ... Right now they're in the .cff format, which can't be installed directly, but we'll fix that. I recommend renaming each one to something less awkward... for example whatever.cff

  3. More DOS, sorry. You need a tool called cfftot1.exe. Here's a link:
    ftp://tug.org/texlive/Contents/live/bin/win32/cfftot1.exe
    ...Copy it to your mupdf folder. Then type this:
    cfftot1 whatever.cff whatever.pfb

  4. You now have an almost usable font file called whatever.pfb. I say 'almost' because PFB font files usually come with a second file, a PFM file, which contains spacing information. Without it the font won't install and the spacing will be off. The font will still open in font editors like Fontlab, though, and you can save it from there as TTF or OTF. You can also try fixing the spacing yourself.

If you don't have a font editor, you can use Crossfont. Crossfont can take the PFB and generate the necessary PFM file, so you can at least install and use the font.
link - http://crossfont.en.softonic.com/

That's it.
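Since the files that come out of the PDF can be in several formats (.cff here, .ttf or Type 1 in other PDFs), it can help to check what a file really is before picking a converter. The Python sketch below guesses the format from the first few bytes. The signatures are the standard ones for each format, but the function name `sniff_font_format` and the labels it returns are made up for illustration, and the bare-CFF check is only a heuristic.

```python
def sniff_font_format(data):
    """Guess the font container from its leading bytes; returns a short label."""
    if data[:4] in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType (.ttf)"
    if data[:4] == b"OTTO":
        return "OpenType with CFF outlines (.otf)"
    if data[:2] == b"\x80\x01":
        return "Type 1 binary (.pfb)"
    if data[:2] == b"%!":
        return "Type 1 ASCII (.pfa)"
    # A bare CFF table starts with major version 1; byte 3 is offSize (1 to 4).
    if len(data) >= 4 and data[0] == 1 and 1 <= data[3] <= 4:
        return "bare CFF (.cff)"
    return "unknown"
```

A file reported as TrueType can simply be renamed to .ttf, while a bare CFF file needs the cfftot1 step described above.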

感情废物 2024-08-21 16:48:10

A few years ago I designed a special font. It took me about a year of on-and-off work. One day my Maxtor HDD died and there was no way I could recover my work. But I had the font embedded in some PDF files for my clients, so I had the idea to extract the font from these files. After a year or so of looking online for an answer, I put together a method to extract fonts from PDFs. I have presented this method on my blog at http://pdffontextract.blogspot.com . Since I came up with this solution many alternatives have emerged, but there is nothing wrong with diversity. I made this post to help others who need to recover their lost work. Have fun, and if you need any help don't hesitate to contact me.

じ违心 2024-08-21 16:48:10

The link to get the cfftot1.exe has changed to ftp://tug.org/texlive/Contents/live/bin/i386-linux/

酒绊 2024-08-21 16:48:10

Minor update - some PDFs contain fonts embedded in another format, which come out as .CID files.
This format is made for fonts that support a lot of characters (e.g. Asian language fonts), and it doesn't map the glyphs to letters in the typical way.

You can still get usable fonts out of a .CID file; you just need to add a step to my answer above.
Run your PDF through a program called PStill (GPStill). The website is here:
http://www.wizards.de/~frank/pstill.html

When choosing your input, change the dropdown from Postscript File to PDF File.
Your output PDF will have _new appended to it.
If you need to unlock a PDF, you can use Advanced PDF Password Recovery from Elcomsoft.

What this step does is convert the CID fonts embedded in the PDF to PFA type 1 fonts. So after running PDFextract, instead of a bunch of useless .CID files, you have .PFA files that can be imported into Fontlab and possibly Crossfont. Be aware that the letters probably won't be mapped correctly, so you really want something like Fontlab to move them around so that e.g. typing A on your keyboard doesn't result in the letter R.

As always, if the font was only embedded as a subset, you won't get the whole font, just the limited set of letters that the PDF actually uses.
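Whether a font is a subset can usually be read off its name: PDF writers prefix the name of a subset font with six uppercase letters and a plus sign, as in ABCDEE+Algerian. A small Python sketch of that check follows (the function name `split_subset_tag` is hypothetical):

```python
import re

# Subset fonts are named "XXXXXX+BaseName", where XXXXXX is six uppercase letters.
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None when the name has no subset prefix."""
    match = SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name
```

A name that comes back with a tag means the extracted file will contain only some of the original glyphs.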
