You have several options. All these methods work on Linux as well as on Windows or Mac OS X. However, be aware that most PDFs do not include the full, complete font face when they have a font embedded. Mostly they include just the subset of glyphs used in the document.
Using pdftops
One of the most frequently used methods to do this on *nix systems consists of the following steps:
1. Convert the PDF to PostScript, for example with pdftops (on Windows: the pdftops.exe helper program).
2. The fonts are now embedded in .pfa (PostScript) format, and you can extract them using a text editor.
3. You may need to convert the .pfa (ASCII) to a .pfb (binary) file using t1utils and pfa2pfb.
4. PDFs never have .pfm or .afm files (font metric files) embedded (because PDF viewers have internal knowledge about these). Without these, font files are hardly usable in a visually pleasing way.
Using fontforge
Another method is to use the free font editor FontForge.
Check the FontForge manual. You may need to follow a few specific steps, which are not necessarily straightforward, in order to save the extracted font data as a re-usable file.
Using mupdf
Next, MuPDF. This application comes with a utility called pdfextract (on Windows: pdfextract.exe) which can extract fonts and images from PDFs. (In case you don't know about MuPDF, which still is relatively unknown and new: "MuPDF is a Free lightweight PDF viewer and toolkit written in portable C.", written by Artifex Software developers, the same company that gave us Ghostscript.) (Update: Newer versions of MuPDF have moved the former functionality of 'pdfextract' to the command 'mutool extract'. Download it here: mupdf.com/downloads)
Note: pdfextract.exe is a command-line program. To use it, run it with the path to the PDF file as its argument. This command will dump all of the extractable files from the referenced PDF file into the current directory. Generally you will see a variety of files: images as well as fonts. These include PNG, TTF, CFF, CID, etc. The image names will be like img-0412.png if the PDF object number of the image was 412. The font names will be like FGETYK+LinLibertineI-0966.ttf if the font's PDF object number was 966.
CFF (Compact Font Format) files are a recognized format that can be converted to other formats via a variety of converters for use on different operating systems.
Again: be aware that most of these font files may have only a subset of characters and may not represent the complete typeface.
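To make the naming scheme above easier to work with, here is a minimal Python sketch that splits an extracted file name into its parts. The function name describe_extracted and the regular expression are my own illustration, not part of MuPDF. It assumes names always follow the two patterns quoted above, and it relies on the PDF convention that a subset font name starts with a six-letter uppercase tag followed by a plus sign.

```python
import re

# Matches names like "FGETYK+LinLibertineI-0966.ttf" or "img-0412.png":
# an optional six-letter subset tag, a base name, the PDF object number
# and a file extension. The pattern is an assumption based on the two
# example names in the text, not a documented MuPDF guarantee.
NAME_RE = re.compile(
    r"^(?:(?P<tag>[A-Z]{6})\+)?(?P<base>.+)-(?P<obj>\d+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def describe_extracted(filename):
    """Split an extracted file name into its parts, or return None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return {
        "subset": m.group("tag") is not None,  # True if a subset tag is present
        "name": m.group("base"),
        "object": int(m.group("obj")),         # PDF object number
        "ext": m.group("ext").lower(),
    }


if __name__ == "__main__":
    for name in ("FGETYK+LinLibertineI-0966.ttf", "img-0412.png"):
        print(name, "->", describe_extracted(name))
```

A name flagged with "subset": True is a quick hint that the extracted file probably holds only some of the glyphs of the typeface.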
Update (Jul 2013): Recent versions of mupdf have seen an internal reshuffling and renaming of their binaries, not just once, but several times. The main utility used to be a 'Swiss knife'-like binary called mubusy (name inspired by busybox?), which more recently was renamed to mutool. These support the sub-commands info, clean, extract, poster and show. Unfortunately, the official documentation for these tools isn't up to date (yet). If you're on a Mac using 'MacPorts', the utility was renamed in order to avoid name clashes with other utilities using identical names, and you may need to use mupdfextract.
To achieve (roughly) the same results with mutool as with its predecessor pdfextract, just run mubusy extract ... So to extract fonts and images, you may need to run mutool extract, mubusy extract, pdfextract or mupdfextract on the PDF file, depending on the version you have installed.
Downloads are here: mupdf.com/downloads
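Because the binary has been renamed several times, a small wrapper can try the known names in turn. The following Python sketch is only an illustration: the helper names pick_extract_command and extract_all are hypothetical, the candidate list is taken from the names mentioned above, and it assumes each tool accepts the PDF path as its last argument.

```python
import shutil
import subprocess

# Candidate commands, newest name first. All names come from the text above;
# that each one takes the PDF path as its final argument is an assumption.
CANDIDATES = [
    ["mutool", "extract"],
    ["mubusy", "extract"],
    ["pdfextract"],
    ["mupdfextract"],
]


def pick_extract_command(which=shutil.which):
    """Return the first candidate whose executable is found on PATH, else None."""
    for cmd in CANDIDATES:
        if which(cmd[0]) is not None:
            return list(cmd)
    return None


def extract_all(pdf_path, which=shutil.which):
    """Run the extractor on pdf_path; files are written to the current directory."""
    cmd = pick_extract_command(which)
    if cmd is None:
        raise FileNotFoundError("no MuPDF extraction tool found on PATH")
    return subprocess.run(cmd + [pdf_path], check=True)
```

Calling extract_all("filename.pdf") from an empty working directory keeps the dumped images and fonts separate from your other files.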
Using gs (Ghostscript)
Then, Ghostscript can also extract fonts directly from PDFs. However, it needs the help of a special utility program named extractFonts.ps, written in the PostScript language, which is available from the Ghostscript source code repository.
To use it, you need to run both this file, extractFonts.ps, and your PDF file. Ghostscript will then use the instructions from the PostScript program to extract the fonts from the PDF. This works on Windows (yes, Ghostscript understands the forward slash, /, as a path separator on Windows too!) as well as on Linux, Unix or Mac OS X.
I tested the Ghostscript method a few years ago. At the time it did extract *.ttf (TrueType) just fine. I don't know if other font types will also be extracted at all, and if so, in a re-usable way. I don't know if the utility blocks the extraction of fonts which are marked as protected.
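As a rough sketch of such an invocation, the Python helper below builds a Ghostscript argument list. Several things here are assumptions on my part rather than something stated above: that extractFonts.ps defines a procedure called extractFonts which takes the PDF path as a PostScript string, that the executable is simply called gs (on Windows it usually has a different name), and the helper name build_gs_command itself. Check the comments at the top of your copy of extractFonts.ps before relying on it.

```python
import subprocess


def build_gs_command(pdf_path, script_path="extractFonts.ps", gs_executable="gs"):
    """Build an argument list that loads extractFonts.ps and calls it on one PDF."""
    # Ghostscript accepts forward slashes as path separators on Windows too.
    pdf_ps = pdf_path.replace("\\", "/")
    # Escape parentheses, which delimit PostScript string literals.
    pdf_ps = pdf_ps.replace("(", "\\(").replace(")", "\\)")
    return [
        gs_executable,
        "-q",            # suppress the startup banner
        "-dNODISPLAY",   # no output device is needed
        script_path,     # load the PostScript helper program first
        "-c",            # then run the following PostScript snippet
        "(%s) extractFonts quit" % pdf_ps,  # assumed procedure name
    ]


if __name__ == "__main__":
    cmd = build_gs_command("/path/to/your/PDFFile.pdf")
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run Ghostscript
```

Passing the arguments as a list avoids shell quoting problems with spaces in the PDF path.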
Using pdf-parser.py
Finally, Didier Stevens' pdf-parser.py: this one is probably not as easy to use, because you need to have some know-how about internal PDF structures. pdf-parser.py is a Python script which can do a lot of other things too. It can also decompress and extract arbitrary streams from objects, and therefore it can extract embedded font files too.
But you need to know what to look for. Let's see it with an example. I have a file named big.pdf. As a first step I use the -s parameter to search the PDF for any occurrence of the keyword FontFile (pdf-parser.py does not require a case-sensitive search). In my case, for my big1.pdf, the result tells me that there are two instances of FontFile2 inside the PDF, and these are in PDF objects no. 15 and no. 16, respectively. Object no. 15 holds the /FontFile2 for font /ArialMT, object no. 16 holds the /FontFile2 for font /Arial-BoldMT.
A quick peek into the PDF specification reveals that the keyword /FontFile2 relates to a 'stream containing a TrueType font program' (/FontFile would relate to a 'stream containing a Type 1 font program' and /FontFile3 would relate to a 'stream containing a font program whose format is specified by the Subtype entry in the stream dictionary' {hence being either a Type1C or a CIDFontType0C subtype}).
To look specifically at PDF object no. 15 (which holds the font /ArialMT), one can use the -o 15 parameter. The pdf-parser.py output tells us that this object contains a stream (which it will not directly display) that has a length of 1,581,435 bytes and is encoded ( == "compressed") with ASCIIHexEncode and needs to be decoded ( == "de-compressed" or "filtered") with the help of the standard /ASCIIHexDecode filter.
To dump any stream from an object, pdf-parser.py can be called with the -d dumpname parameter. Let's do it: our extracted data dump will be in the file named dumped-data.ext. Let's see how big it is:
Oh look, it is 1,581,435 bytes. We saw this figure in the previous command's output. Opening this file with a text editor confirms that its content is ASCII hex encoded data.
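To illustrate what ASCII hex decoding involves, here is a minimal Python sketch of the decoding step, using only the standard library. The function name ascii_hex_decode is hypothetical. It follows the filter's rules as I understand them from the PDF specification: whitespace is ignored, '>' marks the end of the data, and a trailing odd digit is treated as if followed by 0.

```python
import binascii


def ascii_hex_decode(data):
    """Decode ASCIIHexDecode stream data (bytes) into raw bytes."""
    # The end-of-data marker '>' terminates the stream; ignore anything after it.
    data = data.split(b">", 1)[0]
    # ASCII whitespace between hex digits carries no meaning, so drop it.
    digits = b"".join(data.split())
    # An odd number of digits means the final digit is followed by an implied 0.
    if len(digits) % 2:
        digits += b"0"
    return binascii.unhexlify(digits)


if __name__ == "__main__":
    print(ascii_hex_decode(b"00 01 00\n00>"))
```

Since every byte is written as two hex digits plus occasional line breaks, the encoded dump is a little more than twice the size of the decoded data.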
Opening the file with a font reading tool like otfinfo (this is a part of the lcdf-typetools package) will lead to some disappointment at first. OK, this is because we did not (yet) let pdf-parser.py make use of its full magic: to dump a filtered, decoded stream. For this we have to add the -f parameter. What is the size of this new file?
Oh, look: that exact number was also already stored in the PDF object no. 15 dictionary as the value for key /Length1...
What does file think it is? What does otfinfo tell us about it? So Bingo!, we have a winner: pdf-parser.py did indeed extract a valid font file for us. Given the size of this file (778,552 bytes), it looks like this font had even been embedded completely in the PDF...
We could rename it to arial-regular.ttf and install it as such and happily make use of it.
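If neither file nor otfinfo is at hand, a rough sanity check can be done in a few lines of Python by looking at the first bytes of the dump. This is only a sketch: the function name sniff_sfnt is hypothetical, and it inspects just the sfnt header and table directory (the container used by TrueType and OpenType fonts), so it does not prove that the font is complete or undamaged.

```python
import struct

# First four bytes of an sfnt-based font file and what they indicate.
SFNT_KINDS = {
    b"\x00\x01\x00\x00": "TrueType",
    b"true": "TrueType (Apple)",
    b"OTTO": "OpenType with CFF outlines",
    b"ttcf": "TrueType collection",
}


def sniff_sfnt(data):
    """Return (kind, table_tags) for sfnt font data, or None if the header does not match."""
    kind = SFNT_KINDS.get(data[:4])
    if kind is None:
        return None
    if data[:4] == b"ttcf" or len(data) < 12:
        return kind, []  # collections and truncated headers: no table list
    # Bytes 4-5 hold the number of tables (big-endian unsigned short).
    (num_tables,) = struct.unpack(">H", data[4:6])
    tags = []
    for i in range(num_tables):
        # Table records start at offset 12 and are 16 bytes each.
        record = data[12 + 16 * i : 28 + 16 * i]
        if len(record) < 16:
            break  # truncated table directory
        tags.append(record[:4].decode("latin-1"))
    return kind, tags


if __name__ == "__main__":
    with open("dumped-data-decoded.ext", "rb") as fh:  # hypothetical file name
        print(sniff_sfnt(fh.read()))
```

A dump that was not decoded starts with ASCII hex digits rather than one of these signatures, so sniff_sfnt returns None for it, which matches the initial disappointment described above.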
Caveats:
In any case you need to follow the license that applies to the font. Some font licences do not allow free use and/or distribution. Pirating fonts is like pirating any software or other copyrighted material.
Most PDFs in the wild do not embed the full font anyway, but only subsets. Extracting a subset of a font is only useful in a very limited scope, if at all.
Please do also read the following about Pros and (more) Cons regarding font extraction efforts:
Use online service http://www.extractpdf.com. No need to install anything.
Even though this question is 10 years old, it is still valid, and as technology changes, so does a valid answer.
In searching the current answers, I noticed that none of them mention WOFF (Web Open Font Format) (W3C) (Wikipedia), which can be used to recreate the individual characters (glyphs) and display them accurately in a web page.
Using the free online web page by IDR Solutions, PDF to HTML5 (link), convert a PDF to a zip file. The resulting zip will contain a font directory of woff files. Current Internet browsers support woff files, in case you were not aware (reference). These can be examined at the online site FontDrop! (link).
WOFF files can be converted to/from OTF or TTF at WOFFer – WOFF font converter.
Also, the zip file from PDF to HTML5 will contain an HTML file for each page of the PDF that can be opened in an Internet browser, and it is one of the best and most accurate PDF translations I have found or seen.
Eventually found the FontForge Windows installer package and opened the PDF through the installed program. Worked a treat, so happy.
http://www.verypdf.com/app/pdf-font-extractor/pdf-font-extracting-tool.html
IMO easiest way to extract fonts (Windows).
PDF2SVG version 6.0 from PDFTron does a reasonable job. It produces OpenType (.otf) fonts by default. Use --preserve_fontnames to preserve "the font/font-family naming scheme as obtained from the source file."
PDF2SVG is a commercial product, but you can download a free demo executable (which includes watermarks on the SVG output but doesn't otherwise restrict usage). There may be other PDFTron products that also extract fonts, but I only recently discovered PDF2SVG myself.
One of the best online tools currently available to extract pdf fonts is http://www.pdfconvertonline.com/extract-pdf-fonts-online.html
This is a followup to the font-forge section of @Kurt Pfeifle's answer, specific to Red Hat (and possibly other Linux distros).
Once you have your TTF file, you can install it on your system by
1. copying it to /usr/share/fonts (as root)
2. running fc-cache -f /usr/share/fonts/ (as root)