How do I determine the font family and font size of words in a PDF document?

Posted 2024-08-31 19:42:16

How do I figure out the font family and the font size of the words in a PDF document? We are trying to generate a PDF document programmatically using iText, but we are not sure how to find out the font family and font size of the original document that we need to reproduce. The document properties don't seem to contain this information.

Comments (3)

回首观望 2024-09-07 19:42:16

Fonts are described by dictionaries in the PDF (font dictionaries, i.e. dictionaries of type Font). If you open a PDF as a text file, you should be able to find dictionary entries (they begin and end with "<<" and ">>" respectively), unless the file keeps its objects in compressed object streams.

In a simple PDF file, I found the following:

<</Type/Font/BaseFont/Helvetica-Bold/Subtype/Type1/Encoding/WinAnsiEncoding>>

Thus searching for the prefix should help you (in some PDF files there are spaces between
the components, but '/Type /Font' should be OK).

Of course this is a manual process, while you would probably prefer an automatic one.
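
The manual search described above could be automated along these lines. This is only a rough sketch in plain Java (no iText): the class name RawFontScan and the method findBaseFonts are made up for illustration. It only sees flat font dictionaries written as plain text, so it misses fonts kept in compressed object streams and font dictionaries that contain nested dictionaries or hex strings. Going through a PDF library is the more reliable route.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
final class RawFontScan {
    // A flat dictionary: "<<", anything without '<' or '>', then ">>".
    private static final Pattern DICT = Pattern.compile("<<([^<>]*)>>");
    // "/Type /Font" with optional whitespace; \b rejects /FontDescriptor.
    private static final Pattern TYPE_FONT = Pattern.compile("/Type\\s*/Font\\b");
    // The name that follows the /BaseFont key.
    private static final Pattern BASE_FONT =
            Pattern.compile("/BaseFont\\s*/([^\\s/<>\\[\\]()]+)");

    // Returns the BaseFont names of all flat font dictionaries in the text.
    static List<String> findBaseFonts(String pdfText) {
        List<String> names = new ArrayList<>();
        Matcher dict = DICT.matcher(pdfText);
        while (dict.find()) {
            String body = dict.group(1);
            if (!TYPE_FONT.matcher(body).find()) {
                continue; // some other kind of dictionary
            }
            Matcher base = BASE_FONT.matcher(body);
            if (base.find()) {
                names.add(base.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // ISO-8859-1 maps every byte to one char, so binary data is not mangled.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String name : findBaseFonts(text)) {
            System.out.println(name);
        }
    }
}
```

Embedded subset fonts show up with a six-letter prefix (for example ABCDEF+Arial); the part after the "+" is the font name you are looking for.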

On another note, we sometimes use Identifont or WhatTheFont to identify uncommon fonts that give us problems (logo fonts).

regards
Guillaume

Edit: the following code finds all fonts used in the pages. In short, you search each page's dictionary for the subdictionary "Resources", and then for its subdictionary "Font". Each entry in the latter is a font dictionary describing one font.

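 // Package names below are for iText 5; iText 2.x uses com.lowagie.text.pdf instead.
 import com.itextpdf.text.pdf.PdfDictionary;
 import com.itextpdf.text.pdf.PdfName;
 import com.itextpdf.text.pdf.PdfObject;
 import com.itextpdf.text.pdf.PdfReader;

 PdfReader reader = new PdfReader("file.pdf");
 int nbmax = reader.getNumberOfPages();
 System.out.println("nb pages " + nbmax);

 for (int i = 1; i <= nbmax; i++) {
    System.out.println("----------------------------------------");
    System.out.println("Page " + i);
    PdfDictionary dico = reader.getPageN(i);
    PdfDictionary ressource = dico.getAsDict(PdfName.RESOURCES);
    // a page may have no resources, or resources without any font
    PdfDictionary font = (ressource == null) ? null : ressource.getAsDict(PdfName.FONT);
    if (font == null) {
       continue;
    }
    // we got the page fonts: resource name (e.g. /F13) -> font dictionary
    for (Object key : font.getKeys()) {
       PdfName name = (PdfName) key;
       PdfDictionary fontdict = font.getAsDict(name);
       if (fontdict == null) {
          continue;
       }
       PdfObject typeFont = fontdict.getDirectObject(PdfName.SUBTYPE);
       // BaseFont is absent (null) for Type3 fonts
       PdfObject baseFont = fontdict.getDirectObject(PdfName.BASEFONT);
       System.out.println(name + " " + typeFont + " " + baseFont);
    }
 }
 reader.close();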

The name (the variable "name" in the code above) is what the text uses to select a font. In the PDF, you'll find it just before the text it applies to, as an operand of the Tf operator. The number that follows it is the size; in the example below it is 12. (Sorry, still no code for this part.)

BT
/F13  12  Tf
288  720  Td
(the text to find)  Tj
ET
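
The answer leaves this part without code, so here is a rough sketch of how the Tf operands might be pulled out of a page's content stream. It assumes you already have the decoded content as text (with iText, PdfReader.getPageContent(int) should give you the decoded bytes of a page). The names TfScan, FontUse and findFontSelections are invented for this example. A regular expression is not a real content-stream parser: it can be fooled by "Tf" inside a string literal, and the size seen on the page also depends on any scaling applied by the text matrix (Tm) or the current transformation matrix (cm), which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
final class TfScan {
    // One font selection: the resource name (e.g. "F13") and the Tf size operand.
    static final class FontUse {
        final String name;
        final double size;

        FontUse(String name, double size) {
            this.name = name;
            this.size = size;
        }
    }

    // "/F13 12 Tf": a name operand, a numeric operand, then the Tf operator.
    private static final Pattern TF = Pattern.compile(
            "/([^\\s/<>\\[\\]()]+)\\s+([-+]?(?:\\d+\\.?\\d*|\\.\\d+))\\s+Tf\\b");

    // Returns every font selection found, in the order it appears.
    static List<FontUse> findFontSelections(String content) {
        List<FontUse> uses = new ArrayList<>();
        Matcher m = TF.matcher(content);
        while (m.find()) {
            uses.add(new FontUse(m.group(1), Double.parseDouble(m.group(2))));
        }
        return uses;
    }

    public static void main(String[] args) {
        String content = "BT\n/F13  12  Tf\n288  720  Td\n(the text to find)  Tj\nET\n";
        for (FontUse use : findFontSelections(content)) {
            System.out.println(use.name + " at size " + use.size);
        }
    }
}
```

The resource name found this way (F13 here) is the key to look up in the page's Font dictionary to get the actual BaseFont.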

时光倒影 2024-09-07 19:42:16

Depending on the PDF, if the text hasn't been outlined you may be able to open it in Adobe Illustrator, double-click the text and select some of it to see its font family, size, etc.

If the text is outlined, use one of the online tools that PATRY suggests to identify the font.

Good luck

梦冥 2024-09-07 19:42:16

If you have Adobe Acrobat you can see the fonts inside and examine the objects and text streams. I wrote a blog post on this at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects
