列出Word文档使用的字体(更快的方法)

发布于 2024-10-21 13:29:08 字数 1604 浏览 3 评论 0原文

我正在制定一个验证文件的流程,以确保它们符合公司标准。步骤之一是确保 Word 文档不使用未经批准的字体。

我有以下代码存根,它可以工作:

    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing

    Dim fontList As New List(Of String)()

    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        'I've also tried using a for loop with an integer counter, no change in speed'
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next

但这令人难以置信慢!慢得令人难以置信= 2500 个字符/分钟(我用秒表计时)。我的大部分文件大约有 6,000 个单词/30,000 个字符(大约 25 页)。但有些文档长达 100 页......

有没有更快的方法来做到这一点?我必须支持 Office 2003 格式文件,因此不能选择 Open XML SDK。

--更新--

我尝试将其作为 Word 宏运行(使用在 http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html)并且运行速度更快(不到一分钟)。不幸的是,就我的目的而言,我不相信宏会起作用。

--更新 #2--

我采纳了 Chris 的建议,并将文档即时转换为 Open XML 格式。然后,我使用以下代码查找所有 RunFonts 对象并读取字体名称:

    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
                            Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()

        fontList.AddRange(runFonts)
    End Using

I am working on a process for validating documents to make sure that they meet corporate standards. One of the steps is to make sure that the Word document does not use non-approved fonts.

I have the following stub of code, which works:

    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing

    Dim fontList As New List(Of String)()

    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        'I've also tried using a for loop with an integer counter, no change in speed'
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next

But this is incredibly slow! Incredibly slow = 2500 characters/minute (I timed it with StopWatch). Most of my files are around 6,000 words/30,000 characters (about 25 pages). But there are some documents that are in the 100's of pages...

Is there a faster way of doing this? I have to support Office 2003 format files, so the Open XML SDK isn't an option.

--UPDATE--

I tried running this as a Word macro (using the code found @ http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html) and it runs much faster (under a minute). Unfortunately for my purposes I don't believe a Macro will work.

--UPDATE #2--

I took Chris's advice and converted the document to Open XML format on the fly. I then used the following code to find all RunFonts objects and read the font name:

    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
                            Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()

        fontList.AddRange(runFonts)
    End Using

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

空心↖ 2024-10-28 13:29:08

您可能必须支持 Office 2003,但这并不意味着您必须以该格式解析它。获取 Office 2003 文档,将其临时转换为 DOCX 文件,将其作为 ZIP 文件打开,解析 /word/fontTable.xml 文件,然后删除 DOCX。

You might have to support Office 2003 but that doesn't mean you have to parse it in that format. Take the Office 2003 documents, temporarily convert them to DOCX files, open that as a ZIP file, parse the /word/fontTable.xml file and then delete the DOCX.

颜漓半夏 2024-10-28 13:29:08

我发现无需编码的另一种方法是:

  • 将文档导出为 PDF
  • 在 adobe reader中打开它
  • 转到:文件菜单\属性,然后字体选项卡,其中列出了文档中使用的系列字体和子字体。

甚至也许开发人员和程序员可以使用此过程对其进行编码并取出 PDF 字体列表,以对更多人有用。

Another way I found without coding is :

  • export document as PDF
  • open it in adobe reader
  • in adobe reader go to : file menu\properties and then fonts tab, which lists the family fonts and sub-fonts those used in document.

Even maybe Developers and Programmers could use this procedure to code it and take out PDF Font list to what could be useful for more people.

野味少女 2024-10-28 13:29:08

通过迭代段落可以大大加快速度。仅当段落包含混合字体时,您才需要逐字符检查。名称、粗体、斜体等属性具有特殊的“不确定”值,对于名称为空字符串,对于样式属性为 9999999。

因此,例如,如果 Bold = 9999999,则表示该段落包含一些粗体字符和一些非粗体字符。

我包括以下片段来展示总体思路:

For Each P as Paragraph in doc.Paragraphs
    Dim R as Range = P.Range
    If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
        Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
        ' This paragraph uses mixed fonts, so we need to analyse character by character
        AnalyseCharacterByCharacter(R)
    Else
        ' R.Font is used throughout this paragraph
        FontHasBeenUsed(R.Font)
    End If
 Next

You can speed things up a lot by iterating over paragraphs. Only if a paragraph contains mixed fonts would you need to check character by character. The Name, Bold, Italic, etc. properties have a special "indeterminate" value, which is an empty string for the Name or 9999999 for the style attributes.

So, for example, if Bold = 9999999 it means the paragraph contains some bold and some non-bold characters.

I include the following fragment to show the general idea:

For Each P as Paragraph in doc.Paragraphs
    Dim R as Range = P.Range
    If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
        Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
        ' This paragraph uses mixed fonts, so we need to analyse character by character
        AnalyseCharacterByCharacter(R)
    Else
        ' R.Font is used throughout this paragraph
        FontHasBeenUsed(R.Font)
    End If
 Next
请你别敷衍 2024-10-28 13:29:08

我认为这是错误的做法。我们正在寻找字体包含的事实,而不是该字体的位置。这是一个存在问题,而不是一个位置问题。

迭代字体要快很多很多。唯一的窍门是Word有时对空格等很挑剔。这对我来说效果很好

Sub FindAllFonts()
    Dim lWhichFont As Long, sTempName As String, sBuffer As String
    For lWhichFont = 1 To FontNames.Count
       sTempName = FontNames(lWhichFont)
       If FindThisFont(sTempName) Then
          sBuffer = sBuffer & "Found " & sTempName & vbCrLf
        Else
           If FindThisFont(Replace(sTempName, " ", "")) Then
              sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
           End If
        End If
   Next
   Documents.Add
   Selection.TypeText Text:=sBuffer
End Sub

Function FindThisFont(sName As String) As Boolean
   Selection.HomeKey Unit:=wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
       .Font.Name = sName
       .Forward = True
       .Format = True
       .Execute
       If .Found() Then
          FindThisFont = True
       Else
          FindThisFont = False
       End If
   End With
End Function

它工作得非常快(唯一慢的组件是字体迭代)

(显然,它不会找到不在您的系统上的字体,但如果您正在尝试准备传输您编写的内容和一些辅助程序已经把 Helvetica 或 MS Minchin 放进去了,你可以找到它)

好吧,人们告诉我这不是每个人都想要的,人们想要找到他们机器上没有的字体。但另一种方法仍然太慢并且需要寻找很多不存在的东西。所以这里有一个替代方案,将其保存为 rtf,并处理 rtf 标头。

Sub FIndAllFonts2()
    Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
    Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
    Dim objPic As InlineShape, objShp As Shape
    ' rememer old name for reloading
    sOldName = ActiveDocument.Path
    'delete image to make file out small
    For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
    For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
    ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
    sTempFile = ActiveDocument.Path
    ActiveDocument.Close
    lPos2 = 1
    ' we only want the header, but we don't know how long the file is
    'I am using a Mac, so filesystemobject not available
    ' if you end up having a huge header, make 2500 bigger
    lStopAt = 2500
    Open sTempFile For Input As #1
    Do While Not EOF(1) And lPos2 < lStopAt
        sBit = Input(1, #1)
        sBuffer = sBuffer & sBit
        lPos2 = lPos2 + 1
    Loop
    Close #1
    'delete temp file
    Kill sTempFile
    ' loop through header, fonts identified in the table as {\f1\
    ' if you have more than 100 fonts, make this bigger
    ' not all numbers are used
    lStopAt = 100
    For lCounter = 1 To lStopAt
        lPos = InStr(sBuffer, "{\f" & lCounter & "\")
        If lPos > 0 Then
            sBuffer = Mid(sBuffer, lPos)
            lPos = InStr(sBuffer, ";")
            sBuffer2 = Left(sBuffer, lPos - 1)
            'this is where you would look for the alternate name if you want it
            lPos = InStr(sBuffer2, "{\*\falt")
            If lPos > 0 Then
                sBuffer2 = Left(sBuffer2, lPos - 1)
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
            Else
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
            End If
            sOut = sOut & sBuffer2 & vbCrLf
        End If
    Next
    'reopen old file
    Documents.Open sOldName
    Set newdoc = Documents.Add
    sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
    Selection.TypeText Text:=sOut
End Sub

在 MacBook Pro 上,我的 350 页论文草稿在 20 秒内就完成了。所以它足够快,很有用。

That's the wrong way round I think. We are looking for the fact of a font's inclusion not the location of that font. It's an existential rather than a positional problem.

Much, much, much quicker is to iterate the fonts. Only trick is that Word is sometimes fussy about spaces and so forth. This works well for me

Sub FindAllFonts()
    Dim lWhichFont As Long, sTempName As String, sBuffer As String
    For lWhichFont = 1 To FontNames.Count
       sTempName = FontNames(lWhichFont)
       If FindThisFont(sTempName) Then
          sBuffer = sBuffer & "Found " & sTempName & vbCrLf
        Else
           If FindThisFont(Replace(sTempName, " ", "")) Then
              sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
           End If
        End If
   Next
   Documents.Add
   Selection.TypeText Text:=sBuffer
End Sub

Function FindThisFont(sName As String) As Boolean
   Selection.HomeKey Unit:=wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
       .Font.Name = sName
       .Forward = True
       .Format = True
       .Execute
       If .Found() Then
          FindThisFont = True
       Else
          FindThisFont = False
       End If
   End With
End Function

It works very fast (the only slow component is the font iteration)

(It won't find fonts not on your system, obviously, but if you are trying to prepare for transport something you wrote, and some assistant program has put Helvetica or MS Minchin in, you can find it)

OK, people told me that this was not what everyone wants, people want to find fonts that aren't on their machines. But the other way is still too slow and involves looking for a lot of stuff not there. So here is an alternative that saves out as rtf, and processes the rtf header.

Sub FIndAllFonts2()
    Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
    Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
    Dim objPic As InlineShape, objShp As Shape
    ' rememer old name for reloading
    sOldName = ActiveDocument.Path
    'delete image to make file out small
    For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
    For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
    ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
    sTempFile = ActiveDocument.Path
    ActiveDocument.Close
    lPos2 = 1
    ' we only want the header, but we don't know how long the file is
    'I am using a Mac, so filesystemobject not available
    ' if you end up having a huge header, make 2500 bigger
    lStopAt = 2500
    Open sTempFile For Input As #1
    Do While Not EOF(1) And lPos2 < lStopAt
        sBit = Input(1, #1)
        sBuffer = sBuffer & sBit
        lPos2 = lPos2 + 1
    Loop
    Close #1
    'delete temp file
    Kill sTempFile
    ' loop through header, fonts identified in the table as {\f1\
    ' if you have more than 100 fonts, make this bigger
    ' not all numbers are used
    lStopAt = 100
    For lCounter = 1 To lStopAt
        lPos = InStr(sBuffer, "{\f" & lCounter & "\")
        If lPos > 0 Then
            sBuffer = Mid(sBuffer, lPos)
            lPos = InStr(sBuffer, ";")
            sBuffer2 = Left(sBuffer, lPos - 1)
            'this is where you would look for the alternate name if you want it
            lPos = InStr(sBuffer2, "{\*\falt")
            If lPos > 0 Then
                sBuffer2 = Left(sBuffer2, lPos - 1)
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
            Else
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
            End If
            sOut = sOut & sBuffer2 & vbCrLf
        End If
    Next
    'reopen old file
    Documents.Open sOldName
    Set newdoc = Documents.Add
    sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
    Selection.TypeText Text:=sOut
End Sub

This goes through my 350 page thesis draft in under 20 seconds on a MacBook Pro. So it is quick enough to be useful.

独﹏钓一江月 2024-10-28 13:29:08

如果您想获取文档中使用的所有字体。您可以使用 OPEN XML 通过一行简单地获取所有这些:

 using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
 {
     var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
 }

每个 Font 元素都有其“Name”属性,该属性在文本运行的属性中的元素中引用。

提示:您必须考虑每个单词文档。 有超过 2 个字体表部分,一个在主要部分,另一个在术语表部分,因此如果需要,请不要忘记考虑术语表部分。

您可以从此处下载 OPEN XML SDK

If you want to get all fonts used within your doc. you could simply get all of them through one line using OPEN XML:

 using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
 {
     var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
 }

Each Font element has its "Name" property which is referenced in element in the properties of a text run.

Hint: you have to consider that each word doc. does not have more than 2 Font table parts, one in main part and the other in glossary part so don't forget to consider also glossary one if needed.

You could download OPEN XML SDK from here

烟凡古楼 2024-10-28 13:29:08

这可能比在使用 OpenXml 处理文档之前将文档转换为 .docx 更快(根据记录,您也可以使用属性 document.Content.WordOpenXML 而不是 document.Content.XML):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Word = NetOffice.WordApi;

namespace _5261108
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var app = new Word.Application())
            {
                var doc = app.Documents.Open(Path.GetFullPath("test.docx"));

                foreach (var font in GetFontNames(doc))
                {
                    Console.WriteLine(font);
                }

                app.Quit(false);
            }

            Console.ReadLine();
        }

        private static IEnumerable<string> GetFontNames(Word.Document document)
        {
            var xml = document.Content.XML;
            var doc = XDocument.Parse(xml);
            var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
            var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
            return fontNames.Distinct();
        }
    }
}

为方便起见进行转换:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Xml.Linq
Imports Word = NetOffice.WordApi

Namespace _5261108
    Class Program
        Private Shared Sub Main(args As String())
            Using app = New Word.Application()
                Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))

                For Each font As var In GetFontNames(doc)
                    Console.WriteLine(font)
                Next

                app.Quit(False)
            End Using

            Console.ReadLine()
        End Sub

        Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
            Dim xml = document.Content.XML
            Dim doc = XDocument.Parse(xml)
            Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
            Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
            Return fontNames.Distinct()
        End Function
    End Class
End Namespace

'=======================================================
'Service provided by Telerik (www.telerik.com)
'Conversion powered by NRefactory.
'Twitter: @telerik
'Facebook: facebook.com/telerik
'=======================================================

This might be quicker than converting documents to .docx before processing them with OpenXml (for the record, you could also work with the property document.Content.WordOpenXML instead of document.Content.XML):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Word = NetOffice.WordApi;

namespace _5261108
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var app = new Word.Application())
            {
                var doc = app.Documents.Open(Path.GetFullPath("test.docx"));

                foreach (var font in GetFontNames(doc))
                {
                    Console.WriteLine(font);
                }

                app.Quit(false);
            }

            Console.ReadLine();
        }

        private static IEnumerable<string> GetFontNames(Word.Document document)
        {
            var xml = document.Content.XML;
            var doc = XDocument.Parse(xml);
            var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
            var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
            return fontNames.Distinct();
        }
    }
}

Converted for your convenience:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Xml.Linq
Imports Word = NetOffice.WordApi

Namespace _5261108
    Class Program
        Private Shared Sub Main(args As String())
            Using app = New Word.Application()
                Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))

                For Each font As var In GetFontNames(doc)
                    Console.WriteLine(font)
                Next

                app.Quit(False)
            End Using

            Console.ReadLine()
        End Sub

        Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
            Dim xml = document.Content.XML
            Dim doc = XDocument.Parse(xml)
            Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
            Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
            Return fontNames.Distinct()
        End Function
    End Class
End Namespace

'=======================================================
'Service provided by Telerik (www.telerik.com)
'Conversion powered by NRefactory.
'Twitter: @telerik
'Facebook: facebook.com/telerik
'=======================================================
此岸叶落 2024-10-28 13:29:08

试试这个:

Sub Word_Get_Document_Fonts()
  Dim report As String
  Dim J As Integer
  Dim font_name As String
  report = "Fonts in use in this document:" & vbCr & vbCr
  For J = 1 To FontNames.Count
    font_name = FontNames(J)
    Set myrange = ActiveDocument.Range
    myrange.Find.ClearFormatting
    myrange.Find.Font.Name = font_name
    With myrange.Find
      .text = "^?"
      .Replacement.text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
    End With
    myrange.Find.Execute
    If myrange.Find.Found Then
      report = report & font_name & vbCr
    End If
  Next J
  MsgBox (report)
End Sub

Try this:

Sub Word_Get_Document_Fonts()
  Dim report As String
  Dim J As Integer
  Dim font_name As String
  report = "Fonts in use in this document:" & vbCr & vbCr
  For J = 1 To FontNames.Count
    font_name = FontNames(J)
    Set myrange = ActiveDocument.Range
    myrange.Find.ClearFormatting
    myrange.Find.Font.Name = font_name
    With myrange.Find
      .text = "^?"
      .Replacement.text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
    End With
    myrange.Find.Execute
    If myrange.Find.Found Then
      report = report & font_name & vbCr
    End If
  Next J
  MsgBox (report)
End Sub
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文