List the fonts used by a Word document (faster method)
I am working on a process for validating documents to make sure that they meet corporate standards. One of the steps is to make sure that the Word document does not use non-approved fonts.
I have the following stub of code, which works:
    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing
    Dim fontList As New List(Of String)()
    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        ' I've also tried using a For loop with an integer counter; no change in speed.
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next
    Finally
        ' Close the document without saving and shut Word down, even if the scan fails.
        If wordDocument IsNot Nothing Then wordDocument.Close(SaveChanges:=False)
        wordApplication.Quit()
    End Try
But this is incredibly slow: about 2,500 characters per minute (I timed it with StopWatch). Most of my files are around 6,000 words/30,000 characters (about 25 pages), so a typical document takes roughly 12 minutes, and some documents run to hundreds of pages.
Is there a faster way of doing this? I have to support Office 2003 format files, so the Open XML SDK isn't an option.
--UPDATE--
I tried running this as a Word macro (using the code found at http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html) and it runs much faster (under a minute). Unfortunately, I don't believe a macro will work for my purposes.
--UPDATE #2--
I took Chris's advice and converted the document to Open XML format on the fly. I then used the following code to find all RunFonts objects and read the font name:
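    ' Needs: Imports System.Linq, DocumentFormat.OpenXml.Packaging, DocumentFormat.OpenXml.Wordprocessing
    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        ' Collect the distinct ASCII font names referenced by run properties.
        ' Ascii is Nothing when the attribute is absent, so check that before HasValue.
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
            Function(c) If(c.Ascii IsNot Nothing AndAlso c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()
        fontList.AddRange(runFonts)
    End Using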
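The on-the-fly conversion step is not shown in the post. A minimal sketch of how it might look, assuming the Word 2007 or later interop assembly (where `WdSaveFormat.wdFormatXMLDocument` is available); the `ConvertToDocx` helper and the temp-file naming are hypothetical, not the poster's code:

```vb
Imports System
Imports System.IO
Imports Word = Microsoft.Office.Interop.Word

Module DocxConversion
    ' Hypothetical helper: opens a .doc file and saves a temporary .docx copy.
    ' Returns the path of the copy; the caller deletes it when finished.
    Function ConvertToDocx(wordApplication As Word.Application, sourcePath As String) As String
        Dim tmpPath As String = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") & ".docx")
        ' The interop parameters are ByRef Object, so pass Object-typed locals.
        Dim source As Object = sourcePath
        Dim target As Object = tmpPath
        Dim wordDocument As Word.Document = Nothing
        Try
            wordDocument = wordApplication.Documents.Open(FileName:=source)
            wordDocument.SaveAs(FileName:=target, FileFormat:=Word.WdSaveFormat.wdFormatXMLDocument)
        Finally
            ' Close the source document without saving changes to it.
            If wordDocument IsNot Nothing Then wordDocument.Close(SaveChanges:=False)
        End Try
        Return tmpPath
    End Function
End Module
```

The returned path would play the role of `tmpPath` in the snippet above.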
You might have to support Office 2003, but that doesn't mean you have to parse it in that format. Take the Office 2003 documents, temporarily convert them to DOCX files, open each one as a ZIP file, parse the /word/fontTable.xml file, and then delete the DOCX.
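A rough sketch of the ZIP-and-parse step, assuming .NET 4.5 or later (for `System.IO.Compression.ZipFile`, which needs a reference to System.IO.Compression.FileSystem) and a file that has already been converted to DOCX. The `ReadFontTable` name is made up for illustration:

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression
Imports System.Linq
Imports System.Xml.Linq

Module FontTableReader
    ' WordprocessingML main namespace, used by the w:font element and its w:name attribute.
    Private ReadOnly W As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Returns the font names declared in word/fontTable.xml of a .docx file.
    Function ReadFontTable(docxPath As String) As List(Of String)
        Using archive As ZipArchive = ZipFile.OpenRead(docxPath)
            Dim entry As ZipArchiveEntry = archive.GetEntry("word/fontTable.xml")
            If entry Is Nothing Then Return New List(Of String)()
            Using stream As Stream = entry.Open()
                Dim xml As XDocument = XDocument.Load(stream)
                Return xml.Descendants(W + "font").
                    Attributes(W + "name").
                    Select(Function(a) a.Value).
                    Distinct().
                    ToList()
            End Using
        End Using
    End Function
End Module
```

One caveat worth checking against real files: the font table lists declared fonts, which may include fonts that no text in the document actually uses.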
Another way I found, without coding, is:
Developers and programmers might even be able to automate this procedure and extract the font list from the PDF, which could be useful to more people.
You can speed things up a lot by iterating over paragraphs. Only if a paragraph contains mixed fonts would you need to check character by character. The Name, Bold, Italic, etc. properties have a special "indeterminate" value, which is an empty string for the Name or 9999999 for the style attributes.
So, for example, if Bold = 9999999 it means the paragraph contains some bold and some non-bold characters.
I include the following fragment to show the general idea:
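The answerer's fragment did not survive on this page. As a stand-in, here is a rough VB.NET sketch of the same idea, not the original code: read the font name once per paragraph, and fall back to a character scan only when Word reports an empty (indeterminate) name. The `CollectFonts` name is hypothetical.

```vb
Imports System
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module ParagraphFontScan
    ' Collects font names paragraph by paragraph; only paragraphs with
    ' mixed fonts (empty Font.Name) are scanned character by character.
    Function CollectFonts(wordDocument As Word.Document) As HashSet(Of String)
        Dim fonts As New HashSet(Of String)()
        For Each paragraph As Word.Paragraph In wordDocument.Paragraphs
            Dim name As String = paragraph.Range.Font.Name
            If Not String.IsNullOrEmpty(name) Then
                ' The whole paragraph uses a single font.
                fonts.Add(name)
            Else
                ' Mixed fonts: inspect each character of this paragraph only.
                For Each c As Word.Range In paragraph.Range.Characters
                    Dim charFont As String = c.Font.Name
                    If Not String.IsNullOrEmpty(charFont) Then fonts.Add(charFont)
                Next
            End If
        Next
        Return fonts
    End Function
End Module
```

Like the code in the question, this covers only the main text; headers, footers, and text boxes would need separate handling.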
I think that's the wrong way round. We are looking for the fact of a font's inclusion, not the location of that font. It's an existential rather than a positional problem.
It is much, much quicker to iterate over the fonts. The only trick is that Word is sometimes fussy about spaces and so forth. This works well for me:
It works very fast (the only slow component is the font iteration).
(Obviously, it won't find fonts that are not on your system, but if you are trying to prepare something you wrote for transport, and some assistant program has put Helvetica or MS Mincho in, you can find it.)
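The macro from this answer is not reproduced on this page. A sketch of the approach in VB.NET, under the assumption that a formatting-only Find per installed font is what is meant; `FindUsedFonts` is a hypothetical name and this is not the answerer's code:

```vb
Imports System
Imports System.Collections.Generic
Imports Word = Microsoft.Office.Interop.Word

Module InstalledFontSearch
    ' For every font installed on this machine, asks Word whether any text
    ' in the document body is formatted with it.
    Function FindUsedFonts(wordApplication As Word.Application,
                           wordDocument As Word.Document) As List(Of String)
        Dim used As New List(Of String)()
        For Each fontName As String In wordApplication.FontNames
            ' Start each search from a fresh range covering the whole body.
            Dim searchRange As Word.Range = wordDocument.Content
            With searchRange.Find
                .ClearFormatting()
                .Text = ""            ' match on formatting only
                .Font.Name = fontName
                .Format = True
                .Forward = True
                .Wrap = Word.WdFindWrap.wdFindStop
                If .Execute() Then used.Add(fontName)
            End With
        Next
        Return used
    End Function
End Module
```

The cost grows with the number of installed fonts rather than with the length of the document, which is consistent with the remark that the font iteration is the only slow part.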
OK, people told me that this is not what everyone wants; people want to find fonts that aren't on their machines. But the other way is still too slow and involves looking for a lot of stuff that isn't there. So here is an alternative that saves the document as RTF and processes the RTF header.
This goes through my 350-page thesis draft in under 20 seconds on a MacBook Pro, so it is quick enough to be useful.
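The RTF code is also missing from this page. A simplified sketch of reading the font table from the RTF header, assuming the file has already been saved as RTF and read into a string. `ReadRtfFontTable` is a hypothetical helper, and the parsing is deliberately naive: it ignores escaped braces and leaves `\'xx` escapes in non-ASCII names undecoded.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RtfFontTable
    ' Returns the font names declared in the {\fonttbl ...} group of an RTF document.
    Function ReadRtfFontTable(rtf As String) As List(Of String)
        Dim names As New List(Of String)()
        Dim start As Integer = rtf.IndexOf("{\fonttbl", StringComparison.Ordinal)
        If start < 0 Then Return names

        ' Walk forward to the brace that closes the font table group.
        Dim depth As Integer = 0
        Dim finish As Integer = -1
        For i As Integer = start To rtf.Length - 1
            If rtf(i) = "{"c Then
                depth += 1
            ElseIf rtf(i) = "}"c Then
                depth -= 1
                If depth = 0 Then
                    finish = i
                    Exit For
                End If
            End If
        Next
        If finish < 0 Then Return names

        Dim table As String = rtf.Substring(start, finish - start + 1)
        ' Drop nested destination groups such as {\*\panose ...} and {\*\falt ...}.
        table = Regex.Replace(table, "\{\\\*[^{}]*\}", "")
        ' Each remaining innermost group is one entry, e.g. {\f1\fswiss\fcharset0 Arial;}
        For Each entry As Match In Regex.Matches(table, "\{([^{}]*)\}")
            ' Strip control words (\f1, \fswiss, ...) and keep the trailing name.
            Dim name As String = Regex.Replace(entry.Groups(1).Value, "\\[a-zA-Z]+-?\d* ?", "")
            name = name.Trim().TrimEnd(";"c).Trim()
            If name.Length > 0 AndAlso Not names.Contains(name) Then names.Add(name)
        Next
        Return names
    End Function
End Module
```

From Word automation, the RTF copy could be produced with `SaveAs` and `Word.WdSaveFormat.wdFormatRTF`, then loaded with `File.ReadAllText`.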
If you want to get all the fonts used within your document, you can get them in a single line using the Open XML SDK:
Each Font element has a "Name" property, which is referenced in the <w:rFonts> element in the properties of a text run.
Hint: each Word document has at most two font table parts, one in the main part and the other in the glossary part, so don't forget to consider the glossary one as well if needed.
You can download the Open XML SDK from here
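The one-liner itself is missing from this page. A sketch of reading both font table parts with the Open XML SDK, assuming a reference to the DocumentFormat.OpenXml assembly; `ReadFontTableNames` and `AddNames` are hypothetical helper names:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports DocumentFormat.OpenXml.Packaging
Imports Wp = DocumentFormat.OpenXml.Wordprocessing

Module SdkFontTable
    ' Returns the distinct font names from the main and glossary font tables.
    Function ReadFontTableNames(docxPath As String) As List(Of String)
        Dim names As New List(Of String)()
        Using doc As WordprocessingDocument = WordprocessingDocument.Open(docxPath, False)
            Dim main As MainDocumentPart = doc.MainDocumentPart
            AddNames(main.FontTablePart, names)
            ' The glossary document, when present, carries its own font table.
            If main.GlossaryDocumentPart IsNot Nothing Then
                AddNames(main.GlossaryDocumentPart.FontTablePart, names)
            End If
        End Using
        Return names.Distinct().ToList()
    End Function

    ' Appends the Name of every Font element in the given part, if the part exists.
    Private Sub AddNames(part As FontTablePart, names As List(Of String))
        If part Is Nothing OrElse part.Fonts Is Nothing Then Return
        For Each f As Wp.Font In part.Fonts.Elements(Of Wp.Font)()
            If f.Name IsNot Nothing AndAlso f.Name.HasValue Then names.Add(f.Name.Value)
        Next
    End Sub
End Module
```

The `Wp` alias keeps `Font` from clashing with `System.Drawing.Font` in projects that import both.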
This might be quicker than converting documents to .docx before processing them with OpenXml (for the record, you could also work with the property document.Content.WordOpenXML instead of document.Content.XML):
Converted for your convenience:
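Neither listing from this answer survived on this page. As a stand-in, a sketch that parses the string returned by `document.Content.WordOpenXML` (Word 2007 or later) with LINQ to XML and collects the `w:ascii` attribute of every `w:rFonts` element; `FontsFromWordOpenXml` is a hypothetical name, and the older `document.Content.XML` format is not handled here:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module FlatXmlFonts
    ' WordprocessingML main namespace used by w:rFonts and w:ascii.
    Private ReadOnly W As XNamespace = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' flatXml is the string returned by document.Content.WordOpenXML.
    Function FontsFromWordOpenXml(flatXml As String) As List(Of String)
        Dim xml As XDocument = XDocument.Parse(flatXml)
        Return xml.Descendants(W + "rFonts").
            Attributes(W + "ascii").
            Select(Function(a) a.Value).
            Distinct().
            ToList()
    End Function
End Module
```

A call would look like `FontsFromWordOpenXml(wordDocument.Content.WordOpenXML)`. As far as I know the returned package also contains style definitions, so fonts that are named only in styles may show up in the result.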
Try this: