Automating the .doc to .htm conversion process in Word

Posted 2025-02-10 10:27:18

Question

We inherited an older project from another company, and this project has a "help" index made up of .htm files that were converted from .doc files. The problem is that their team exported all of these files in a very outdated, unsupported encoding, so they are full of random garbled special characters.

Eventually we will replace this system with one that is MUCH easier to use and develop. However, the product came with a large user base, so we need to fix this in the meantime. Is there an automation tool for this that still works today (I've tried a couple of older VB scripts), or will I have to re-export a few hundred docs manually today? (It's not a huge issue, but I think my time would be better spent on other work today.)

To be very clear: I have a folder full of .doc files that need to be re-saved as .htm files with UTF-8 encoding.

What I've tried

I've been digging through several SO posts trying various solutions. My current code is as follows:

Sub ChangeDocsToTxtOrRTFOrHTML()
    Dim locFolder As String
    Dim fileType As String
    Dim oFolder As Object
    Dim tFolder As Object
    Dim fs As Object
    
    locFolder = "C:\Users\ColeD\Desktop\Help Files Angular"
    fileType = ".htm"
    Set fs = CreateObject("Scripting.FileSystemObject")
    Set oFolder = fs.GetFolder(locFolder)
    Set tFolder = fs.GetFolder(locFolder & "Converted")
    
    For Each oFile In oFolder.Files
        MsgBox ("hrtr!")
        Dim d As Document
        Set d = Application.Documents.Open(oFile.Path)
        strDocName = ActiveDocument.Name
        intPos = InStrRev(strDocName, ".")
        strDocName = Left(strDocName, intPos - 1)
        strDocName = strDocName & fileType
        ChangeFileOpenDirectory tFolder
        
        ActiveDocument.SaveAs2 FileName:=strDocName & fileType, _
                               FileFormat:=wdFormatHTML, _
                               Encoding:=msoEncodingUTF8

        d.Close
        ChangeFileOpenDirectory oFolder
    Next oFile
    MsgBox ("Done!")
End Sub

The problem is that it only opens one file and then stops.


Comments (1)

微暖i 2025-02-17 10:27:18

It looks like you are using code copied from "Convert multiple Word documents to HTML files using VBA".

However, you need to adapt that code to your scenario, which only needs HTML output and none of the other file types. See the example below, which focuses on converting .doc/.docx files to HTML.

Sub test()

Dim fpath As String
Dim StrFile As String
Dim baseName As String
Dim outputFileName As String
Dim wordapp As Object
Dim doc As Object

On Error Resume Next
    Set wordapp = CreateObject("word.Application")
    wordapp.Visible = True
On Error GoTo 0

fpath = "C:\Users\user\"
StrFile = Dir(fpath & "*.doc*")
wordapp.DisplayAlerts = 0 ' wdAlertsNone: suppress Word's prompts (not the host application's)

    Do While Len(StrFile) > 0
        Set doc = wordapp.Documents.Open(fpath & StrFile)
        baseName = CreateObject("Scripting.FileSystemObject").GetBaseName(StrFile)
        outputFileName = fpath & baseName & ".html"
        Debug.Print outputFileName
        ' FileFormat 8 = wdFormatHTML (wdFormatFilteredHTML would be 10)
        ' Encoding 65001 = msoEncodingUTF8, as the question requires
        doc.SaveAs FileName:=outputFileName, FileFormat:=8, Encoding:=65001
        doc.Close False ' close without saving changes to the source document
        Debug.Print StrFile
        StrFile = Dir
    Loop

wordapp.DisplayAlerts = -1 ' wdAlertsAll: restore Word's default prompts

End Sub
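If you would rather run the macro inside Word itself, as the question's macro does, the sketch below applies the same Dir-based loop. It takes a snapshot of the file names first and writes the output to a separate folder using full paths, so it does not need ChangeFileOpenDirectory. Two details in the question's macro may be worth checking, although it is not confirmed that either one causes the early stop. First, `locFolder & "Converted"` has no path separator, so it points at a sibling folder named `Help Files AngularConverted`. Second, `strDocName` already ends in `.htm` before `& fileType` is appended again, which produces names like `name.htm.htm`. The macro name `ConvertDocsToUtf8Html` and both folder paths are hypothetical placeholders. The output folder is assumed to already exist.

```vba
' Sketch only: paste into a module in Word's VBA editor (Alt+F11).
' Assumption: both folder paths are hypothetical placeholders; OUT_FOLDER must already exist.
Sub ConvertDocsToUtf8Html()
    Const SRC_FOLDER As String = "C:\Help\Docs\"           ' hypothetical source folder (trailing \ required)
    Const OUT_FOLDER As String = "C:\Help\Docs\Converted\" ' hypothetical output folder (trailing \ required)

    Dim names As New Collection
    Dim f As String
    Dim item As Variant
    Dim doc As Document
    Dim baseName As String

    ' 1) Collect the file names before converting anything, so the loop below
    '    does not depend on Dir's enumeration state while files are being written.
    f = Dir(SRC_FOLDER & "*.doc*")
    Do While Len(f) > 0
        If Left$(f, 2) <> "~$" Then names.Add f ' skip Word owner/lock files if any are returned
        f = Dir
    Loop

    Application.DisplayAlerts = wdAlertsNone
    Application.ScreenUpdating = False

    ' 2) Open each document read-only, save a UTF-8 HTML copy, then close it.
    For Each item In names
        Set doc = Documents.Open(FileName:=SRC_FOLDER & item, ReadOnly:=True, AddToRecentFiles:=False)
        baseName = Left$(item, InStrRev(item, ".") - 1) ' strip the .doc/.docx extension once
        doc.SaveAs2 FileName:=OUT_FOLDER & baseName & ".htm", _
                    FileFormat:=wdFormatHTML, _
                    Encoding:=msoEncodingUTF8
        doc.Close SaveChanges:=wdDoNotSaveChanges
    Next item

    Application.ScreenUpdating = True
    Application.DisplayAlerts = wdAlertsAll
    MsgBox names.Count & " file(s) processed."
End Sub
```

Collecting the names first costs one extra pass over the folder. In exchange, each document is opened, saved, and closed before the next one is touched, which makes a failure on a single file easier to pin down.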