有没有办法把word文档升级到2010

发布于 2024-12-05 10:20:27 字数 2340 浏览 1 评论 0原文

场景:我有大约 14000 个 Word 文档需要从“Microsoft Word 97 - 2003 文档”转换为“Microsoft Word 文档”。换句话说,升级到 2010 格式 (.docx)。

问题:有没有一种简单的方法可以使用 API 或其他方法来做到这一点?

注意:我只能找到一个将文档转换为 .docx 的 Microsoft 程序,但它们仍然以兼容模式打开。如果可以将它们转换为新格式,那就太好了。当您打开旧文档时获得相同的功能,它为您提供了转换它的选项。

编辑:刚刚找到http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word._document.convert.aspx 了解如何使用它。

EDIT2:这是我当前用于转换文档的功能,

Private Sub btnConvert_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnConvert.Click
    FolderBrowserDialog1.ShowDialog()
    Dim mainThread As Thread
    If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
        lstFiles.Clear()

        DirSearch(FolderBrowserDialog1.SelectedPath)
        ThreadPool.SetMaxThreads(1, 1)
        lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
        TextBox1.Text += "Conversion started at " & DateTime.Now().ToString & Environment.NewLine
        For Each x In lstFiles
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ConvertDoc), x)
        Next

    End If
End Sub
Private Sub ConvertDoc(ByVal path As String)
    Dim word As New Microsoft.Office.Interop.Word.Application
    Dim doc As Microsoft.Office.Interop.Word.Document
    word.Visible = False

    Try
        Debug.Print(path)
        doc = word.Documents.Open(path, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing)
        doc.Convert()

    Catch ex As Exception
        ''do nothing
    Finally
        doc.Close()
        word.Quit()
    End Try

End Sub`

它允许我选择一个路径,然后查找子文件夹中的所有文档文件。该代码并不重要,所有要转换的文件都在 lstFiles 中。目前唯一的问题是,即使只有 10 个文档,转换也需要很长时间。我应该为每个文档使用一个单词应用程序而不是重复使用它吗?有什么建议吗?

此外,它会在大约 2 或 3 次转换后打开单词并开始闪烁,但会继续转换。

EDIT3:对上面的代码进行了一些调整,它运行得更干净。不过转换 8 个文件需要 1 分 10 秒。考虑到我有 14000 个,我需要转换此方法将花费相当长的时间。

EDIT4:再次更改代码。现在使用线程池。似乎跑得更快了一些。仍然需要在更好的计算机上运行来转换所有文档。或者按文件夹慢慢做。有人能想到其他方法来优化这个吗?

Scenario: I have about 14000 word documents that need to be converted from "Microsoft Word 97 - 2003 Document" to "Microsoft Word Document". In other words upgraded to 2010 format (.docx).

Question: Is there an easy way to do this using API's or something?

Note: I've only been able to find a microsoft program that converts the documents to .docx but they still open in compatability mode. It would be nice if they could just be converted to the new format. Same functionality you get when you open an old document and it gives you the option to convert it.

Edit: Just found http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word._document.convert.aspx looking into how to use it.

EDIT2: This is my current function for converting the documents

Private Sub btnConvert_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnConvert.Click
    FolderBrowserDialog1.ShowDialog()
    Dim mainThread As Thread
    If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
        lstFiles.Clear()

        DirSearch(FolderBrowserDialog1.SelectedPath)
        ThreadPool.SetMaxThreads(1, 1)
        lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
        TextBox1.Text += "Conversion started at " & DateTime.Now().ToString & Environment.NewLine
        For Each x In lstFiles
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf ConvertDoc), x)
        Next

    End If
End Sub
Private Sub ConvertDoc(ByVal path As String)
    Dim word As New Microsoft.Office.Interop.Word.Application
    Dim doc As Microsoft.Office.Interop.Word.Document
    word.Visible = False

    Try
        Debug.Print(path)
        doc = word.Documents.Open(path, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing)
        doc.Convert()

    Catch ex As Exception
        ''do nothing
    Finally
        doc.Close()
        word.Quit()
    End Try

End Sub`

It lets me select a path then find all doc files within the subfolders. That code isn't important, all the files for conversion are in lstFiles. Only problem at the moment is that it takes a really long time to convert even just 10 documents. Should I be using one word application per document instead of reusing it? Any suggestions?

Also it opens word after about 2 or 3 conversions and starts flashing but keeps converting.

EDIT3: Tweaked to code above a little bit and it runs cleaner. Takes 1min10sec to convert 8 files though. Considering I have 14000 I need to convert this method will take a reasonably long time.

EDIT4: Changed the code up again. Uses a threadpool now. Seems to run a bit faster. Still need to run on a better computer to convert all the documents. Or do them slowly by folder. Can anyone think of any other way to optimize this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

尾戒 2024-12-12 10:20:27

我在本地运行了您的代码,仅进行了一些小的编辑以改进跟踪和计时,并且“仅”花费了 13.73 秒来处理 12 个文件。这将在大约 4 小时内处理完您的 14,000 人。我在具有双核处理器的 Windows 7 x64 上运行 Visual Studio 2010。也许您可以使用更快的计算机?

这是我的完整代码,这只是一个带有单个按钮 Button1 和一个FolderBrowserDialogFolderBrowserDialog1 的表单:

Imports System.IO

Public Class Form1

Dim lstFiles As List(Of String) = New List(Of String)

Private Sub DirSearch(path As String)


    Dim thingies = From file In Directory.GetFiles(path) Where file.EndsWith(".doc") Select file

    lstFiles.AddRange(thingies)

    For Each subdir As String In Directory.GetDirectories(path)
        DirSearch(subdir)
    Next
End Sub

Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    FolderBrowserDialog1.ShowDialog()

    If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
        lstFiles.Clear()

        DirSearch(FolderBrowserDialog1.SelectedPath)
        Dim word As New Microsoft.Office.Interop.Word.Application
        Dim doc As Microsoft.Office.Interop.Word.Document
        lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
        Dim startTime As DateTime = DateTime.Now
        Debug.Print("Timer started at " & DateTime.Now().ToString & Environment.NewLine)
        For Each x In lstFiles
            word.Visible = False
            Debug.Print(x + Environment.NewLine)
            doc = word.Documents.Open(x)
            doc.Convert()
            doc.Close()
        Next
        word.Quit()
        Dim endTime As DateTime = DateTime.Now
        Debug.Print("Took " & endTime.Subtract(startTime).TotalSeconds & " to process " & lstFiles.Count & " documents" & Environment.NewLine)
    End If

End Sub
End Class

I ran your code locally, with just some minor edits for improved tracing and timing, and it "only" took 13.73 seconds to do 12 files. That would take care of your 14,000 in about 4 hours. I'm running Visual Studio 2010 on Windows 7 x64 with a dual core processor. Perhaps you can just use a faster computer?

Here's my full code, this is just a form with a single button, Button1, and a FolderBrowserDialog, FolderBrowserDialog1:

Imports System.IO

Public Class Form1

Dim lstFiles As List(Of String) = New List(Of String)

Private Sub DirSearch(path As String)


    Dim thingies = From file In Directory.GetFiles(path) Where file.EndsWith(".doc") Select file

    lstFiles.AddRange(thingies)

    For Each subdir As String In Directory.GetDirectories(path)
        DirSearch(subdir)
    Next
End Sub

Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    FolderBrowserDialog1.ShowDialog()

    If Not String.IsNullOrEmpty(FolderBrowserDialog1.SelectedPath) Then
        lstFiles.Clear()

        DirSearch(FolderBrowserDialog1.SelectedPath)
        Dim word As New Microsoft.Office.Interop.Word.Application
        Dim doc As Microsoft.Office.Interop.Word.Document
        lstFiles.RemoveAll(Function(y) y.Contains(".docx"))
        Dim startTime As DateTime = DateTime.Now
        Debug.Print("Timer started at " & DateTime.Now().ToString & Environment.NewLine)
        For Each x In lstFiles
            word.Visible = False
            Debug.Print(x + Environment.NewLine)
            doc = word.Documents.Open(x)
            doc.Convert()
            doc.Close()
        Next
        word.Quit()
        Dim endTime As DateTime = DateTime.Now
        Debug.Print("Took " & endTime.Subtract(startTime).TotalSeconds & " to process " & lstFiles.Count & " documents" & Environment.NewLine)
    End If

End Sub
End Class
嗳卜坏 2024-12-12 10:20:27

使用文字自动化并打开它并使用 wdFormatDocumentDefault 的 WdSaveFormat 枚举保存它,该枚举应该是 docx

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdsaveformat%28v=office.14%29.aspx

或尝试你提到的转换方法。不管怎样,100% 可能,而且应该相当容易。

编辑:如果丹尼尔发布的转换器有效,那就容易多了,他值得所有的荣誉:)

Use word automation and open it and save it with the WdSaveFormat enumeration for wdFormatDocumentDefault which should be docx

http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.wdsaveformat%28v=office.14%29.aspx

or try your hand at the Convert method you mentioned. Either way 100% possible and should be fairly easy.

Edit: if the converter Daniel posted works, thats far easier and he deserves all the credit : )

独﹏钓一江月 2024-12-12 10:20:27

您可以使用免费的 Office 文件转换器。

这里解释了设置:

http://technet.microsoft.com/en-us/ Library/cc179019.aspx

有一个文件列表设置。

You can use the free Office File Converter.

Here explains the settings:

http://technet.microsoft.com/en-us/library/cc179019.aspx

There is a file list setting.

糖果控 2024-12-12 10:20:27

试试这个:

using Microsoft.Office.Interop
Microsoft.Office.Interop.Word.ApplicationClass word = new ApplicationClass();
object nullvalue = Type.Missing;
object filee = filename;
object file2 = String.Format("{0}{1}", filepath, "convertedfile.doc");
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(ref filee, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue);
        doc.SaveAs(ref file2, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue);

try this:

using Microsoft.Office.Interop
Microsoft.Office.Interop.Word.ApplicationClass word = new ApplicationClass();
object nullvalue = Type.Missing;
object filee = filename;
object file2 = String.Format("{0}{1}", filepath, "convertedfile.doc");
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(ref filee, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue);
        doc.SaveAs(ref file2, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue, ref nullvalue);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文