Renaming Word documents and saving their file name with the first 10 letters

Posted on 2024-09-11 14:02:52

I have recovered some Word documents from a corrupted hard drive using a piece of software called photorec. The problem is that the documents' names can't be recovered; they are all renamed by a sequence of numbers. There are over 2000 documents to sort through and I was wondering if I could rename them using some automated process.

Is there a script I could use to find the first 10 letters in the document and rename it with that? It would have to be able to cope with multiple documents having the same first 10 letters and so not write over documents with the same name. Also, it would have to avoid renaming the document with illegal characters (such as '?', '*', '/', etc.)

I only have a little experience with Python and C, and even less with bash programming in Linux, so bear with me if I don't know exactly what I'm doing should I have to write a new script.

Comments (2)

旧情勿念 2024-09-18 14:02:52

How about VBScript? Here is a sketch:

'' Run with cscript.exe so messages go to the console instead of one dialog per file
FolderName = "C:\Docs\"

Set fs = CreateObject("Scripting.FileSystemObject")
Set fldr = fs.GetFolder(FolderName)
Set ws = CreateObject("Word.Application")

For Each f In fldr.Files
    '' Skip the "~$" lock files that Word creates for open documents
    If Left(f.Name, 2) <> "~$" Then
        If InStr(f.Type, "Microsoft Word") Then

            Set doc = ws.Documents.Open(FolderName & f.Name)

            '' Use the first paragraph that still has text after cleaning
            s = vbNullString
            i = 1
            Do While s = vbNullString And i <= doc.Paragraphs.Count
                s = Left(CleanString(doc.Paragraphs(i).Range.Text), 10)
                i = i + 1
            Loop

            doc.Close False

            If s = "" Then s = "NoParas"

            '' Keep the original extension, including the dot
            ext = "." & fs.GetExtensionName(f.Name)

            '' Append a counter until the target name is unused
            s1 = s
            i = 1
            Do While fs.FileExists(FolderName & s1 & ext)
                s1 = s & i
                i = i + 1
            Loop

            WScript.Echo "Name " & FolderName & f.Name & " As " & FolderName & s1 & ext

            '' This uses copy, because it seems safer
            f.Copy FolderName & s1 & ext, False

            '' MoveFile would rename the file instead of copying it:
            '' fs.MoveFile FolderName & f.Name, FolderName & s1 & ext

        End If
    End If
Next

WScript.Echo "Done"
ws.Quit
Set ws = Nothing
Set fs = Nothing

Function CleanString(StringToClean)
    ''http://msdn.microsoft.com/en-us/library/ms974570.aspx
    Dim objRegEx
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.IgnoreCase = True
    objRegEx.Global = True

    ''Find anything not a-z, 0-9
    objRegEx.Pattern = "[^a-z0-9]"

    CleanString = objRegEx.Replace(StringToClean, "")
End Function

浸婚纱 2024-09-18 14:02:52

Word documents are stored in a custom format which places a load of binary cruft at the beginning of the file.

The simplest thing would be to knock something up in Python that searches for the first long run of ASCII chars. Here you go:

#!/usr/bin/env python3

import glob
import os

RUN_LENGTH = 100   # consecutive ASCII bytes required before the text is trusted
NAME_LENGTH = 20   # characters of that run used for the new name

for path in glob.glob("*.doc"):
    with open(path, "rb") as f:
        data = f.read()

    run = ""
    for byte in data:
        if 0 < byte < 128:
            ch = chr(byte)
            # keep a-z, A-Z and 0-9, replace every other ASCII char with "_"
            run += ch if ch.isalnum() else "_"
            if len(run) == RUN_LENGTH:
                break
        else:
            # a zero or non-ASCII byte ends the run, so start over
            run = ""

    # rename only if a full run was found; a short run at end of file is ignored
    if len(run) == RUN_LENGTH:
        new_name = run[:NAME_LENGTH] + ".doc"
        print("renaming " + path + " to " + new_name)
        os.rename(path, new_name)

NOTE: if you want to glob multiple directories you'll need to change the glob line accordingly. Also, this takes no account of whether the file you're trying to rename to already exists, and on Linux os.rename silently replaces an existing file, so if you have multiple docs with the same first few chars then you'll need to handle that.
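One way to handle the clash, as a rough sketch rather than a tested solution: check whether the target exists and append a counter until it doesn't. The helper name `unique_name` and the `_1`, `_2` suffix style are my own choices for illustration, not part of the script above.

```python
import os
import tempfile


def unique_name(directory, stem, ext=".doc"):
    """Return a file name in `directory` that does not exist yet."""
    candidate = stem + ext
    counter = 1
    # keep adding a numeric suffix until the name is free
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f"{stem}_{counter}{ext}"
        counter += 1
    return candidate


# small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "Dear_Sir.doc"), "w").close()
    print(unique_name(tmp, "Dear_Sir"))   # Dear_Sir_1.doc
    print(unique_name(tmp, "Minutes"))    # Minutes.doc
```

In the rename loop you would pass the 20-character stem through a helper like this before calling os.rename. It assumes nothing else is creating files in the directory at the same time.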

I look for the first chunk of 100 ASCII chars in a row (if you look for fewer than that you end up picking up doc keywords and such) and then use the first 20 of these to make the new name, replacing anything that's not a-z, A-Z or 0-9 with underscores to avoid file name issues.
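If you prefer, the same search can probably be written more compactly with a bytes regex. This is only a sketch of the idea described above: the function name `name_from_bytes` and the sample data are made up for illustration, and the sample is not a real Word file.

```python
import re

# 100 consecutive bytes in the range 0x01-0x7f, i.e. non-zero ASCII
ASCII_RUN = re.compile(rb"[\x01-\x7f]{100}")


def name_from_bytes(data, length=20):
    """Return a sanitized name stem from the first 100-byte ASCII run, or None."""
    match = ASCII_RUN.search(data)
    if match is None:
        return None
    text = match.group(0)[:length].decode("ascii")
    # replace anything that is not a-z, A-Z or 0-9 with an underscore
    return re.sub(r"[^A-Za-z0-9]", "_", text)


# fake "binary cruft" followed by a long stretch of plain text
sample = b"\xd0\xcf\x11\xe0" + b"\x00" * 8 + b"Dear Sir, thank you for your letter. " * 4
print(name_from_bytes(sample))   # Dear_Sir__thank_you_
```

The regex finds the leftmost run, so it should pick the same 100 characters as the byte-by-byte loop, and it returns None when a file has no run that long.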
