Recursively find a wildcard string in Word .doc files and output to a text file: how do I strip the Word garbage?
Here's what I'm trying to do:
- Using cmd.exe, go to a mounted Windows network drive.
- The starting directory contains a hierarchy of folders with .doc files in them. I want to search those files for a string that starts with "CCMPD" followed by unique numbers (defect numbers).
- The hierarchy is not consistent: some folders have the .doc files at the first level, while others have more folders under them where the .doc files are.
- Output each matching line to a file.
I came up with this command:
findstr /S "CCMPD" *.doc > D:\Data\FIND.txt
That actually works (I'm pretty proud of that) but the file is filled with the garbage that lives in a Word doc, and I can't figure out how to filter it out. I can't even paste the output in here because they're not printable characters but you have probably all seen them before.
How can I create a find command that can filter out the Word garbage and output to an easily readable file?
Comments (1)
Try using the Strings tool. It will extract the strings and get rid of the garbage.
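If a strings-style utility is not available, the same idea can be sketched in a few lines of Python: read each .doc as raw bytes and pull out only the defect IDs instead of whole "lines", since a binary Word file has no real lines. This is a rough sketch under some assumptions that are not in the question: the IDs are taken to be "CCMPD" followed directly by digits, the text is assumed to be stored either as plain 8-bit characters or as UTF-16LE, and the function names are made up for illustration.

```python
import os
import re

# Assumption: a defect ID is "CCMPD" followed directly by one or more digits.
ASCII_ID = re.compile(rb"CCMPD\d+")
# The same ID when stored as UTF-16LE (each character followed by a NUL byte).
UTF16_ID = re.compile(rb"C\x00C\x00M\x00P\x00D\x00(?:\d\x00)+")


def find_ids(data):
    """Return the sorted, de-duplicated defect IDs found in raw file bytes."""
    ids = {m.group().decode("ascii") for m in ASCII_ID.finditer(data)}
    ids |= {m.group().decode("utf-16-le") for m in UTF16_ID.finditer(data)}
    return sorted(ids)


def scan_tree(root):
    """Yield (path, defect_id) for every .doc file under root, at any depth."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".doc"):
                path = os.path.join(folder, name)
                with open(path, "rb") as handle:
                    for defect_id in find_ids(handle.read()):
                        yield path, defect_id


def write_report(root, out_path):
    """Write one 'path<TAB>id' line per match to a plain-text file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path, defect_id in scan_tree(root):
            out.write(f"{path}\t{defect_id}\n")
```

Calling `write_report(".", r"D:\Data\FIND.txt")` from the starting directory would produce a file of readable path and ID pairs. Because only the matched ID is kept, none of the surrounding binary content reaches the output; the trade-off is that the context around each ID is lost.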