MS Word Doc: automated find/replace using a shell script
I have a number of Word documents that I'd like to remove some elements from. What I would like to do is as follows:
- Copy and paste the entire contents of the Word file into a text file (may not be necessary), OR convert .doc to .txt
- Using regex: replace
\[.*\]
with "" AND replace
\(.*\)
with ""
- Save the result to a text file with the same name as the original Word document.
Thoughts and direction appreciated. As it stands now, I don't know how to do any of these things programmatically, so I'm doing this manually.
If it matters, I'm using Ubuntu 11.04.
Comments (1)
Since you're open to using plain text, some improvements to your algorithm:
- antiword to automate the conversion from doc to txt
- sed to do in-place regex modification:
sed -i -e 's/bad/good/' file.txt
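The conversion step could be batched roughly as follows. This is only a sketch: convert_docs and the DOC2TXT variable are hypothetical names introduced for illustration, and it assumes antiword is installed and that the files end in .doc.

```shell
#!/bin/sh
# Convert every .doc file in a directory to a .txt file with the same base name.
# convert_docs and DOC2TXT are illustrative names, not part of antiword.
convert_docs() {
    dir=${1:-.}                      # directory to scan, defaults to the current one
    converter=${DOC2TXT:-antiword}   # override DOC2TXT to use a different converter
    for doc in "$dir"/*.doc; do
        [ -e "$doc" ] || continue    # skip the unexpanded pattern when nothing matches
        txt="${doc%.doc}.txt"        # report.doc -> report.txt
        "$converter" "$doc" > "$txt"
    done
}
```

Calling convert_docs ~/docs would leave a report.txt next to each report.doc; the sed cleanup can then be run on the resulting .txt files.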
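Update (in response to comment):
The regexes are close, but I didn't completely understand the objective. Note that in sed's default (basic) regex syntax, plain ( and ) match literal parentheses while \( and \) delimit a capture group, and that .* is greedy, so [^]]* and [^)]* are safer when a line contains more than one bracketed span.
If you want to delete occurrences of [foo] and (foo), i.e. replace them with the empty string, use:
sed -i -e 's/\[[^]]*\]//g' -e 's/([^)]*)//g' file.txt
If you want to replace occurrences of [foo] and (foo) with foo each, use:
sed -i -e 's/\[\([^]]*\)\]/\1/g' -e 's/(\([^)]*\))/\1/g' file.txt
Put literal double quotes in the replacement (for example "\1") only if you want them to appear in the output.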
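To try the two behaviours on sample text before touching real files, the expressions can be wrapped in small stdin filters. This is a sketch assuming sed's basic regex syntax, where unescaped parentheses are literal; strip_brackets and unwrap_brackets are hypothetical helper names, and the patterns use [^]]* and [^)]* instead of .* so that each bracketed span on a line is handled separately.

```shell
#!/bin/sh
# Delete [ ... ] and ( ... ) spans, including the brackets themselves.
strip_brackets() {
    sed -e 's/\[[^]]*\]//g' -e 's/([^)]*)//g'
}

# Keep the inner text but drop the surrounding brackets.
unwrap_brackets() {
    sed -e 's/\[\([^]]*\)\]/\1/g' -e 's/(\([^)]*\))/\1/g'
}
```

For example, printf 'keep[drop]this(too)\n' | strip_brackets prints keepthis, while the same input piped through unwrap_brackets prints keepdropthistoo. Neither filter handles nested brackets or spans that cross line breaks.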