Batch lowercasing the contents of text files
After half an hour of searching for an answer to this, I can't think of a way to do it (without opening each text file individually, selecting all, and then lowercasing with gedit). I would like to be able to run a script, either from the command line or preferably from nautilus-scripts, so that I can select the files in the GUI, right-click, pick the script, and have them lowercased.
I know that tr can do this, but I can't figure out how to adapt the following call to several files: tr '[:upper:]' '[:lower:]' < input.txt > output.txt
Normally, I would change input.txt to *.txt and output.txt to *.txt, but it doesn't work. Any ideas?
Extra: once that is solved, how can it be adapted for nautilus-scripts? :]
Thanks!
Comments (3)
Haven't tested it, but I think this would work to search recursively through directories, looking in all the files, and replacing their contents with their lowercase version:
You can write a short script to transform files of the form ".txt" to "-lowered.txt":
If you want to transform multiple files, you can't use output.txt for all of them, of course. And you can't write to the input file - this will truncate it.
You can write to an intermediate file, and rename it as a second step at the end.
To handle multiple files, use find:
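The commands that belonged to this answer are not preserved in this copy, so what follows is only a sketch of the two ideas described above, not the answerer's original script. The function names `lowercase_in_place` and `lower_copies` are made up for illustration, and `mktemp` is assumed to be available (it is on typical Linux systems, though it is not part of POSIX).

```sh
#!/bin/sh
# Hypothetical helper: lowercase one file "in place" by writing to an
# intermediate file first and renaming it afterwards, so the input is
# never truncated while tr is still reading it.
lowercase_in_place() {
    src=$1
    tmp=$(mktemp "${src}.XXXXXX") || return 1
    if tr '[:upper:]' '[:lower:]' < "$src" > "$tmp"; then
        mv -- "$tmp" "$src"      # note: the file takes on the temp file's permissions
    else
        rm -f -- "$tmp"
        return 1
    fi
}

# Hypothetical helper: for every NAME.txt under the given directory,
# write a lowercased copy to NAME-lowered.txt and leave NAME.txt alone.
lower_copies() {
    find "$1" -type f -name '*.txt' ! -name '*-lowered.txt' -exec sh -c '
        for src in "$@"; do
            dst="${src%.txt}-lowered.txt"
            tr "[:upper:]" "[:lower:]" < "$src" > "$dst"
        done
    ' sh {} +
}
```

For example, `lower_copies .` would create `notes-lowered.txt` next to `notes.txt`, while `lowercase_in_place notes.txt` would overwrite `notes.txt` itself.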
Edit:
This turned out to be an encoding issue - the OP's input files are UTF16.
After a discussion in the comments, the OP copy/pasted the data from viewing with less into a pastebin: http://pastebin.com/uHmYmhpT
It looked like this:
... and so on.
This is clearly not an ASCII (or UTF8) text file, and so most standard tools (sed, grep, awk, etc.) will not work on it. The <FF><FE> at the start is a Byte Order Mark that indicates that this file is UTF16-encoded text. There is a standard tool for converting between UTF16 and UTF8, and UTF8 is compatible with ASCII for alphanumeric characters, so if we convert it to UTF8, then sed/grep/awk/etc. will be able to edit it.
The tool we need is iconv. Unfortunately, iconv has no in-place editing feature so we'll have to write a loop that uses a temporary file to do the conversion:
Then you can run the find/sed command to lowercase them. Most programs won't care that your files are now UTF8 rather than UTF16, but if you have issues then you can write a similar loop that uses iconv to put them back into UTF16 after you've lowercased them.
If you just want to lowercase all files matching '*.txt':
But note that this will run into issues with the command line length if there's a lot of .txt files.
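The original commands are missing from this copy, so here is a rough sketch of the two steps just described: converting UTF16 files to UTF8 through a temporary file, then lowercasing with sed. The function names `utf16_to_utf8` and `lowercase_files` are invented for this example. The sed part assumes GNU sed, since both `-i` (in-place editing) and `\L` (lowercase the rest of the replacement) are GNU extensions.

```sh
#!/bin/sh
# Hypothetical helper: convert each given file from UTF-16 to UTF-8.
# iconv cannot edit in place, so write to a temp file and rename it.
utf16_to_utf8() {
    for f in "$@"; do
        tmp=$(mktemp "${f}.XXXXXX") || return 1
        if iconv -f UTF-16 -t UTF-8 "$f" > "$tmp"; then
            mv -- "$tmp" "$f"
        else
            rm -f -- "$tmp"
            echo "conversion failed, left untouched: $f" >&2
        fi
    done
}

# Hypothetical helper: lowercase the given files in place.
# Assumes GNU sed; \L& rewrites each matched line in lowercase.
lowercase_files() {
    sed -i 's/.*/\L&/' "$@"
}
```

With these defined, `utf16_to_utf8 *.txt` followed by `lowercase_files *.txt` would cover the "all files matching '*.txt'" case, subject to the command line length caveat above.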
If you want to do lowercasing on all files recursively, I'd use Diego's approach - but there's a couple of errors to fix:
should do the trick.
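The corrected command itself did not survive in this copy. Going by the find/sed approach the answer refers to, a recursive version could look roughly like the sketch below. The wrapper name `lowercase_tree` is made up, and GNU sed is assumed for `-i` and `\L`.

```sh
#!/bin/sh
# Hypothetical helper: lowercase the contents of every regular file
# under the given directory (default: current directory), recursively.
# -type f skips directories; {} + passes many files to each sed call.
lowercase_tree() {
    find "${1:-.}" -type f -exec sed -i 's/.*/\L&/' {} +
}
```

Be careful where this is pointed: it rewrites every regular file it finds, not only text files.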
If you don't want it to be recursive, you want it to only affect '.txt' files, and you've got too many files for the sed ... *.txt to work, then use:
(-maxdepth 1 stops the recursion)
Older versions of find won't support the -exec ... + syntax, so if you run into trouble with that then replace the + with \;. The + is preferable because it makes find invoke sed with multiple files per invocation, rather than once per file, so it's slightly more efficient.
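As a sketch of the non-recursive variant described above (the original command is not preserved here), the following limits find to the top level and to '.txt' files. The function name `lowercase_txt_here` is hypothetical, GNU sed is assumed for `-i` and `\L`, and `-maxdepth` is a find extension available in GNU and BSD find rather than a POSIX option.

```sh
#!/bin/sh
# Hypothetical helper: lowercase only the *.txt files directly inside
# the given directory (default: current directory), without recursing.
lowercase_txt_here() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.txt' \
        -exec sed -i 's/.*/\L&/' {} +
    # On an old find without "-exec ... +", end the command with \;
    # instead, which runs sed once per file.
}
```

Because find builds the argument lists itself, this avoids the command line length limit that a plain `sed ... *.txt` can hit.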