Batch lowercase the contents of text files

Posted on 2025-01-06 17:15:46 | Length: 437 | Views: 1 | Comments: 0

After half an hour of searching for an answer, I can't find a way to do this without opening each text file individually, selecting all, and lowercasing with gedit. I would like to run a script, either from the command line or preferably from nautilus-scripts, so that I can select the files in the GUI, right-click, pick the script, and have them lowercased.
I know that tr can do the lowercasing, but I can't figure out how to make the following call work on many files:

tr '[:upper:]' '[:lower:]' < input.txt > output.txt

Normally I would change input.txt to *.txt and output.txt to *.txt, but that doesn't work. Any ideas?

Extra: once that is solved, how can I adapt it for nautilus-scripts? :]

Thanks!

Comments (3)

南街九尾狐 2025-01-13 17:15:47

I haven't tested it, but I think this would search recursively through directories, find all the files, and replace their contents with a lowercase version:

find ./ -type f -exec sed -i 's/.+/\0\L/' {} \;

昔日梦未散 2025-01-13 17:15:47

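You can write a short script that transforms files named "NAME.txt" into "NAME-lowered.txt":

#!/bin/bash
# lowerit.sh: write a lowercased copy of each FILE.txt to FILE-lowered.txt
for src in "$@"; do                  # find ... -exec {} + passes many files at once
    dst="${src%.txt}-lowered.txt"    # strip the trailing .txt, append the new suffix
    tr '[:upper:]' '[:lower:]' < "$src" > "$dst"
done

If you want to transform multiple files, you can't use output.txt for all of them, of course. And you can't write to the input file: the shell truncates it before tr gets to read it.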

You can write to an intermediate file, and rename it as second step in the end.
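
The intermediate-file approach described above could be sketched as follows. This is a minimal sketch rather than the answerer's script: the function name lower_in_place is hypothetical, and it assumes mktemp is available (it is on GNU and BSD systems, though POSIX does not require it).

```shell
# lower_in_place FILE...
# Lowercase each file's contents via a temporary file, then rename it
# over the original. Stops at the first failure.
lower_in_place() {
    for src in "$@"; do
        # Create the temp file next to the original so mv stays on one filesystem.
        tmp=$(mktemp "${src}.XXXXXX") || return 1
        if tr '[:upper:]' '[:lower:]' < "$src" > "$tmp"; then
            mv -- "$tmp" "$src"
        else
            rm -f -- "$tmp"
            return 1
        fi
    done
}
```

It would be called as lower_in_place *.txt. One caveat: mktemp creates the temporary file readable and writable by the owner only, so the replaced file may end up with different permissions than the original.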

To handle multiple files, use find:

find . -name "*.txt" -exec ./lowerit.sh {} +

烟若柳尘 2025-01-13 17:15:46

Edit:
This turned out to be an encoding issue: the OP's input files are UTF-16.

After a discussion in the comments, the OP pasted the file contents, as displayed by less, into a pastebin: http://pastebin.com/uHmYmhpT

It looked like this:

<FF><FE>1^@^M^@
^@0^@0^@:^@0^@0^@:^@0^@9^@,^@4^@4^@2^@ ^@-^@-^@>^@ ^@0^@0^@:^@0^@0^@:^@1^@1^@,^@4^@4^@4^@^M^@
^@j& ^@W^@O^@K^@E^@ ^@U^@P^@^M^@
^@T^@H^@I^@S^@ ^@M^@O^@R^@N^@I^@N^@G^@ ^@j&^M^@
^@^M^@
^@2^@^M^@

... and so on.

This is clearly not an ASCII (or UTF-8) text file, so most standard tools (sed, grep, awk, etc.) will not work on it as it stands.

The <FF><FE> at the start is a byte order mark (BOM) indicating that this file is UTF-16 encoded text (little-endian, given that byte order). There is a standard tool for converting between UTF-16 and UTF-8, and UTF-8 is compatible with ASCII for alphanumeric characters, so once the file is converted to UTF-8, sed/grep/awk/etc. will be able to edit it.
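
To check for that byte order mark without a pager, one option is to dump the first two bytes of the file in hex. A minimal sketch, assuming head -c and od are available; the function name has_utf16le_bom is hypothetical:

```shell
# has_utf16le_bom FILE
# Succeeds (exit status 0) if FILE starts with the bytes FF FE.
has_utf16le_bom() {
    # od prints the two bytes as hex pairs; tr strips the spaces and newline.
    bom=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$bom" = "fffe" ]
}
```

For example, has_utf16le_bom subtitles.srt && echo "looks like UTF-16LE" (the file name is made up). Note that UTF-32LE files begin with the same two bytes, so this is only a quick hint, not a full encoding detector.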

The tool we need is iconv. Unfortunately, iconv has no in-place editing feature, so we'll have to write a loop that uses a temporary file to do the conversion (the -o option used here is from GNU iconv):

find . -type f -name '*.srt' -print0 | while IFS= read -r -d '' filename; do
    # Only convert files that file(1) reports as UTF-16; -b omits the file name from the output
    if file -b "$filename" | grep -q 'UTF-16'; then
        iconv -f UTF-16 -t UTF-8 -o "$filename.utf8" "$filename" && mv "$filename.utf8" "$filename"
    fi
done

Then you can run the find/sed command to lowercase them. Most programs won't care that your files are now UTF-8 rather than UTF-16, but if you have issues, you can write a similar loop that uses iconv to put them back into UTF-16 after you've lowercased them.
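
The reverse conversion mentioned above could look something like this. It is a sketch under assumptions: the function name utf8_to_utf16 is hypothetical, and the byte order that iconv picks for the plain UTF-16 target depends on the iconv implementation. If a program needs one byte order in particular, the UTF-16LE or UTF-16BE targets can be named explicitly, but those do not write a byte order mark.

```shell
# utf8_to_utf16 FILE...
# Re-encode each UTF-8 file as UTF-16 through a temporary file.
utf8_to_utf16() {
    for f in "$@"; do
        # iconv cannot edit in place, so write to a temp file and rename it.
        if iconv -f UTF-8 -t UTF-16 "$f" > "$f.utf16"; then
            mv -- "$f.utf16" "$f"
        else
            rm -f -- "$f.utf16"
            return 1
        fi
    done
}
```

Output goes through shell redirection instead of -o so the sketch does not depend on GNU iconv.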


If you just want to lowercase all files matching '*.txt' (\L is a GNU sed extension):

sed -i 's/.*/\L&/' *.txt

But note that this will run into issues with the command line length if there are a lot of .txt files.

If you want to lowercase all files recursively, I'd use Diego's approach (the find/sed answer), but there are a couple of errors to fix:

find . -type f -exec sed -i 's/.*/\L&/' {} +

should do the trick.
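
To see what the substitution does before pointing it at real files, it can be run on standard input, where -i is not involved. In the expression, .* matches the whole line, & stands for the matched text, and \L lowercases everything that follows it in the replacement. \L is a GNU sed extension, so this sketch assumes GNU sed; the function name lower_stdin is hypothetical.

```shell
# lower_stdin: lowercase every line read from standard input (GNU sed only).
lower_stdin() {
    sed 's/.*/\L&/'
}

# Try it on two sample lines.
printf 'WAKE UP\nThis Morning\n' | lower_stdin
```

With GNU sed this prints "wake up" and "this morning" on separate lines.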

If you don't want it to be recursive, only want to affect '.txt' files, and have too many files for the sed ... *.txt form to work, then use:

find . -maxdepth 1 -type f -name '*.txt' -exec sed -i 's/.*/\L&/' {} +

(-maxdepth 1 stops the recursion)

Older versions of find won't support the -exec ... + syntax, so if you run into trouble with that then replace the + with \;. The + is preferable because it makes find invoke sed with multiple files per invocation, rather than once per file, so it's slightly more efficient.
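
For the nautilus-scripts part of the question, which the answers do not cover, a script along these lines might work. It is a sketch under assumptions: Nautilus passes the selected local files to the script as positional arguments, and scripts live in ~/.local/share/nautilus/scripts/ on recent versions (GNOME 2 era releases used ~/.gnome2/nautilus-scripts/). It reuses the GNU sed command from above, so it also assumes the files are already ASCII or UTF-8; the function name lowercase_selected is hypothetical.

```shell
#!/bin/bash
# Lowercase the contents of every file selected in Nautilus.
lowercase_selected() {
    for f in "$@"; do
        [ -f "$f" ] || continue        # skip directories and other non-files
        sed -i 's/.*/\L&/' -- "$f"     # GNU sed: rewrite the file in place
    done
}

# Nautilus supplies the selection as arguments; with none, nothing happens.
lowercase_selected "$@"
```

Saved under a name such as Lowercase and made executable with chmod +x, it should then show up in the Scripts submenu of the right-click menu.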
