Batch-lowercase the contents of text files

Posted 2025-01-06 17:15:46 · 437 characters · 1 view · 0 comments

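After half an hour of searching for an answer, I can't figure out a way to do this that doesn't involve opening each text file individually, selecting all of its text, and lowercasing it with gedit. I want to be able to run a script, either from the command line or, preferably, from nautilus-scripts, so that I can select the files in the GUI, right-click, choose the script, and have them lowercased. I know tr can do this, but I can't figure out how to turn the invocation tr '[:upper:]' '[:lower:]' < input.txt > output.txt into one that converts a batch of text files. Normally I would change input.txt to *.txt and output.txt to *.txt, but that doesn't work. Any ideas?

Bonus: once this is solved, how can it be adapted into a Nautilus script? :]

Thanks!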


南街九尾狐 2025-01-13 17:15:47

I haven't tested it, but I think this would search recursively through directories and replace the contents of every file it finds with a lowercase version:

find ./ -type f -exec sed -i 's/.+/\0\L/' {} \;


昔日梦未散 2025-01-13 17:15:47

You can write a short script that writes a lowercased copy of each NAME.txt to NAME-lowered.txt:

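#!/bin/bash
# lowerit.sh: write a lowercased copy of each FILE.txt to FILE-lowered.txt
# Loops over all arguments so it also works with find's "-exec ... {} +".
for infile in "$@"; do
    # Strip the trailing .txt and append the new suffix.
    outfile=${infile%.txt}-lowered.txt
    tr '[:upper:]' '[:lower:]' < "$infile" > "$outfile"
done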

If you want to transform multiple files, you can't use output.txt for all of them, of course. And you can't redirect the output to the input file itself: the shell truncates the file before tr gets to read it.

Instead, write to an intermediate file and rename it over the original as a second step.
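
The intermediate-file approach might look like the following minimal sketch. The function name lower_in_place and the use of mktemp are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption): lowercase each file in
# place by writing to a temporary file first, then renaming it over the
# original.
lower_in_place() {
    local f tmp
    for f in "$@"; do
        # Create the temp file next to the original so that mv is a rename
        # on the same filesystem.
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if tr '[:upper:]' '[:lower:]' < "$f" > "$tmp"; then
            mv -- "$tmp" "$f"
        else
            # Leave the original untouched if tr failed.
            rm -f -- "$tmp"
            return 1
        fi
    done
}
```

Called as lower_in_place *.txt, this overwrites each file only after tr has finished reading it. One caveat: mktemp creates the temporary file with restrictive permissions, so the replaced file may end up with different permissions than the original.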

To handle multiple files, use find:

find . -name "*.txt" -exec ./lowerit.sh {} +


烟若柳尘 2025-01-13 17:15:46

Edit:
This turned out to be an encoding issue: the OP's input files are UTF-16.

After a discussion in the comments, the OP viewed one of the files with less and pasted the output into a pastebin: http://pastebin.com/uHmYmhpT

It looked like this:

<FF><FE>1^@^M^@
^@0^@0^@:^@0^@0^@:^@0^@9^@,^@4^@4^@2^@ ^@-^@-^@>^@ ^@0^@0^@:^@0^@0^@:^@1^@1^@,^@4^@4^@4^@^M^@
^@j& ^@W^@O^@K^@E^@ ^@U^@P^@^M^@
^@T^@H^@I^@S^@ ^@M^@O^@R^@N^@I^@N^@G^@ ^@j&^M^@
^@^M^@
^@2^@^M^@

... and so on.

This is clearly not an ASCII (or UTF-8) text file, so most standard tools (sed, grep, awk, etc.) will not work on it.

The <FF><FE> at the start is a byte order mark (BOM) indicating that the file is UTF-16-encoded text (little-endian, given that byte order). There is a standard tool for converting between UTF-16 and UTF-8, and UTF-8 encodes ASCII characters exactly as ASCII does, so once the file is converted to UTF-8, sed/grep/awk/etc. will be able to edit it.
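
To check for that byte order mark directly rather than through less, something like the following sketch could be used. The function name has_utf16le_bom is an assumption made for illustration; it only inspects the first two bytes and does not validate the rest of the file.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption): succeed if the file
# starts with the UTF-16 little-endian byte order mark, bytes FF FE.
has_utf16le_bom() {
    local bom
    # Dump the first two bytes as hex and strip the whitespace od adds.
    bom=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$bom" = "fffe" ]
}
```

For example, has_utf16le_bom subtitle.srt && echo "UTF-16LE with BOM" would print the message only for files like the one shown above (subtitle.srt is a placeholder name).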

The tool we need is iconv. Unfortunately, iconv has no in-place editing feature so we'll have to write a loop that uses a temporary file to do the conversion:

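# Convert every UTF-16 .srt file under the current directory to UTF-8.
find . -type f -name '*.srt' -print0 | while IFS= read -r -d '' filename; do
    # file -b prints only the description, so the file name itself cannot match.
    if file -b "$filename" | grep -q 'UTF-16'; then
        iconv -f UTF-16 -t UTF-8 -o "$filename".utf8 "$filename" && mv "$filename".utf8 "$filename"
    fi
done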

Then you can run the find/sed command to lowercase them. Most programs won't care that your files are now UTF8 rather than UTF16, but if you have issues then you can write a similar loop that uses iconv to put them back into UTF16 after you've lowercased them.
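
A loop for the reverse conversion could be sketched as follows. The function name to_utf16 and the .utf16 temporary suffix are assumptions made for illustration, and the byte order that iconv picks for the plain UTF-16 target depends on the iconv implementation, so check the result against whatever program consumes the files.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption): convert UTF-8 files back
# to UTF-16, using the same temp-file-then-rename pattern as the loop above.
to_utf16() {
    local f
    for f in "$@"; do
        if iconv -f UTF-8 -t UTF-16 "$f" > "$f.utf16"; then
            mv -- "$f.utf16" "$f"
        else
            # Conversion failed: discard the partial output, keep the original.
            rm -f -- "$f.utf16"
            return 1
        fi
    done
}
```

It would be called with the files that were originally UTF-16, for example to_utf16 *.srt, after the lowercasing step has run.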

If you just want to lowercase all files matching '*.txt':

sed -i 's/.*/\L&/' *.txt

But note that this will run into issues with the command line length if there are a lot of .txt files.

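If you want to lowercase all files recursively, I'd use Diego's approach, but there are a couple of errors to fix (an unescaped + is a literal character in sed's default regex syntax, and \L has to come before the matched text to affect it):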

find . -type f -exec sed -i 's/.*/\L&/' {} +

should do the trick.

If you don't want recursion, only want to affect '.txt' files, and have too many files for sed ... *.txt to work, then use:

find . -maxdepth 1 -type f -name '*.txt' -exec sed -i 's/.*/\L&/' {} +

(-maxdepth 1 stops the recursion)

Older versions of find won't support the -exec ... + syntax, so if you run into trouble with that then replace the + with \;. The + is preferable because it makes find invoke sed with multiple files per invocation, rather than once per file, so it's slightly more efficient.
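
As for the Nautilus part of the question, one possible sketch is below. It assumes GNU sed and relies on Nautilus exporting the selected local paths, one per line, in the NAUTILUS_SCRIPT_SELECTED_FILE_PATHS environment variable; the function name lowercase_selected is an assumption made for illustration.

```shell
#!/bin/bash
# Sketch of a Nautilus script. Nautilus sets
# NAUTILUS_SCRIPT_SELECTED_FILE_PATHS to the newline-separated paths of the
# selected local files before running the script.
lowercase_selected() {
    local f
    while IFS= read -r f; do
        # Skip blank lines and anything that is not a regular file.
        [ -f "$f" ] || continue
        sed -i 's/.*/\L&/' -- "$f"
    done <<< "$NAUTILUS_SCRIPT_SELECTED_FILE_PATHS"
}

# Do the work only when Nautilus has provided a selection.
if [ -n "${NAUTILUS_SCRIPT_SELECTED_FILE_PATHS:-}" ]; then
    lowercase_selected
fi
```

Saved as an executable file in the Nautilus scripts directory (~/.local/share/nautilus/scripts on recent releases, ~/.gnome2/nautilus-scripts on older ones), it should appear under the Scripts entry of the right-click menu. UTF-16 files would still need the iconv conversion first.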
