Replace part of a string in filenames in bash

Posted on 2025-01-11 14:42:56

I have a list of text files that I am reading in from a folder named test, like this:

file_list="$(ls ~/Desktop/test | 
while read path; do basename "$path"; done)"

This will produce a list of these files:

test_1.txt
test_2.txt

I want to change a particular string in the names, specifically test to this, so that the list would then contain names like this:

this_1.txt
this_2.txt

I would like to do this directly in file_list; I don't want to do it on the actual files in the folder on the computer.

Is looping through one by one the most efficient way to do this?

Comments (3)

失与倦" 2025-01-18 14:42:56

You don't need either loops or external commands (like basename, find, and sed). Try this Shellcheck-clean code:

#! /bin/bash -p

shopt -s nullglob

files=( ~/Desktop/test/* )
bases=( "${files[@]##*/}" )
this_list="${bases[*]//test/this}"

declare -p this_list
  • shopt -s nullglob makes globs expand to nothing when no files match a pattern. Without it, a glob that matches nothing is left unexpanded, so the array would contain the literal pattern string (which amounts to garbage).
  • files=( ~/Desktop/test/* ) populates an array called files with the paths to all the files (and directories) in the ~/Desktop/test directory ( (~/Desktop/test/test_1.txt ...) ). Note that files whose names begin with a dot (.) are excluded. They can be included by running shopt -s dotglob earlier in the program.
  • bases=( "${files[@]##*/}" ) populates the bases array with the basenames of the files in the files array ( ( test_1.txt ... ) ). See Parameter expansion [Bash Hackers Wiki] for information about what the ## is doing.
  • If you wanted to remove the .txt extensions, as suggested in one of the comments, you could add an extra stage to the process: stems=( "${bases[@]%.txt}" ). It's not possible to do multiple string operations (e.g. ## and %) at once in Bash.
  • this_list="${bases[*]//test/this}" populates the this_list string with all the entries in bases with all occurrences of test in each of them replaced by this ( "this_1.txt ..." ). Again, see Parameter expansion [Bash Hackers Wiki] for details of how this works. The entries in the list are separated by spaces. The entries in the list in the question were separated by newlines. You can do that for this_list by setting IFS=$'\n' before doing the this_list=... assignment. See Modify IFS in bash while building and array, What is the exact meaning of IFS=$'\n'?, and Is it a sane approach to "back up" the $IFS variable?. The first character in the value of IFS is used to separate array elements when converting an array to a string with "${arrayname[*]}".
  • declare -p this_list shows the contents of this_list in an unambiguous way.
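
As a minimal sketch of the newline-separated variant mentioned above, the following joins the renamed names with newlines while keeping the IFS change confined to a function. The throwaway directory created with mktemp and the helper name join_lines are assumptions made for the demonstration; they are not part of the original answer.

```shell
#!/bin/bash
# Demo setup (assumption): a temporary directory stands in for ~/Desktop/test.
demo_dir=$(mktemp -d)
touch "$demo_dir/test_1.txt" "$demo_dir/test_2.txt"

shopt -s nullglob
files=( "$demo_dir"/* )
bases=( "${files[@]##*/}" )      # strip everything up to the last /

# join_lines is a hypothetical helper. 'local IFS' limits the change to
# this function, so the caller's IFS does not need to be backed up.
join_lines() {
    local IFS=$'\n'
    printf '%s' "$*"             # "$*" joins with the first character of IFS
}

this_list=$(join_lines "${bases[@]//test/this}")
declare -p this_list

rm -rf "$demo_dir"
```

Because IFS is only changed inside the function, the rest of the script keeps the default word-splitting behaviour.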

A few general points:

  • Never use ls in programs. It's for interactive use only. You might get away with using it in programs sometimes, but it will eventually bite you hard. See Why you shouldn't parse the output of ls(1) and Why not parse 'ls' (and what do to instead)?.
  • Avoid putting lists of files in strings. Use arrays instead. File paths can contain any character that a string can hold (neither can contain the NUL character). As a result, there is no character, or combination of characters, that can be safely used to separate arbitrary file paths in a string. The problem can be overcome by quoting the file paths in various ways, but that introduces more problems.
  • The "most efficient" way to do this depends on the number of files that need to be processed (among other things). The code in this answer runs in 0.2s against a directory containing 10 thousand files under Cygwin (which is generally much slower than Linux) on a low-end machine. That would be good enough for me. Bash is generally slow though, and the sorting done as part of glob expansion can be very slow when there are huge numbers of files. If you've got hundreds of thousands of files the pure Bash code might become unusable. A combination of find and sed should be able to handle much larger numbers of files, but Bash might struggle to handle the resulting huge strings (or arrays) anyway.
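
To illustrate the point about arrays, here is a small sketch that keeps the renamed names in an array rather than a string, so a name containing a space stays a single element. The temporary directory, the file names, and the array name this_names are assumptions chosen for the demonstration.

```shell
#!/bin/bash
# Demo setup (assumption): one of the file names contains a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/test_1.txt" "$demo_dir/test 2.txt"

shopt -s nullglob
files=( "$demo_dir"/* )
bases=( "${files[@]##*/}" )

# Keep the result as an array: one element per file, whatever the name contains.
this_names=( "${bases[@]//test/this}" )

printf 'number of names: %d\n' "${#this_names[@]}"
for name in "${this_names[@]}"; do
    printf '<%s>\n' "$name"      # angle brackets make element boundaries visible
done

rm -rf "$demo_dir"
```

Had the names been joined into a space-separated string instead, the name with the space would be indistinguishable from two separate entries.
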
り繁华旳梦境 2025-01-18 14:42:56

Is looping through one by one the most efficient way to [perform substitutions on the filenames]?

No, nor is it the most efficient way to extract the base names. Nor, for that matter, is it wise to parse the output of ls, though this is a relatively benign case. If you want to massage a list of filenames then passing the whole list through one sed or awk process is a better approach. For example:

file_list="$(
  find ~/Desktop/test -mindepth 1 -maxdepth 1 -not -name '.*' | 
    sed 's,^.*/,,; s,^test,this,'
)"

That find command outputs paths to the non-dotfiles in the specified directory, one per line, much as ls would do. sed then attempts two substitutions on each line: the first removes everything up to and including the last / character, à la basename, and the second substitutes this for test where the latter appears at the beginning of what's left of the line.

Note also that this approach, like your original one, will have issues with filenames containing newlines. It doesn't have an inherent issue with file names containing other whitespace, but you will have trouble interpreting the results correctly if any of the file names contain whitespace.
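
If names with whitespace or newlines matter, one possible variation (not what the command above does) is to have find emit NUL-terminated paths and read them into an array, then do the substitutions in Bash. This sketch assumes Bash 4.4 or newer for mapfile -d '' and a find that supports -print0 (GNU and BSD find do); the temporary directory and the array names paths and names are made up for the demonstration.

```shell
#!/bin/bash
# Demo setup (assumption): a temporary directory stands in for ~/Desktop/test.
demo_dir=$(mktemp -d)
touch "$demo_dir/test_1.txt" "$demo_dir/test 2.txt" "$demo_dir/.hidden"

# Read NUL-terminated paths into an array; -t drops the trailing delimiter.
mapfile -d '' -t paths < <(
    find "$demo_dir" -mindepth 1 -maxdepth 1 -not -name '.*' -print0
)

names=( "${paths[@]##*/}" )          # like s,^.*/,,
names=( "${names[@]/#test/this}" )   # /# anchors at the start, like s,^test,this,

declare -p names

rm -rf "$demo_dir"
```

The order in which find reports entries is unspecified, so the array order may differ from the sorted order a glob would give.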

十二 2025-01-18 14:42:56

Solved here:
https://unix.stackexchange.com/questions/36795/find-sed-search-and-replace

You can do it in one line, using find with -exec and multiple sed commands separated by ;:

find . -exec sed -i '' 's/\([^/.]*\)\..*/\1/g;s?users/uname?gs://uname?g' {} +

The first sed command, s/\([^/.]*\)\..*/\1/g, removes everything after the first .

The second sed command, s?users/uname?gs://uname?g, does the substitution.

Parsing ls output is bad practice.
