What is an efficient way to replace a list of strings with another list in a Unix file?

Posted 2024-12-02 00:00:40

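Suppose I have two lists of strings (list A and list B), each with exactly the same number of entries N, and I want to replace every occurrence of the nth element of A with the nth element of B in a Unix file (preferably using a Bash script).

What is the most efficient way to do this?

An inefficient way would be to make N calls to "sed s/stringA/stringB/g".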

十年九夏 2024-12-09 00:00:41

This will do it in one pass. It reads listA and listB into awk arrays, then for each line of the input, it examines each word and, if the word is found in listA, replaces it with the corresponding word in listB.

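awk '
    # First file: remember the line number of each listA string.
    FILENAME == ARGV[1] { listA[$1] = FNR; next }
    # Second file: store each listB string under its line number.
    FILENAME == ARGV[2] { listB[FNR] = $1; next }
    # Remaining file: replace every field that appears in listA.
    {
        for (i = 1; i <= NF; i++) {
            if ($i in listA) {
                $i = listB[listA[$i]]
            }
        }
        print
    }
' listA listB filename > filename.new
mv filename.new filename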

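I'm assuming the strings in listA do not contain whitespace (awk's default field separator). Note that assigning to a field makes awk rebuild that line with single spaces between fields, so the original spacing is lost on lines where a replacement happens.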
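To try the one-pass idea on concrete data, here is a small self-contained sketch. The sample words and the scratch directory are made up for illustration and are not part of the original question.

```shell
#!/usr/bin/env bash
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/listA"
printf '%s\n' red yellow dark     > "$tmp/listB"
printf '%s\n' 'apple pie and banana bread' 'no fruit here' > "$tmp/input"

# Map each listA word to its line number, then look up the word stored
# under the same line number of listB while scanning the input once.
result=$(awk '
    FILENAME == ARGV[1] { listA[$1] = FNR; next }
    FILENAME == ARGV[2] { listB[FNR] = $1; next }
    {
        for (i = 1; i <= NF; i++)
            if ($i in listA)
                $i = listB[listA[$i]]
        print
    }
' "$tmp/listA" "$tmp/listB" "$tmp/input")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample the first line should come out as "red pie and yellow bread" and the second line should be unchanged.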

舂唻埖巳落 2024-12-09 00:00:41

Make one call to sed that writes the sed script, and another to use it? If your lists are in files listA and listB, then:

paste -d : listA listB | sed 's/\([^:]*\):\([^:]*\)/s%\1%\2%g/' > sed.script
sed -f sed.script files.to.be.mapped.*

I'm making some sweeping assumptions about 'words' not containing either colon or percent symbols, but you can adapt around that. Some versions of sed have upper bounds on the number of commands that can be specified; if that's a problem because your word lists are big enough, then you may have to split the generated sed script into separate files which are applied - or change to use something without the limit (Perl, for example).
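One way to adapt around those assumptions is to escape the special characters before the sed script is generated. The following is only a sketch: the sample strings, the scratch directory and the intermediate file names (patterns, replacements) are invented for the example, and it assumes the strings contain no newlines.

```shell
#!/usr/bin/env bash
# Hypothetical sample lists containing characters that are special to sed.
tmp=$(mktemp -d)
printf '%s\n' 'a.b' '50%'  > "$tmp/listA"
printf '%s\n' 'x&y' 'half' > "$tmp/listB"

# Escape regex metacharacters and the % delimiter in the search strings.
sed 's/[]\.*^$%[]/\\&/g' "$tmp/listA" > "$tmp/patterns"
# Escape &, % and backslash in the replacement strings.
sed 's/[&%\]/\\&/g' "$tmp/listB" > "$tmp/replacements"

# Join each pair with % and wrap it as s%pattern%replacement%g.
paste -d '%' "$tmp/patterns" "$tmp/replacements" |
    sed 's/^/s%/; s/$/%g/' > "$tmp/sed.script"

result=$(printf '%s\n' 'a.b aXb 50%' | sed -f "$tmp/sed.script")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Here "a.b" is matched literally, so "aXb" is left alone, and the "&" in the replacement is inserted as a plain character rather than as the matched text.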

Another item to be aware of is sequence of changes. If you want to swap two words, you need to craft your word lists carefully. In general, if you map (1) wordA to wordB and (2) wordB to wordC, it matters whether the sed script does mapping (1) before or after mapping (2).
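The ordering problem is easy to reproduce. In this sketch the two words and the sample line are hypothetical; it compares sequential sed substitutions with a single-pass lookup in awk.

```shell
#!/usr/bin/env bash
# Hypothetical swap: cat -> dog and dog -> cat.
line='cat chases dog'

# Sequential substitutions: the second command rewrites what the first produced.
sequential=$(printf '%s\n' "$line" | sed -e 's/cat/dog/g' -e 's/dog/cat/g')

# Single pass over the words: each word is looked up once, so the swap works.
one_pass=$(printf '%s\n' "$line" | awk '
    BEGIN { map["cat"] = "dog"; map["dog"] = "cat" }
    { for (i = 1; i <= NF; i++) if ($i in map) $i = map[$i]; print }
')

printf '%s\n' "$sequential"   # cat chases cat
printf '%s\n' "$one_pass"     # dog chases cat
```

If the mappings have to be applied with sed, one common workaround is to map through temporary placeholder strings that cannot occur in the data.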

The script shown is not careful about word boundaries; you can make it careful about them in various ways, depending on the version of sed you are using and your criteria for what constitutes a word.
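As one possible way to respect word boundaries, the generator can wrap each search string in \< and \>. This sketch assumes GNU sed, where those anchors are available as an extension; other sed versions may need a different construct. The sample data and scratch directory are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical one-entry lists in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' cat > "$tmp/listA"
printf '%s\n' dog > "$tmp/listB"

# Wrap each search word in \< and \> (GNU sed word-boundary anchors).
paste -d : "$tmp/listA" "$tmp/listB" |
    sed 's/\([^:]*\):\([^:]*\)/s%\\<\1\\>%\2%g/' > "$tmp/sed.script"

result=$(printf '%s\n' 'cat concatenate cat.' | sed -f "$tmp/sed.script")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With GNU sed the standalone "cat" words are replaced while "concatenate" is left untouched.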

白云不回头 2024-12-09 00:00:41

I needed to do something similar, and I wound up generating sed commands based on a map file:

$ cat file.map
abc => 123
def => 456
ghi => 789

$ cat stuff.txt
abc jdy kdt
kdb def gbk
qng pbf ghi
non non non
try one abc

$ sed $(awk '{print "-e s/"$1"/"$3"/g"}' file.map) stuff.txt
123 jdy kdt
kdb 456 gbk
qng pbf 789
non non non
try one 123

Make sure your shell supports as many parameters to sed as you have in your map.
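If the map is long enough for the command line to become a concern, a variation is to write the generated commands to a file and pass it with sed -f. This is a sketch with hypothetical file names (map.sed, a scratch directory) using the same map format as above.

```shell
#!/usr/bin/env bash
# Hypothetical sample files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'abc => 123' 'def => 456' > "$tmp/file.map"
printf '%s\n' 'abc jdy kdt' 'kdb def gbk' > "$tmp/stuff.txt"

# One s/// command per map line, written to a script file instead of
# being passed as -e arguments on the command line.
awk '{ print "s/" $1 "/" $3 "/g" }' "$tmp/file.map" > "$tmp/map.sed"

result=$(sed -f "$tmp/map.sed" "$tmp/stuff.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Like the original command, this assumes the map entries contain no slashes or regex metacharacters.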

狼性发作 2024-12-09 00:00:41

This is fairly straightforward with Tcl:

set fA [open listA r]
set fB [open listB r]
set fin [open input.file r]
set fout [open output.file w]

# read listA and listB and create the mapping of corresponding lines
set map {}
while {[gets $fA strA] != -1} {
    set strB [gets $fB]
    lappend map $strA $strB
}

# apply the mapping to the input file
puts $fout [string map $map [read $fin]]

# if the file is large, do it line by line instead
#while {[gets $fin line] != -1} {
#    puts $fout [string map $map $line]
#}

close $fA
close $fB
close $fin
close $fout

# -force is needed because input.file already exists
file rename -force output.file input.file

沉溺在你眼里的海 2024-12-09 00:00:41

You can do this in bash. Get your lists into arrays.

listA=(a b c)
listB=(d e f)
data=$(<file)
echo "${data//${listA[2]}/${listB[2]}}" #change the 3rd element. Redirect to file where necessary
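To apply every pair rather than just the third, the same expansion can go in a loop over the array indices. This is a sketch with hypothetical sample values; in practice data would come from data=$(<file). It still makes one pass over the text per pair and holds the whole file in memory, so it is probably only reasonable for small inputs.

```shell
#!/usr/bin/env bash
# Hypothetical sample lists and text.
listA=(apple banana cherry)
listB=(red yellow dark)
data='apple banana apple cherry'

for i in "${!listA[@]}"; do
    # Quoting the pattern makes bash treat it as a literal string, not a glob.
    data=${data//"${listA[i]}"/${listB[i]}}
done

printf '%s\n' "$data"
```

Because the pairs are applied one after another, the ordering caveat about chained mappings applies here as well.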

About the author: 温柔嚣张
