Comparing on strings rather than on lines
I feel I should be able to do this in my sleep, but say I have two text files, each containing a single column of Apache module names in no particular order. One file has 46 strings, each unique within that file. The other has 67 lines and 67 strings, each unique within that file. The two files will have many strings in common.
I need to find the names of Apache modules that are -not- in the shorter, first file but -are- in the second, longer file.
I want to do this by searching and comparing strings. Line number, order, and position are completely irrelevant. I just want to know which modules are listed only in the longer file and therefore need to be installed.
By default, uniq, comm, and diff want to work by lines and line numbers.
I don't want a side-by-side comparison; I just want a list.
Comments (2)
Break your strings into lines, sort and uniquify them, and use comm for the analysis. (See BashFAQ #36.)
To have an example, I'm going to assume that you want to compare the LoadModule directives between two Apache config files.
file1:
file2:
So, to do this:
...will suppress any lines found in both files or only in the shorter file, and give you the module names found only in the longer file, yielding the following output:
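As a minimal sketch of the sort-and-comm approach described above: the sample LoadModule lines, the temporary directory, and the helper name module_names are all made up for illustration, and the real inputs would be your two Apache config files. It assumes bash (for process substitution) and that the module name is the second field of each LoadModule line.

```bash
#!/usr/bin/env bash
# Hypothetical sample inputs standing in for the two Apache config files.
workdir=$(mktemp -d)
printf '%s\n' \
  'LoadModule alias_module modules/mod_alias.so' \
  'LoadModule rewrite_module modules/mod_rewrite.so' > "$workdir/file1"
printf '%s\n' \
  'LoadModule rewrite_module modules/mod_rewrite.so' \
  'LoadModule ssl_module modules/mod_ssl.so' \
  'LoadModule alias_module modules/mod_alias.so' \
  'LoadModule headers_module modules/mod_headers.so' > "$workdir/file2"

# Print the module name (second field) of every LoadModule line,
# sorted and de-duplicated, because comm requires sorted input.
module_names() {
  awk '$1 == "LoadModule" { print $2 }' "$1" | LC_ALL=C sort -u
}

# comm prints three columns: only in the first input, only in the second, in both.
# -13 suppresses columns 1 and 3, leaving the names found only in file2.
missing=$(LC_ALL=C comm -13 \
  <(module_names "$workdir/file1") \
  <(module_names "$workdir/file2"))
printf '%s\n' "$missing"

rm -rf "$workdir"
```

If the files really are a bare single column of names, as in the question, the awk step can be dropped and each file passed straight through sort -u.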
For folks looking at this question with more interesting use cases in mind: unfortunately, while GNU sort's -z flag can handle NUL delimiters (to allow comparison of strings containing newlines), comm historically could not (recent GNU coreutils releases add a -z option to comm as well). Where that option is unavailable, you can write your own comm implementation in shell that supports NUL delimiters, such as the following example:
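A rough sketch of such a NUL-aware comparison might look like the following. It is not a full comm replacement: it only prints the records that appear in the second input and not in the first, and the function name only_in_second_nul is invented for this example. It assumes bash, and that both inputs are already NUL-delimited, sorted, and de-duplicated in the C locale (for instance with GNU sort -zu).

```bash
#!/usr/bin/env bash
# Use byte ordering everywhere so that [[ a < b ]] agrees with the sort order.
export LC_ALL=C

# Usage (hypothetical): only_in_second_nul sorted_first sorted_second
# Prints, NUL-delimited, the records found only in the second input.
only_in_second_nul() {
  local a b have_a=0 have_b=0
  exec 3<"$1" 4<"$2"
  IFS= read -r -d '' -u 3 a && have_a=1
  IFS= read -r -d '' -u 4 b && have_b=1
  while (( have_b )); do
    if (( ! have_a )) || [[ $b < $a ]]; then
      # b sorts before everything left in the first input: unique to the second.
      printf '%s\0' "$b"
      have_b=0; IFS= read -r -d '' -u 4 b && have_b=1
    elif [[ $b == "$a" ]]; then
      # Present in both inputs: skip it and advance both.
      have_a=0; IFS= read -r -d '' -u 3 a && have_a=1
      have_b=0; IFS= read -r -d '' -u 4 b && have_b=1
    else
      # a sorts before b: only in the first input, so advance the first input.
      have_a=0; IFS= read -r -d '' -u 3 a && have_a=1
    fi
  done
  exec 3<&- 4<&-
}

# Example call, with list1 and list2 as placeholder file names:
#   only_in_second_nul <(sort -zu list1) <(sort -zu list2) | tr '\0' '\n'
```

The merge walks both sorted inputs once, so it scales linearly with input size, but it silently produces wrong results if either input is unsorted or sorted under a different locale.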
I would run a little bash script like this (differ.bash):
Run it like so:
Basically, I am just setting up a double for loop with the longer file on the outer loop and the shorter file on the inner loop. That way each item in the longer list gets compared with the items in the shorter list. This allows us to find all the items that don't match something in the smaller list.
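The nested-loop idea described above can be sketched roughly as follows. This is an illustration, not the differ.bash script itself: the function name not_in_shorter and the argument order are assumptions, and it uses while read loops in place of for loops so that each line is compared whole, without word splitting or glob expansion.

```bash
#!/usr/bin/env bash
# Usage (hypothetical): not_in_shorter shorter_file longer_file
# Prints every line of the longer file that matches no line of the shorter file.
not_in_shorter() {
  local shorter=$1 longer=$2 outer inner found
  # Outer loop: each entry of the longer file.
  while IFS= read -r outer || [[ -n $outer ]]; do
    found=0
    # Inner loop: compare against every entry of the shorter file.
    while IFS= read -r inner || [[ -n $inner ]]; do
      if [[ $outer == "$inner" ]]; then
        found=1
        break
      fi
    done < "$shorter"
    # No match in the shorter file means the entry exists only in the longer one.
    (( found )) || printf '%s\n' "$outer"
  done < "$longer"
}
```

This rereads the shorter file once per line of the longer file, which is fine for lists of 46 and 67 entries but grows quadratically; unlike the comm approach it needs no sorting and keeps the longer file's original order.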
Edit: I have tried to address Charles' first comment with this updated script: