Efficient substring matching and sorting in bash or Python
I have a set of filenames in a directory, some of which are likely to share identical substrings that are not known in advance. This is a sorting exercise. I want to move the files that share the longest common substring (letters in the same order) together into a subdirectory named after the number of matching letters, then work down to shorter matches until no matches of 2 or more letters remain. Extensions should be ignored, matching should be case insensitive, and special characters should be ignored.
Example.
AfricanElephant.jpg
elephant.jpg
grant.png
ant.png
el_gordo.tif
snowbell.png
Starting from maximum length matches to minimum length matches will result in:
./8/AfricanElephant.jpg and ./8/elephant.jpg
./3/grant.png and ./3/ant.png
./2/snowbell.png and ./2/el_gordo.tif
I'm completely lost on an efficient bash or Python way to do what seems like a complex sort.
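For reference, the rule described above can be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the function names (`normalize`, `lcs_length`, `plan_moves`) are made up for illustration, "special characters" is taken to mean anything that is not a letter or digit, and each round only compares files that have not already been assigned to a directory. It returns a plan rather than moving anything.

```python
import os
from itertools import combinations


def normalize(filename):
    """Drop the extension, lowercase, and keep only letters and digits (assumed rule)."""
    stem, _ext = os.path.splitext(filename)
    return "".join(ch for ch in stem.lower() if ch.isalnum())


def lcs_length(a, b):
    """Length of the longest common contiguous substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def plan_moves(filenames, min_len=2):
    """Map each matched filename to a directory named after its match length."""
    remaining = list(filenames)
    plan = {}
    while len(remaining) > 1:
        # Score every pair of files that has not been assigned yet.
        scores = {
            (x, y): lcs_length(normalize(x), normalize(y))
            for x, y in combinations(remaining, 2)
        }
        best = max(scores.values())
        if best < min_len:
            break
        # Every file taking part in a best-scoring pair goes to the same directory.
        matched = {name for pair, score in scores.items()
                   if score == best for name in pair}
        for name in matched:
            plan[name] = str(best)
        remaining = [name for name in remaining if name not in matched]
    return plan
```

Calling `plan_moves` on the six example names assigns them to the directories `8`, `3` and `2` as listed above. Actually moving the files (for example with `os.makedirs` and `shutil.move`) is left out on purpose, so the plan can be inspected first. The pairwise comparison is quadratic in the number of files, which should be fine for a directory of modest size.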
I found some awk code which is almost there:
{
    count=0
    while ( match($0,/elephant/) ) {
        count++
        $0=substr($0,RSTART+1)
    }
    print count
}
where temp.txt contains a list of the files, and the script is invoked as, e.g.,
awk -f test_match.awk temp.txt
The drawbacks are that a) this is hardwired to look for "elephant" as a string (I don't know how to make it take an input string, rather than a file, and an input test string to count against), and
b) I really just want to call a bash function to do the sort as specified.
If I had this, I could wrap some bash script around this core awk to make it work.
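As a point of comparison, the counting loop in the awk snippet can be written as a small Python function that takes both the input string and the test string as arguments. This is a hedged sketch, not a drop-in replacement: `count_overlapping` is a hypothetical name, and lowercasing the input for case-insensitive matching is left to the caller.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlapping matches."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Resume one character past the last match start, like substr($0, RSTART+1).
        start = text.find(pattern, start + 1)
    return count
```

For example, `count_overlapping("AfricanElephant.jpg".lower(), "elephant")` counts a single occurrence.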
Comments (1)
where $HOME/Scripts/longest_common_substring.sh is
Kudos to the original solution for computing the match in the script, which I found elsewhere on this site.