Efficient substring matching and filtering in bash or Python

Posted 2025-01-25 10:58:20

I have a set of filenames in a directory, some of which are likely to share substrings that are not known in advance. This is a sorting exercise. I want to move the files whose names share the longest common substring (letters in the same order) into a subdirectory named after the length of that match, then proceed to progressively shorter matches until no matches of 2 or more letters remain. Ignore extensions. Case insensitive. Ignore special characters.

Example.

AfricanElephant.jpg
elephant.jpg
grant.png
ant.png
el_gordo.tif
snowbell.png

Working from the longest match down to the shortest would give:

./8/AfricanElephant.jpg   and ./8/elephant.jpg
./3/grant.png  and ./3/ant.png
./2/snowbell.png  and ./2/el_gordo.tif

I am completely lost on an efficient bash or Python way to do what seems to be a complex sort.

I found some awk code which is almost there:

{
    count=0
    while ( match($0,/elephant/) ) {
        count++
        $0=substr($0,RSTART+1)
    }
    print count
}

Here temp.txt contains the list of files, and the script is invoked as, e.g.,
awk -f test_match.awk temp.txt

The drawbacks are that a) this is hardwired to look for "elephant" (I don't know how to make it take an input string, rather than a file, plus a test string to count against), and
b) I really just want to call a bash function that does the sort as specified.

If I had this I could wrap some bash script around this core awk to make it work.
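
For drawback a), one possible approach is to pass the search string to awk with `-v` and pipe the text in rather than reading a file. The following is only a minimal sketch: the function name `count_matches` and its argument order are assumptions made for illustration, and it uses `index()` (a plain string search) in place of the regex `match()` above.

```shell
#!/usr/bin/env bash
# count_matches is a hypothetical helper name, not part of the original post.
# Usage: count_matches NEEDLE HAYSTACK
# Prints how many times NEEDLE occurs in HAYSTACK, ignoring case and
# counting overlapping occurrences (same stepping as RSTART+1 above).
count_matches() {
    # An empty needle would never advance, so report 0 and stop.
    [ -n "$1" ] || { echo 0; return 0; }
    # Note: awk interprets backslash escapes in -v values.
    printf '%s\n' "$2" | awk -v needle="$1" '
    {
        count = 0
        s = tolower($0)
        n = tolower(needle)
        # index() treats the needle literally, so "." or "*" are not special.
        while ((pos = index(s, n)) > 0) {
            count++
            s = substr(s, pos + 1)
        }
        print count
    }'
}

count_matches elephant AfricanElephant.jpg   # prints 1
```

Because both strings are lowercased first, the count is case-insensitive, which matches the requirement stated in the question.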

余生再见 2025-02-01 10:58:20

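function longest_common_substrings () {
shopt -s nocasematch
# Score every ordered pair of regular files in the current directory.
# Each output line is "<file>  <other file> <match length>".
# Caveat: filenames containing whitespace break the awk field parsing below.
for file1 in * ; do
   for file in * ; do
      [[ -f "$file1" && -f "$file" ]] || continue
      [[ "$file" == "$file1" ]] && continue
      base1=$(basename "$file" | cut -d. -f1)    # drop everything after the first dot
      base2=$(basename "$file1" | cut -d. -f1)
      # Length of the longest common substring of the two base names
      len=$("$HOME/Scripts/longest_common_substring.sh" "$base1" "$base2" | tr -d '\n' | wc -c | awk '{$1=$1;print}')
      echo "$file  $file1 $len"
   done
done | sort -rn -k3,3 | awk '{ print $1, $3 }' > /tmp/filesort_substring.txt   # numeric sort, longest first

# Move each file into the directory named after its best match length.
# A file that has already been moved fails the -f test, so its shorter matches are skipped.
# Caveat: files whose best match is shorter than 2 characters are still moved (to ./1 or ./0).
while IFS= read -r line; do
   file_to_move=$(echo "$line" | awk '{ print $1 }')
   directory_to_move_to=$(echo "$line" | awk '{ print $2 }')
   if [[ -f "$file_to_move" ]]; then
      mkdir -p "$directory_to_move_to"
      # gmv is presumably GNU mv installed with a g prefix (e.g. Homebrew coreutils)
      \gmv -b "$file_to_move" "$directory_to_move_to"
   fi
done < /tmp/filesort_substring.txt
shopt -u nocasematch
}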

where $HOME/Scripts/longest_common_substring.sh is


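#!/bin/bash
# Print the longest common substring of the two arguments (case-insensitive).
shopt -s nocasematch
# Search for substrings of the shorter string inside the longer one.
if ((${#1}>${#2})); then
   long=$1 short=$2
else
   long=$2 short=$1
fi

lshort=${#short}
score=0
# i is the start offset in short; stop once no longer match can fit.
for ((i=0;i<lshort-score;++i)); do
   # Only try lengths that would beat the best match found so far.
   for ((l=score+1;l<=lshort-i;++l)); do
      sub=${short:i:l}
      # Quote $sub so glob characters in a filename are matched literally.
      [[ $long != *"$sub"* ]] && break
      subfound=$sub score=$l
   done
done

if ((score)); then
   echo "$subfound"
fi

shopt -u nocasematch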

Credit for the substring-matching script goes to the original solution, which I found elsewhere on this site.
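
The same idea can also be tried without a separate script file and without moving anything. The sketch below is one possible variant, not the answer's own code: the function names `normalize_name`, `lcs_length` and `plan_moves` are hypothetical, it additionally strips special characters as the question asks, and it only prints where each file would go so the result can be checked first.

```shell
#!/usr/bin/env bash
# All function names below are hypothetical, chosen for this sketch.

# Drop the last extension, lowercase, and keep only letters and digits.
normalize_name() {
    printf '%s' "${1%.*}" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]'
    echo
}

# Print the length of the longest common substring of two strings.
lcs_length() {
    local long=$1 short=$2 i l score=0
    if (( ${#long} < ${#short} )); then
        long=$2 short=$1
    fi
    for (( i = 0; i < ${#short} - score; ++i )); do
        for (( l = score + 1; l <= ${#short} - i; ++l )); do
            # Quoting the substring keeps glob characters literal.
            [[ $long == *"${short:i:l}"* ]] || break
            score=$l
        done
    done
    echo "$score"
}

# Dry run: for each name given, print "<best score>/<name>" when its best
# match against any other name is at least 2 characters long.
plan_moves() {
    local a b best s na
    for a in "$@"; do
        best=0
        na=$(normalize_name "$a")
        for b in "$@"; do
            [[ $a == "$b" ]] && continue
            s=$(lcs_length "$na" "$(normalize_name "$b")")
            (( s > best )) && best=$s
        done
        (( best >= 2 )) && echo "$best/$a"
    done
    return 0
}

plan_moves AfricanElephant.jpg elephant.jpg grant.png ant.png el_gordo.tif snowbell.png
```

For the six example names this prints destinations under 8/, 3/ and 2/, in line with the grouping given in the question. Every pair of names is compared in a subshell, so the sketch is quadratic in the number of files and is better suited to small directories.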
