文江博客 (Wenjiang Blog) › Topic details

Find the word that has the most letters in common with the other words

Posted 2024-11-19 01:55:59, 2662 words, 4 views, 0 comments

I want Perl (5.8.8) to find out which word has the most letters in common with the other words in an array, but only counting letters that are in the same position. (And preferably without using libraries.)

Take this list of words as an example:

BAKER
SALER
BALER
CARER
RUFFR

Here, BALER is the word that has the most letters in common with the others. It matches BAxER in BAKER, xALER in SALER, xAxER in CARER, and xxxxR in RUFFR.
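Here is a minimal sketch of that scoring rule. It assumes all words have the same length, and the helper name positional_matches is illustrative rather than taken from the question. It compares two words position by position, counts identical letters, and then sums that count over every other word. It uses only core Perl, so it should also run on 5.8.8.

```perl
use strict;
use warnings;

# Count the positions at which two equal-length words have the same letter.
sub positional_matches {
    my ($x, $y) = @_;
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) eq substr($y, $i, 1);
    }
    return $n;
}

my @words = qw(BAKER SALER BALER CARER RUFFR);

# Score BALER against every other word and add up the matches.
my $total = 0;
for my $other (grep { $_ ne 'BALER' } @words) {
    my $n = positional_matches('BALER', $other);
    print "BALER vs $other: $n\n";
    $total += $n;
}
print "BALER total: $total\n";
```

Running it prints BALER's four pairwise counts (4, 4, 3 and 1) and a total of 12. This matches the BAxER / xALER / xAxER / xxxxR breakdown.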

I want Perl to find this word for me in an arbitrary list of words of the same length and case. It seems I've hit a wall here, so help is much appreciated!

What I've tried until now

I don't really have much of a script at the moment:

use strict;
use warnings; 
my @wordlist = qw(BAKER SALER MALER BARER RUFFR);
foreach my $word (@wordlist) {
    my @letters = split(//, $word);
    # now step through each iteration and work magic...
}

Where the comment is, I've tried several kinds of code, heavy with for-loops and ++ variables. So far, none of my attempts have done what I need them to do.

So, to explain better: what I need is to test each word against the rest of the list, letter position by letter position, to find the word that has the most letters in common with the others at the same positions.

One possible way could be to first check which word(s) have the most in common at letter position 0, then test letter position 1, and so on, until you find the word that, in sum, has the most letters in common with the other words in the list. Then I'd like to print the list as a matrix with scores for each letter position plus a total score for each word, not unlike what DavidO suggested.

What you'd end up with, in effect, is a matrix of the words, with the score for each letter position and the total score for each word.
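The following sketch builds that matrix without comparing every pair of words. It assumes all words have the same length. First it counts how often each letter occurs in each column. A word's score at a given position is then that count minus one, so the word does not match itself. The array name @col and the output format are illustrative choices.

```perl
use strict;
use warnings;

my @words = qw(BAKER SALER BALER CARER RUFFR);
my $len   = length $words[0];    # assumes every word has this length

# $col[$i]{$letter} = number of words with $letter at position $i
my @col;
for my $w (@words) {
    $col[$_]{ substr($w, $_, 1) }++ for 0 .. $len - 1;
}

# One row per word: per-position scores, then the total.
my %total;
for my $w (@words) {
    # subtract 1 so a word does not count a match against itself
    my @row = map { $col[$_]{ substr($w, $_, 1) } - 1 } 0 .. $len - 1;
    $total{$w} = 0;
    $total{$w} += $_ for @row;
    printf "%s  %s  = %d\n", $w, join(' ', @row), $total{$w};
}

# List every word that reaches the top total (handles ties).
my $max = (sort { $b <=> $a } values %total)[0];
print "best: ", join(', ', grep { $total{$_} == $max } @words), "\n";
```

For the five-word list, BALER's row comes out as 1 3 1 3 4 = 12 and RUFFR's as 0 0 0 0 4 = 4. If several words tie for the top total, all of them are listed.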

Purpose of the Program

Hehe, I might as well say it: The program is for hacking terminals in the game Fallout 3. :D My thinking is that it's a great way to learn Perl while also having fun gaming.

Here's one of the Fallout 3 terminal hacking tutorials I've used for research: FALLOUT 3: Hacking FAQ v1.2. I've already made a program to shorten the list of words, like this:

#!/usr/bin/perl
# See if one word has equal letters as the other, and how many of them are equal
use strict;
use warnings; 

my $checkword = "APPRECIATION"; # the word to be checked
my $match = 4; # equal to the match you got from testing your checkword
my @checkletters = split(//, $checkword);

my @wordlist = qw(
    PARTNERSHIPS
    REPRIMANDING
    CIVILIZATION
    APPRECIATION
    CONVERSATION
    CIRCUMSTANCE
    PURIFICATION
    SECLUSIONIST
    CONSTRUCTION
    DISAPPEARING
    TRANSMISSION
    APPREHENSIVE
    ENCOUNTERING
);

print "$checkword has $match letters in common with:\n";

foreach my $word (@wordlist) {
    next if $word eq $checkword;
    my @letters = split(//, $word);
    my $length = @letters; # determine length of array (how many letters to check)

    my $eq_letters = 0; # reset to 0 for every new word to be tested
    for (my $i = 0; $i < $length; $i++) {
        if ($letters[$i] eq $checkletters[$i]) {
            $eq_letters++;
        }
    }
    if ($eq_letters == $match) {
        print "$word\n";
    }
}
# Next: a script to find the best word to check in the first place...

This script will yield CONSTRUCTION and TRANSMISSION as its result, just as in the game FAQ. The trick to the original question, though (and the thing I didn't manage to find out on my own), is how to find the best word to try in the first place, i.e. APPRECIATION.

OK, I've now supplied my own solution based on your help, and I consider this thread closed. Many, many thanks to all the contributors. You've helped tremendously, and along the way I've also learned a lot. :D

百善笑为先 2024-11-26 01:55:59

Here's one way. Having re-read your spec a couple of times I think it's what you're looking for.

It's worth mentioning that it's possible there will be more than one word with an equal top score. From your list there's only one winner, but it's possible that in longer lists, there will be several equally winning words. This solution deals with that. Also, as I understand it, you count letter matches only if they occur in the same column per word. If that's the case, here's a working solution:

use 5.012;
use strict;
use warnings;
use List::Util 'max';

my @words = qw/
    BAKER
    SALER
    BALER
    CARER
    RUFFR
/;

my @scores;
foreach my $word ( @words ) {
    my $score = 0;    # 0 rather than undef if a word has no matches
    foreach my $comp_word ( @words ) {
        next if $comp_word eq $word;
        foreach my $pos ( 0 .. ( length $word ) - 1 ) {
            $score++ if substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
        }
    }
    push @scores, $score;
}
my $max = max( @scores );
my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;

say "Words with most matches:";
say for @words[@max_ixs];

This solution counts how many times per letter column each word's letters match other words. So for example:

Words:     Scores:       Because:
ABC        1, 2, 1 = 4   A matched once,  B matched twice, C matched once.
ABD        1, 2, 1 = 4   A matched once,  B matched twice, D matched once.
CBD        0, 2, 1 = 3   C never matched, B matched twice, D matched once.
BAC        0, 0, 1 = 1   B never matched, A never matched, C matched once.

That gives you ABC and ABD as the winners, each with a score of four positional matches. In other words, the score is the number of times the letter in row one, column one matched column one of rows two, three and four, plus the same count for each subsequent column.
It could probably be optimized further and written more concisely, but I tried to keep the logic fairly easy to read. Enjoy!

UPDATE / EDIT
I thought about it and realized that although my existing method does exactly what your original question requested, it runs in O(n^2) time (multiplied by the word length), which is comparatively slow. There is a faster approach. Keep a hash for each column, keyed by letter, and store as each value how many times that letter appears in the column. Each lookup then takes O(1) time, and traversing the list takes O(n*c) time, where c is the number of columns and n is the number of words. Building the hashes adds some setup time, but it's still a big improvement. Here is a new version of each technique, along with a benchmark comparing them.

use strict;
use warnings;
use List::Util qw/ max sum /;
use Benchmark qw/ cmpthese /;

my @words = qw/
    PARTNERSHIPS
    REPRIMANDING
    CIVILIZATION
    APPRECIATION
    CONVERSATION
    CIRCUMSTANCE
    PURIFICATION
    SECLUSIONIST
    CONSTRUCTION
    DISAPPEARING
    TRANSMISSION
    APPREHENSIVE
    ENCOUNTERING
/;


# Just a test run for each solution.
my( $top, $indexes_ref );

($top, $indexes_ref ) = find_top_matches_force( \@words );
print "Testing force method: $top matches.\n";
print "@words[@$indexes_ref]\n";

( $top, $indexes_ref ) = find_top_matches_hash( \@words );
print "Testing hash  method: $top matches.\n";
print "@words[@$indexes_ref]\n";



my $count = 20000;
cmpthese( $count, {
    'Hash'  => sub{ find_top_matches_hash( \@words ); },
    'Force' => sub{ find_top_matches_force( \@words ); },
} );


sub find_top_matches_hash {
    my $words = shift;
    my @scores;
    my $columns;
    my $max_col = max( map { length $_ } @{$words} ) - 1;
    foreach my $col_idx ( 0 .. $max_col ) {
        $columns->[$col_idx]{ substr $_, $col_idx, 1 }++ 
            for @{$words};
    }
    foreach my $word ( @{$words} ) {
        my $score = sum( 
            map{ 
                $columns->[$_]{ substr $word, $_, 1 } - 1
            } 0 .. $max_col
        );
        push @scores, $score;
    }
    my $max = max( @scores );
    my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
    return(  $max, \@max_ixs );
}


sub find_top_matches_force {
    my $words = shift;
    my @scores;
    foreach my $word ( @{$words} ) {
        my $score = 0;    # 0 rather than undef if a word has no matches
        foreach my $comp_word ( @{$words} ) {
            next if $comp_word eq $word;
            foreach my $pos ( 0 .. ( length $word ) - 1 ) {
                $score++ if 
                    substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
            }
        }
        push @scores, $score;
    }
    my $max = max( @scores );
    my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
    return( $max, \@max_ixs );
}

The output is:

Testing force method: 39 matches.
APPRECIATION
Testing hash  method: 39 matches.
APPRECIATION
        Rate Force  Hash
Force 2358/s    --  -74%
Hash  9132/s  287%    --

I realize your original spec changed after you saw some of the other options provided, and that's sort of the nature of innovation to a degree, but the puzzle was still alive in my mind. As you can see, my hash method is 287% faster than the original method. More fun in less time!

西瓜 2024-11-26 01:55:59

As a starting point, you can efficiently count how many same-position letters two words have in common with:

$count = ($word1 ^ $word2) =~ y/\0//;
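Here is a small demonstration of why this works. The string form of ^ xors the two strings byte by byte, so every position where the letters are equal becomes a NUL byte ("\0"). tr/\0// with an empty replacement list then just counts those bytes without changing anything. The wrapper name xor_matches is hypothetical.

```perl
use strict;
use warnings;

# XOR two strings byte by byte; equal bytes become "\0".
# tr/\0// counts the NUL bytes without modifying the string.
sub xor_matches {
    my ($w1, $w2) = @_;
    return ($w1 ^ $w2) =~ tr/\0//;
}

print xor_matches('BALER', 'BAKER'), "\n";    # 4: B, A, E, R match
print xor_matches('BALER', 'RUFFR'), "\n";    # 1: only the final R matches
print xor_matches('baler', 'BALER'), "\n";    # 0: the comparison is case-sensitive
```

One caveat: under use feature 'bitwise' (which use v5.28 or later turns on), ^ always behaves numerically, and the string operator is spelled ^. instead.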

But that's only useful if you loop through all possible pairs of words, something that isn't necessary in this case:

use strict;
use warnings;
my @words = qw/
    BAKER
    SALER
    BALER
    CARER
    RUFFR
/;

# first, build a hash counting how many times each letter appears at each position:

my %count;
for my $word (@words) {
    my @letters = split //, $word;
    $count{$_}{ $letters[$_] }++ for 0..$#letters;
}

# then for any given word, you get the count for each of its letters minus one (because the word itself is included in the count), and see if it is a maximum (so far) for any position or for the total:

my %max_common_letters_count;
my %max_common_letters_words;
for my $word (@words) {
    my @letters = split //, $word;
    my $total;
    for my $position (0..$#letters, 'total') {
        my $count;
        if ( $position eq 'total' ) {
            $count = $total;
        }
        else {
            $count = $count{$position}{ $letters[$position] } - 1;
            $total += $count;
        }
        if ( !defined $max_common_letters_count{$position}
            || $count > $max_common_letters_count{$position} ) {
            # new maximum for this position: start a fresh list of words
            $max_common_letters_count{$position} = $count;
            $max_common_letters_words{$position} = [ $word ];
        }
        elsif ( $count == $max_common_letters_count{$position} ) {
            # tie with the current maximum (including a maximum of 0): keep this word too
            push @{ $max_common_letters_words{$position} }, $word;
        }
    }
}

# then show the maximum words for each position and in total: 

for my $position ( sort { $a <=> $b } grep $_ ne 'total', keys %max_common_letters_count ) {
    printf( "Position %s had a maximum of common letters of %s in words: %s\n",
        $position,
        $max_common_letters_count{$position},
        join(', ', @{ $max_common_letters_words{$position} })
    );
}
printf( "The maximum total common letters was %s in word(s): %s\n",
    $max_common_letters_count{'total'},
    join(', ', @{ $max_common_letters_words{'total'} })
);

姜生凉生 2024-11-26 01:55:59

Here's a complete script. It uses the same idea that ysth mentioned (although I came up with it independently): use bitwise xor to combine the strings, then count the number of NULs in the result. As long as your strings are ASCII, that tells you how many letters match. (The comparison is case-sensitive, and I'm not sure what would happen if the strings were UTF-8. Probably nothing good.)

use strict;
use warnings;
use 5.010;

use List::Util qw(max);

sub findMatches
{
  my ($words) = @_;

  # Compare each word to every other word:
  my @matches = (0) x @$words;

  for my $i (0 .. $#$words-1) {
    for my $j ($i+1 .. $#$words) {
      my $m = ($words->[$i] ^ $words->[$j]) =~ tr/\0//;

      $matches[$i] += $m;
      $matches[$j] += $m;
    }
  }

  # Find how many matches in the best word:
  my $max = max(@matches);

  # Find the words with that many matches:
  my @wanted = grep { $matches[$_] == $max } 0 .. $#matches;

  wantarray ? @$words[@wanted] : $words->[$wanted[0]];
} # end findMatches

my @words = qw(
    BAKER
    SALER
    BALER
    CARER
    RUFFR
);

say for findMatches(\@words);

债姬 2024-11-26 01:55:59

Haven't touched Perl in a while, so pseudocode it is. This isn't the fastest algorithm, but it will work fine for a small number of words.

totals = new map   # e.g. an object mapping :key => :value

for each word a
  totals[a] = 0    # reset once per word a, not once per comparison
  for each word b
    next if a equals b

    for i from 1 to a.length
      if a[i] == b[i]
        totals[a] += 1
      end
    end
  end
end

return totals.sort_by_value.last   # the word with the highest total

Sorry about the lack of Perl, but if you code this in Perl, it should work like a charm.

A quick note on run time: this runs in time proportional to number_of_words^2 * length_of_words. For a list of 100 words of 10 characters each, that's about 100,000 letter comparisons, which is adequate for most applications.
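Here is a direct Perl translation of the corrected pseudocode, as a sketch. The sub name best_word is hypothetical. The sketch sorts by total (the hash value), not by word. Duplicate words in the input would collapse into a single hash key.

```perl
use strict;
use warnings;

# For each word, total up same-position letter matches against every
# other word, then return the word with the highest total.
sub best_word {
    my @words = @_;
    my %totals;
    for my $a_word (@words) {
        $totals{$a_word} = 0;
        for my $b_word (@words) {
            next if $a_word eq $b_word;
            for my $i (0 .. length($a_word) - 1) {
                $totals{$a_word}++
                    if substr($a_word, $i, 1) eq substr($b_word, $i, 1);
            }
        }
    }
    # Sort ascending by total (value), so the last entry is the best.
    my @ranked = sort { $totals{$a} <=> $totals{$b} } keys %totals;
    return $ranked[-1];
}

print best_word(qw(BAKER SALER BALER CARER RUFFR)), "\n";
```

For the question's five words, it prints BALER. When several words tie, it returns only one of them, and which one is not defined because hash key order is not fixed.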

苏别ゝ 2024-11-26 01:55:59

Here's a version that relies on transposing the words in order to count the identical characters. I used the words from your original comparison, not the code.

This should work with words of any length and lists of any length. The output is:

Word    score
----    -----
BALER   12
SALER   11
BAKER   11
CARER   10
RUFFR   4

The code:

use warnings;
use strict;

my @w = qw(BAKER SALER BALER CARER RUFFR);
my @tword = t_word(@w);

my @score;
push @score, str_count($_) for @tword;
@score = t_score(@score);

my %total;

for (0 .. $#w) {
    $total{$w[$_]} = $score[$_];
}

print "Word\tscore\n";
print "----\t-----\n";
print "$_\t$total{$_}\n" for (sort { $total{$b} <=> $total{$a} } keys %total);

# transpose the words
sub t_word {
    my @w = @_;
    my @tword;
    for my $word (@w) {
        my $i = 0;
        while ($word =~ s/(.)//) {
            $tword[$i++] .= $1;
        }
    }
    return @tword;
}

# turn each character into a count
sub str_count {
    my $str = uc(shift);
    while ( $str =~ /([A-Z])/ ) {
        my $chr = $1;
        my $num = () = $str =~ /$chr/g;
        $num--;
        $str =~ s/$chr/$num /g;
    }
    return $str;
}

# sum up the character counts
# while reversing the transpose
sub t_score {
    my @count = @_;
    my @score;
    for my $num (@count) {
        my $i = 0;
        while( $num =~ s/(\d+) //) {
            $score[$i++] += $1;
        }
    }
    return @score;
}

人间☆小暴躁 2024-11-26 01:55:59

Here is my attempt at an answer. This also lets you see each individual match if you need it (i.e. BALER matches 4 characters in BAKER). EDIT: It now catches all matches if there is a tie between words (I added "CAKER" to the list to test).

#!/usr/bin/perl

use strict;
use warnings;

my @wordlist = qw( BAKER SALER BALER CARER RUFFR CAKER);

my %wordcomparison;

#foreach word, break it into letters, then compare it against all other words
#break all other words into letters and loop through the letters (both words have same amount), adding to the count of matched characters each time there's a match
foreach my $word (@wordlist) {
    my @letters = split(//, $word);
    foreach my $otherword (@wordlist) {
        my $count = 0;    # start at 0 so a pair with no matching letters stores 0, not undef
        next if $otherword eq $word;
        my @otherwordletters = split (//, $otherword);
        foreach my $i (0..$#letters) {
            $count++ if ( $letters[$i] eq $otherwordletters[$i] );
        }
        $wordcomparison{"$word"}{"$otherword"} = $count;
    }
}

# sort (unnecessary) and loop through the keys of the hash (words in your list)
# foreach key, loop through the other words it compares with
#Add a new key: total, and sum up all the matched characters.
foreach my $word (sort keys %wordcomparison) {
    foreach ( sort keys %{ $wordcomparison{$word} }) {
        $wordcomparison{$word}{total} += $wordcomparison{$word}{$_};
    }
}

#Want $word with highest total

my @max_match = (sort { $wordcomparison{$b}{total} <=> $wordcomparison{$a}{total} } keys %wordcomparison );

#This is to get all if there is a tie:
my $maximum = $max_match[0];
foreach (@max_match) {
    print "$_\n" if ($wordcomparison{$_}{total} >= $wordcomparison{$maximum}{total} );
}

The output is CAKER, BALER and BAKER, one per line. The order of the tied words may vary between runs, because it follows hash key order.

For the original five-word list (without CAKER), the entry for SALER in the hash %wordcomparison looks like:

'SALER'
        {
          'RUFFR' => 1,
          'BALER' => 4,
          'BAKER' => 3,
          'total' => 11,
          'CARER' => 3
        };

缪败 2024-11-26 01:55:59

You can do this with a dirty regex trick: execute code when a letter matches in its position, and not otherwise. Thankfully, it's quite easy to build the regexes as you go.

An example regular expression is:

(?:(C(?{ $c++ }))|.)(?:(A(?{ $c++ }))|.)(?:(R(?{ $c++ }))|.)(?:(E(?{ $c++ }))|.)(?:(R(?{ $c++ }))|.)

This may or may not be fast.

use 5.12.0;
use warnings;
use re 'eval';

my @words = qw(BAKER SALER BALER CARER RUFFR);

my ($best, $count) = ('', 0);
foreach my $word (@words) {
    our $c = 0;
    foreach my $candidate (@words) {
        next if $word eq $candidate;

        my $regex_str = join('', map {"(?:($_(?{ \$c++ }))|.)"} split '', $word);
        my $regex = qr/^$regex_str$/;

        $candidate =~ $regex or die "did not match!";
    }
    say "$word $c";
    if ($c > $count) {
        $best  = $word;
        $count = $c;
    }
}

say "Matching: first best: $best";

Using the xor trick will be fast, but it assumes a lot about the range of characters you might encounter. There are many ways in which UTF-8 input will break it.
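Here is a sketch of one such breakage. The words ÉCOLE and ÉCOLO are just made-up test data. If the words are handled as raw UTF-8 bytes, the xor trick counts matching bytes rather than letters. A shared multi-byte letter such as É is therefore counted twice. Comparing decoded strings one character at a time with substr avoids this. As far as I know, Perl 5.28 and later also refuse string bitwise operators outright on strings containing code points above 0xFF.

```perl
use strict;
use warnings;
use utf8;                        # the literals below are character strings
use Encode qw(encode);

# Character-by-character comparison: works on decoded strings,
# whatever code points they contain.
sub char_matches {
    my ($x, $y) = @_;
    my $n = 0;
    for my $i (0 .. length($x) - 1) {
        $n++ if substr($x, $i, 1) eq substr($y, $i, 1);
    }
    return $n;
}

my ($w1, $w2) = ('ÉCOLE', 'ÉCOLO');    # hypothetical non-ASCII words

# Encoded to UTF-8, É is two bytes, so the xor trick counts bytes, not letters.
my $byte_count = (encode('UTF-8', $w1) ^ encode('UTF-8', $w2)) =~ tr/\0//;

print "characters: ", char_matches($w1, $w2), "\n";    # 4 (É, C, O, L)
print "xor on bytes: $byte_count\n";                   # 5
```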

帅的被狗咬 2024-11-26 01:55:59

Many thanks to all the contributors! You've certainly shown me that I still have a lot to learn, but you've also helped me tremendously in working out my own answer. I'm putting it here for reference and possible feedback, since there are probably better ways of doing it. To me, this was the simplest and most straightforward approach I could find on my own. Enjoy! :)

#!/usr/bin/perl
use strict;
use warnings; 

# a list of words for testing
my @list = qw( 
BAKER
SALER
BALER
CARER
RUFFR
);

# populate a two-dimensional array with the list,
# so we can more easily compare each letter with the letters at the same position (column) in the other words
my $list_length = @list;
my @words;

for (my $i = 0; $i < $list_length; $i++) {
    my @letters = split(//, $list[$i]);
    my $letters_length = @letters;
    for (my $j = 0; $j < $letters_length; $j++) {
        $words[$i][$j] = $letters[$j];
    }
}
# this gives a two-dimensional array:
#
# @words = (    ["B", "A", "K", "E", "R"],
#               ["S", "A", "L", "E", "R"],
#               ["B", "A", "L", "E", "R"],
#               ["C", "A", "R", "E", "R"],
#               ["R", "U", "F", "F", "R"],
# );

# now, on to find the word with the most letters in common with the others at the same positions

# add up the score for each letter in each word
my $word_count = @words;    # number of words (rows), not letters per word
my @letter_score;
for my $i (0 .. $#words) {
    for my $j (0 .. $#{$words[$i]}) {
        for (my $k = 0; $k < $word_count; $k++) {
            if ($words[$i][$j] eq $words[$k][$j]) {
                $letter_score[$i][$j] += 1; 
            }
        }
        # we only want to add in matches outside the one we're testing, therefore
        $letter_score[$i][$j] -= 1;
    }
}

# sum each score up
my @scores;
for my $i (0 .. $#letter_score ) {
    for my $j (0 .. $#{$letter_score[$i]}) {
        $scores[$i] += $letter_score[$i][$j];
    }
}

# find the highest score
my $max = $scores[0];
foreach my $i (@scores[1 .. $#scores]) {
    if ($i > $max) {
        $max = $i;
    }
}

# and print it all out :D
for my $i (0 .. $#letter_score ) {
    print "$list[$i]: $scores[$i]";
    if ($scores[$i] == $max) {
        print " <- best";
    }   
    print "\n";
}

When run, the script yields the following:

BAKER: 11
SALER: 11
BALER: 12 <- best
CARER: 10
RUFFR: 4

About the author: 不知所踪
