查找字符串数组的公共前缀

发布于 2024-08-03 14:07:37 字数 1057 浏览 7 评论 0原文

我有一个这样的数组:

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

我想找到字符串的最长公共前缀。在这种情况下,它将是'Softball - '

我想我会遵循这个过程

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0]) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

对于我的5行数组来说可能没问题,但是如果我要做几千行数组,会有很多开销,所以我必须用 $i 的起始值进行移动计算,例如 $i = 字符串的一半,如果失败,则 $i/2 直到成功,然后将 $i 加 1 直到成功。这样我们就可以通过最少的比较次数来获得结果。

是否已经有解决此类问题的公式/算法?

I have an array like this:

$sports = array(
'Softball - Counties',
'Softball - Eastern',
'Softball - North Harbour',
'Softball - South',
'Softball - Western'
);

I would like to find the longest common prefix of the string. In this instance, it would be 'Softball - '

I am thinking that I would follow this process

$i = 1;

// loop to the length of the first string
while ($i < strlen($sports[0]) {

  // grab the left most part up to i in length
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {

     if ($match != substr($sport, 0, $i) {
         // didn't match, return the part that did match
         return substr($sport, 0, $i-1);
     }

  } // foreach

   // increase string length
   $i++;
} // while

// if you got to here, then all of them must be identical

For my 5 lines array that is probably fine, but if I were to do several thousand lines arrays, there would be a lot of overhead, so I would have to be move calculated with my starting values of $i, eg $i = halfway of string, if it fails, then $i/2 until it works, then increment $i by 1 until we succeed. So that we are doing the least number of comparisons to get a result.

Is there a formula/algorithm out already out there for this kind of problem?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(16

感情废物 2024-08-10 14:07:37

如果您可以对数组进行排序,那么就有一个简单且非常快速的解决方案。

只需将第一项与最后一项进行比较即可。

如果字符串已排序,则所有字符串共有的任何前缀对于已排序的第一个和最后一个字符串也将是通用的。

sort($sport);

$s1 = $sport[0];               // First string
$s2 = $sport[count($sport)-1]; // Last string
$len = min(strlen($s1), strlen($s2));

// While we still have string to compare,
// if the indexed character is the same in both strings,
// increment the index. 
for ($i=0; $i<$len && $s1[$i]==$s2[$i]; $i++); 

$prefix = substr($s1, 0, $i);

If you can sort your array, then there is a simple and very fast solution.

Simply compare the first item to the last one.

If the strings are sorted, any prefix common to all strings will be common to the sorted first and last strings.

sort($sport);

$s1 = $sport[0];               // First string
$s2 = $sport[count($sport)-1]; // Last string
$len = min(strlen($s1), strlen($s2));

// While we still have string to compare,
// if the indexed character is the same in both strings,
// increment the index. 
for ($i=0; $i<$len && $s1[$i]==$s2[$i]; $i++); 

$prefix = substr($s1, 0, $i);
晚风撩人 2024-08-10 14:07:37

我会使用这个:

$prefix = array_shift($array);  // take the first item as initial prefix
$length = strlen($prefix);
// compare the current prefix with the prefix of the same length of the other items
foreach ($array as $item) {
    // check if there is a match; if not, decrease the prefix by one character at a time
    while ($length && substr($item, 0, $length) !== $prefix) {
        $length--;
        $prefix = substr($prefix, 0, -1);
    }
    if (!$length) {
        break;
    }
}

更新  
这是另一个解决方案,迭代地比较字符串的每个第 n 个字符,直到发现不匹配:

$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
while ($pl < $l) {
    $c = $array[0][$pl];
    for ($i=1; $i<$n; $i++) {
        if ($array[$i][$pl] !== $c) break 2;
    }
    $pl++;
}
$prefix = substr($array[0], 0, $pl);

这甚至更有效,因为最多只有 numberOfStrings‍·‍commonPrefixLength原子比较。

I would use this:

$prefix = array_shift($array);  // take the first item as initial prefix
$length = strlen($prefix);
// compare the current prefix with the prefix of the same length of the other items
foreach ($array as $item) {
    // check if there is a match; if not, decrease the prefix by one character at a time
    while ($length && substr($item, 0, $length) !== $prefix) {
        $length--;
        $prefix = substr($prefix, 0, -1);
    }
    if (!$length) {
        break;
    }
}

Update
Here’s another solution, iteratively comparing each n-th character of the strings until a mismatch is found:

$pl = 0; // common prefix length
$n = count($array);
$l = strlen($array[0]);
while ($pl < $l) {
    $c = $array[0][$pl];
    for ($i=1; $i<$n; $i++) {
        if ($array[$i][$pl] !== $c) break 2;
    }
    $pl++;
}
$prefix = substr($array[0], 0, $pl);

This is even more efficient as there are only at most numberOfStrings‍·‍commonPrefixLength atomic comparisons.

风追烟花雨 2024-08-10 14:07:37

我在代码中实现了@diogoriba算法,结果如下:

  • 找到前两个字符串的公共前缀,然后将其与从第三个开始的所有后续字符串进行比较,如果没有找到公共字符串,则修剪公共字符串,在以下情况下获胜前缀的共同点多于不同点。
  • 但bumperbox 的原始算法(除了错误修复)获胜,因为字符串的前缀的共同点少于不同点。 详细信息在代码注释中!

我实现的另一个想法:

首先检查数组中最短的字符串,然后使用它进行比较,而不是简单地使用第一个字符串。 在代码中,这是通过自定义编写的函数 arrayStrLenMin() 实现的。

  • 可以显着减少迭代,但函数 arrayStrLenMin() 本身可能会导致(或多或少)迭代。
  • 简单地从数组中第一个字符串的长度开始似乎相当笨拙,但如果 arrayStrLenMin() 需要多次迭代,则可能会有效。

以尽可能少的迭代次数获取数组中字符串的最大公共前缀(PHP)

代码+广泛测试+备注:

function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
    $errArrZeroLength = -1; // Return value for error: Array is empty
    $errOtherType = -2;     // Return value for error: Found other type (than string in array)
    $errStrNone = -3;       // Return value for error: No strings found (in array)

    $arrLength = count($arr);
    if ($arrLength <= 0 ) { return $errArrZeroLength; }
    $cur = 0;

    foreach ($arr as $key => $val) {
        if (is_string($val)) {
            $min = strlen($val);
            $strFirstFound = $key;
            // echo("Key\tLength / Notification / Error\n");
            // echo("$key\tFound first string member at key with length: $min!\n");
            break;
        }
        else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
    }
    if (! isset($min)) { return $errStrNone; } // No string was found in array.

    // SpeedRatio of foreach/for is approximately 2/1 as dicussed at:
    // http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html

    // If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!

    if (! $forLoop) {
        foreach ($arr as $key => $val) {
            if (is_string($val)) {
                $cur = strlen($val);
                // echo("$key\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
        // else { echo("$key\tNo string!\n"); }
        }
    }

    // If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!

    else {
        for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
            if (is_string($arr[$i])) {
                $cur = strlen($arr[$i]);
                // echo("$i\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
            // else { echo("$i\tNo string!\n"); }
        }
    }

    return $min;
}

function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 1; $i <= $end + 1; $i++) {
        // Grab the part from 0 up to $i
        $commonStrMax = substr($arr[0], 0, $i);
        echo("Match: $i\t$commonStrMax\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($commonStrMax != substr($str, 0, $i)) {
                    return substr($commonStrMax, 0, $i-1);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return $commonStrMax; // Thus entire first common string is the common prefix!
}

function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 0 ; $i <= $end + 1; $i++) {
        // Grab char $i
        $char = substr($arr[0], $i, 1);
        echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($char != $str[$i]) { // Same functionality as ($char != substr($str, $i, 1)). Same efficiency?
                    return substr($arr[0], 0, $i);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}


function strCommonPrefixByNeighbour($arr) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    /// Get the common string prefix of the first 2 strings
    $strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
    if ($strCommonMax === false) { return false; }
    if ($strCommonMax == "") { return ""; }
    $strCommonMaxLength = strlen($strCommonMax);

    /// Now start looping from the 3rd string
    echo("-----\n");
    for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
        echo("  STR: $i\t{$arr[$i]}\n");

        /// Compare the maximum common string with the next neighbour

        /*
        //// Compare by char: Method unsuitable!

        // Iterate from string end to string beginning
        for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
            echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
            // If you find the first mismatch from the end, break.
            if ($arr[$i][$ii] != $strCommonMax[$ii]) {
                $strCommonMaxLength = $ii - 1; break;
                // BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
            }
        }
        */

        //// Compare by string

        for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
            echo("MATCH: $ii\t$strCommonMax\n");
            if (substr($arr[$i],0,$ii) == $strCommonMax) {
                break;
            }
            else {
                $strCommonMax = substr($strCommonMax,0,$ii - 1);
                $strCommonMaxLength--;
            }
        }
    }
    return substr($arr[0], 0, $strCommonMaxLength);
}





// Tests for finding the common prefix

/// Scenarios

$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);

$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);

$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);

$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);

$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);

$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);

$one = array ("/Volumes/1/a/a/aaaaa/1");

$empty = array ( );


// Test Results for finding  the common prefix

/*

I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!

Considering iteration efficiency, I also performed tests:

I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^  Str:" | wc -l   GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^  Str:" | wc -l   PLUS   | egrep "^MATCH:" | wc -l   GIVES TOTAL ITERATION SUM.

My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.

*/

// Test Results Table
// Used Functions | Iteration amount | Remarks

// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!

// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!

echo("Common prefix of all members:\n");
var_dump($result);





// Tests for finding the shortest string in array

/// Arrays

// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);

/// Scenarios

// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!

//// Fewest iterations (immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// More iterations (No immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/


//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    0   No string!
    1   No string!
    2   No string!
    3   No string!
    4   4
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/

I implemented @diogoriba algorithm into code, with this result:

  • Finding the common prefix of the first two strings, and then comparing this with all following strings starting from the 3rd, and trim the common string if nothing common is found, wins in situations where there is more in common in the prefixes than different.
  • But bumperbox's original algorithm (except the bugfixes) wins where the strings have less in common in their prefix than different. Details in the code comments!

Another idea I implemented:

First check for the shortest string in the array, and use this for comparison rather than simply the first string. In the code, this is implemented with the custom written function arrayStrLenMin().

  • Can bring down iterations dramatically, but the function arrayStrLenMin() may itself cause ( more or less) iterations.
  • Simply starting with the length of first string in array seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.

Get the maximum common prefix of strings in an array with as little iterations as possible (PHP)

Code + Extensive Testing + Remarks:

function arrayStrLenMin ($arr, $strictMode = false, $forLoop = false) {
    $errArrZeroLength = -1; // Return value for error: Array is empty
    $errOtherType = -2;     // Return value for error: Found other type (than string in array)
    $errStrNone = -3;       // Return value for error: No strings found (in array)

    $arrLength = count($arr);
    if ($arrLength <= 0 ) { return $errArrZeroLength; }
    $cur = 0;

    foreach ($arr as $key => $val) {
        if (is_string($val)) {
            $min = strlen($val);
            $strFirstFound = $key;
            // echo("Key\tLength / Notification / Error\n");
            // echo("$key\tFound first string member at key with length: $min!\n");
            break;
        }
        else if ($strictMode) { return $errOtherType; } // At least 1 type other than string was found.
    }
    if (! isset($min)) { return $errStrNone; } // No string was found in array.

    // SpeedRatio of foreach/for is approximately 2/1 as dicussed at:
    // http://juliusbeckmann.de/blog/php-foreach-vs-while-vs-for-the-loop-battle.html

    // If $strFirstFound is found within the first 1/SpeedRatio (=0.5) of the array, "foreach" is faster!

    if (! $forLoop) {
        foreach ($arr as $key => $val) {
            if (is_string($val)) {
                $cur = strlen($val);
                // echo("$key\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
        // else { echo("$key\tNo string!\n"); }
        }
    }

    // If $strFirstFound is found after the first 1/SpeedRatio (=0.5) of the array, "for" is faster!

    else {
        for ($i = $strFirstFound + 1; $i < $arrLength; $i++) {
            if (is_string($arr[$i])) {
                $cur = strlen($arr[$i]);
                // echo("$i\t$cur\n");
                if ($cur == 0) { return $cur; } // 0 is the shortest possible string, so we can abort here.
                if ($cur < $min) { $min = $cur; }
            }
            // else { echo("$i\tNo string!\n"); }
        }
    }

    return $min;
}

function strCommonPrefixByStr($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 1; $i <= $end + 1; $i++) {
        // Grab the part from 0 up to $i
        $commonStrMax = substr($arr[0], 0, $i);
        echo("Match: $i\t$commonStrMax\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($commonStrMax != substr($str, 0, $i)) {
                    return substr($commonStrMax, 0, $i-1);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return $commonStrMax; // Thus entire first common string is the common prefix!
}

function strCommonPrefixByChar($arr, $strFindShortestFirst = false) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    // Determine loop length
    /// Find shortest string in array: Can bring down iterations dramatically, but the function arrayStrLenMin() itself can cause ( more or less) iterations.
    if ($strFindShortestFirst) { $end = arrayStrLenMin($arr, true); }
    /// Simply start with length of first string in array: Seems quite clumsy, but may turn out effective, if arrayStrLenMin() needs many iterations.
    else { $end = strlen($arr[0]); }

    for ($i = 0 ; $i <= $end + 1; $i++) {
        // Grab char $i
        $char = substr($arr[0], $i, 1);
        echo("Match: $i\t"); echo(str_pad($char, $i+1, " ", STR_PAD_LEFT)); echo("\n");
        // Loop through all the values in array, and compare if they match
        foreach ($arr as $key => $str) {
            echo("  Str: $key\t$str\n");
            // Didn't match, return the part that did match
            if ($char != $str[$i]) { // Same functionality as ($char != substr($str, $i, 1)). Same efficiency?
                    return substr($arr[0], 0, $i);
            }
        }
    }
    // Special case: No mismatch (hence no return) happened until loop end!
    return substr($arr[0], 0, $end); // Thus entire first common string is the common prefix!
}


function strCommonPrefixByNeighbour($arr) {
    $arrLength = count($arr);
    if ($arrLength < 2) { return false; }

    /// Get the common string prefix of the first 2 strings
    $strCommonMax = strCommonPrefixByChar(array($arr[0], $arr[1]));
    if ($strCommonMax === false) { return false; }
    if ($strCommonMax == "") { return ""; }
    $strCommonMaxLength = strlen($strCommonMax);

    /// Now start looping from the 3rd string
    echo("-----\n");
    for ($i = 2; ($i < $arrLength) && ($strCommonMaxLength >= 1); $i++ ) {
        echo("  STR: $i\t{$arr[$i]}\n");

        /// Compare the maximum common string with the next neighbour

        /*
        //// Compare by char: Method unsuitable!

        // Iterate from string end to string beginning
        for ($ii = $strCommonMaxLength - 1; $ii >= 0; $ii--) {
            echo("Match: $ii\t"); echo(str_pad($arr[$i][$ii], $ii+1, " ", STR_PAD_LEFT)); echo("\n");
            // If you find the first mismatch from the end, break.
            if ($arr[$i][$ii] != $strCommonMax[$ii]) {
                $strCommonMaxLength = $ii - 1; break;
                // BUT!!! We may falsely assume that the string from the first mismatch until the begining match! This new string neighbour string is completely "unexplored land", there might be differing chars closer to the beginning. This method is not suitable. Better use string comparison than char comparison.
            }
        }
        */

        //// Compare by string

        for ($ii = $strCommonMaxLength; $ii > 0; $ii--) {
            echo("MATCH: $ii\t$strCommonMax\n");
            if (substr($arr[$i],0,$ii) == $strCommonMax) {
                break;
            }
            else {
                $strCommonMax = substr($strCommonMax,0,$ii - 1);
                $strCommonMaxLength--;
            }
        }
    }
    return substr($arr[0], 0, $strCommonMaxLength);
}





// Tests for finding the common prefix

/// Scenarios

$filesLeastInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesLessInCommon = array (
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/2",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/1",
"/Vol/1/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/2",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/b/c/1",
"/Vol/2/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/a/1",
);

$filesMoreInCommon = array (
"/Voluuuuuuuuuuuuuumes/1/a/a/1",
"/Voluuuuuuuuuuuuuumes/1/a/a/2",
"/Voluuuuuuuuuuuuuumes/1/a/b/1",
"/Voluuuuuuuuuuuuuumes/1/a/b/2",
"/Voluuuuuuuuuuuuuumes/2/a/b/c/1",
"/Voluuuuuuuuuuuuuumes/2/a/a/1",
);

$sameDir = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
);

$sameFile = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/1",
);

$noCommonPrefix = array (
"/Volumes/1/a/a/",
"/Volumes/1/a/a/aaaaa/2",
"Net/1/a/a/aaaaa/2",
);

$longestLast = array (
"/Volumes/1/a/a/1",
"/Volumes/1/a/a/aaaaa/2",
);

$longestFirst = array (
"/Volumes/1/a/a/aaaaa/1",
"/Volumes/1/a/a/2",
);

$one = array ("/Volumes/1/a/a/aaaaa/1");

$empty = array ( );


// Test Results for finding  the common prefix

/*

I tested my functions in many possible scenarios.
The results, the common prefixes, were always correct in all scenarios!
Just try a function call with your individual array!

Considering iteration efficiency, I also performed tests:

I put echo functions into the functions where iterations occur, and measured the number of CLI line output via:
php <script with strCommonPrefixByStr or strCommonPrefixByChar> | egrep "^  Str:" | wc -l   GIVES TOTAL ITERATION SUM.
php <Script with strCommonPrefixByNeighbour> | egrep "^  Str:" | wc -l   PLUS   | egrep "^MATCH:" | wc -l   GIVES TOTAL ITERATION SUM.

My hypothesis was proven:
strCommonPrefixByChar wins in situations where the strings have less in common in their beginning (=prefix).
strCommonPrefixByNeighbour wins where there is more in common in the prefixes.

*/

// Test Results Table
// Used Functions | Iteration amount | Remarks

// $result = (strCommonPrefixByStr($filesLessInCommon)); // 35
// $result = (strCommonPrefixByChar($filesLessInCommon)); // 35 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLessInCommon)); // 88 + 42 = 130 // Loses in this category!

// $result = (strCommonPrefixByStr($filesMoreInCommon)); // 137
// $result = (strCommonPrefixByChar($filesMoreInCommon)); // 137 // Same amount of iterations, but much fewer characters compared because ByChar instead of ByString!
// $result = (strCommonPrefixByNeighbour($filesLeastInCommon)); // 12 + 4 = 16 // Far the winner in this category!

echo("Common prefix of all members:\n");
var_dump($result);





// Tests for finding the shortest string in array

/// Arrays

// $empty = array ();
// $noStrings = array (0,1,2,3.0001,4,false,true,77);
// $stringsOnly = array ("one","two","three","four");
// $mixed = array (0,1,2,3.0001,"four",false,true,"seven", 8888);

/// Scenarios

// I list them from fewest to most iterations, which is not necessarily equivalent to slowest to fastest!
// For speed consider the remarks in the code considering the Speed ratio of foreach/for!

//// Fewest iterations (immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// Fewer iterations (immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, true, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: Found other type!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Result: Found other type!

*/

//// More iterations (No immediate abort on "Found other type", use "for" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, true) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/


//// Most iterations (No immediate abort on "Found other type", use "foreach" loop)

// foreach( array($empty, $noStrings, $stringsOnly, $mixed) as $arr) {
//  echo("NEW ANALYSIS:\n");
//  echo("Result: " . arrayStrLenMin($arr, false, false) . "\n\n");
// }

/* Results:

    NEW ANALYSIS:
    Result: Array is empty!

    NEW ANALYSIS:
    Result: No strings found!

    NEW ANALYSIS:
    Key Length / Notification / Error
    0   Found first string member at key with length: 3!
    0   3
    1   3
    2   5
    3   4
    Result: 3

    NEW ANALYSIS:
    Key Length / Notification / Error
    4   Found first string member at key with length: 4!
    0   No string!
    1   No string!
    2   No string!
    3   No string!
    4   4
    5   No string!
    6   No string!
    7   5
    8   No string!
    Result: 4

*/
淡水深流 2024-08-10 14:07:37

可能有一些非常受好评的算法,但就在我的脑海中,如果你知道你的共性将在左侧,就像你的例子一样,你可以做得比你发布的方法更好首先找到前两个字符串的共性,然后向下迭代列表的其余部分,根据需要修剪公共字符串以实现共性,或者如果一直修剪到无,则以失败终止。

Probably there is some terribly well-regarded algorithm for this, but just off the top of my head, if you know your commonality is going to be on the left-hand side like in your example, you could do way better than your posted methodology by first finding the commonality of the first two strings, and then iterating down the rest of the list, trimming the common string as necessary to achieve commonality or terminating with failure if you trim all the way to nothing.

时光无声 2024-08-10 14:07:37

我认为你走在正确的道路上。 而不是在所有字符串都通过时递增 i。

但是,您可以这样做: 1) 比较数组中的前 2 个字符串并找出它们有多少个公共字符, 例如,将常见字符保存在名为 maxCommon 的单独字符串中。

2) 将第三个字符串与 maxCommon 进行比较。如果公共字符的数量较少,则将 maxCommon 修剪为匹配的字符。

3) 重复并冲洗阵列的其余部分。在该过程结束时,maxCommon 将具有所有数组元素所共有的字符串。

这会增加一些开销,因为您需要将每个字符串与 maxCommon 进行比较,但会大大减少获得结果所需的迭代次数。

I think you're on the right way. But instead of incrementing i when all of the string passes, you could do this:

1) Compare the first 2 strings in the array and find out how many common characters they have. Save the common characters in a separate string called maxCommon, for example.

2) Compare the third string w/ maxCommon. If the number of common characters is smaller, trim maxCommon to the characters that match.

3) Repeat and rinse for the rest of the array. At the end of the process, maxCommon will have the string that is common to all of the array elements.

This will add some overhead because you'll need to compare each string w/ maxCommon, but will drastically reduce the number of iterations you'll need to get your results.

淡写薰衣草的香 2024-08-10 14:07:37

我认为“公共部分”的意思是“最长的公共前缀”。这比任何常见的子串都更容易计算。

在最坏的情况下,如果不读取 (n+1) * m 个字符,在最好的情况下读取 n * m + 1 字符,则无法完成此操作,其中 n code> 是最长公共前缀的长度,m 是字符串的数量。

一次比较一个字母即可实现这种效率(Big Theta (n * m))。

您提出的算法在 Big Theta(n^2 * m) 中运行,对于大输入来说,这要慢得多。

第三种提出的算法是找到前两个字符串的最长前缀,然后与第三个、第四个等进行比较,其运行时间也在 Big Theta(n * m) 中,但具有更高的常数因子。在实践中它可能只会稍微慢一些。

总的来说,我建议只滚动你自己的函数,因为第一个算法太慢,而另外两个算法编写起来也同样复杂。

查看 WikiPedia 了解 Big Theta 表示法的说明。

I assume that by "common part" you mean "longest common prefix". That is a much simpler to compute than any common substring.

This cannot be done without reading (n+1) * m characters in the worst case and n * m + 1 in the best case, where n is the length of the longest common prefix and m is the number of strings.

Comparing one letter at a time achieves that efficiency (Big Theta (n * m)).

Your proposed algorithm runs in Big Theta(n^2 * m), which is much, much slower for large inputs.

The third proposed algorithm of finding the longest prefix of the first two strings, then comparing that with the third, fourth, etc. also has a running time in Big Theta(n * m), but with a higher constant factor. It will probably only be slightly slower in practice.

Overall, I would recommend just rolling your own function, since the first algorithm is too slow and the two others will be about equally complicated to write anyway.

Check out WikiPedia for a description of Big Theta notation.

早乙女 2024-08-10 14:07:37
  1. 不知道

  2. 是的:您可以简单地检查第 i 个字符(您已经知道字符 0 到 i-),而不是比较从 0 到长度 i 的子字符串1 匹配)。

  1. not that I know of

  2. yes: instead of comparing the substring from 0 to length i, you can simply check the ith character (you already know that characters 0 to i-1 match).

七度光 2024-08-10 14:07:37

这是 JavaScript 中的一个优雅的递归实现:

function prefix(strings) {
    switch (strings.length) {

      case 0:
        return "";

      case 1:
        return strings[0];

      case 2:
        // compute the prefix between the two strings
        var a = strings[0],
            b = strings[1],
            n = Math.min(a.length, b.length),
            i = 0;
        while (i < n && a.charAt(i) === b.charAt(i))
            ++i;
        return a.substring(0, i);

      default:
        // return the common prefix of the first string,
        // and the common prefix of the rest of the strings
        return prefix([ strings[0], prefix(strings.slice(1)) ]);
    }
}

Here's an elegant, recursive implementation in JavaScript:

function prefix(strings) {
    switch (strings.length) {

      case 0:
        return "";

      case 1:
        return strings[0];

      case 2:
        // compute the prefix between the two strings
        var a = strings[0],
            b = strings[1],
            n = Math.min(a.length, b.length),
            i = 0;
        while (i < n && a.charAt(i) === b.charAt(i))
            ++i;
        return a.substring(0, i);

      default:
        // return the common prefix of the first string,
        // and the common prefix of the rest of the strings
        return prefix([ strings[0], prefix(strings.slice(1)) ]);
    }
}
英雄似剑 2024-08-10 14:07:37

简短而甜蜜的版本,也许不是最有效的:

/// Return length of longest common prefix in an array of strings.
function _commonPrefix($array) {
    if(count($array) < 2) {
        if(count($array) == 0)
            return false; // empty array: undefined prefix
        else
            return strlen($array[0]); // 1 element: trivial case
    }
    $len = max(array_map('strlen',$array)); // initial upper limit: max length of all strings.
    $prevval = reset($array);
    while(($newval = next($array)) !== FALSE) {
        for($j = 0 ; $j < $len ; $j += 1)
            if($newval[$j] != $prevval[$j])
                $len = $j;
        $prevval = $newval;
    }
    return $len;
}

// TEST CASE:
$arr = array('/var/yam/yamyam/','/var/yam/bloorg','/var/yar/sdoo');
print_r($arr);
$plen = _commonprefix($arr);
$pstr = substr($arr[0],0,$plen);
echo "Res: $plen\n";
echo "==> ".$pstr."\n";
echo "dir: ".dirname($pstr.'aaaa')."\n";

测试用例的输出:

Array
(
    [0] => /var/yam/yamyam/
    [1] => /var/yam/bloorg
    [2] => /var/yar/sdoo
)
Res: 7
==> /var/ya
dir: /var

Short and sweet version, perhaps not the most efficient:

/// Return length of longest common prefix in an array of strings.
function _commonPrefix($array) {
    if(count($array) < 2) {
        if(count($array) == 0)
            return false; // empty array: undefined prefix
        else
            return strlen($array[0]); // 1 element: trivial case
    }
    $len = max(array_map('strlen',$array)); // initial upper limit: max length of all strings.
    $prevval = reset($array);
    while(($newval = next($array)) !== FALSE) {
        for($j = 0 ; $j < $len ; $j += 1)
            if($newval[$j] != $prevval[$j])
                $len = $j;
        $prevval = $newval;
    }
    return $len;
}

// TEST CASE:
$arr = array('/var/yam/yamyam/','/var/yam/bloorg','/var/yar/sdoo');
print_r($arr);
$plen = _commonprefix($arr);
$pstr = substr($arr[0],0,$plen);
echo "Res: $plen\n";
echo "==> ".$pstr."\n";
echo "dir: ".dirname($pstr.'aaaa')."\n";

Output of the test case:

Array
(
    [0] => /var/yam/yamyam/
    [1] => /var/yam/bloorg
    [2] => /var/yar/sdoo
)
Res: 7
==> /var/ya
dir: /var
痕至 2024-08-10 14:07:37

@bumperbox

  1. 您的基本代码需要进行一些修正才能在所有场景中工作!

    • 您的循环仅比较最后一个字符之前的一个字符!
    • 不匹配可能发生在最新公共字符之后的 1 个循环周期。
    • 因此,您必须至少检查第一个字符串的最后一个字符之后的 1 个字符。
    • 因此,您的比较运算符必须为“<= 1”或“< 2”。
  2. 目前您的算法失败

    • 如果第一个字符串完全包含在所有其他字符串中,
    • 或完全包含在除最后一个字符之外的所有其他字符串中。

在我的下一个答案/帖子中,我将附上迭代优化的代码!

原始 Bumperbox 代码加上更正 (PHP):

function shortest($sports) {
 $i = 1;

 // loop to the length of the first string
 while ($i < strlen($sports[0])) {

  // grab the left most part up to i in length
  // REMARK: Culturally biased towards LTR writing systems. Better say: Grab frombeginning...
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {
   if ($match != substr($sport, 0, $i)) {
    // didn't match, return the part that did match
    return substr($sport, 0, $i-1);
   }
  }
 $i++; // increase string length
 }
}

function shortestCorrect($sports) {
 $i = 1;
 while ($i <= strlen($sports[0]) + 1) {
  // Grab the string from its beginning with length $i
  $match = substr($sports[0], 0, $i);
  foreach ($sports as $sport) {
   if ($match != substr($sport, 0, $i)) {
    return substr($sport, 0, $i-1);
   }
  }
  $i++;
 }
 // Special case: No mismatch happened until loop end! Thus entire str1 is common prefix!
 return $sports[0];
}

$sports1 = array(
'Softball',
'Softball - Eastern',
'Softball - North Harbour');

$sports2 = array(
'Softball - Wester',
'Softball - Western',
);

$sports3 = array(
'Softball - Western',
'Softball - Western',
);

$sports4 = array(
'Softball - Westerner',
'Softball - Western',
);

echo("Output of the original function:\n"); // Failure scenarios

var_dump(shortest($sports1)); // NULL rather than the correct 'Softball'
var_dump(shortest($sports2)); // NULL rather than the correct 'Softball - Wester'
var_dump(shortest($sports3)); // NULL rather than the correct 'Softball - Western'
var_dump(shortest($sports4)); // Only works if the second string is at least one character longer!

echo("\nOutput of the corrected function:\n"); // All scenarios work
var_dump(shortestCorrect($sports1));
var_dump(shortestCorrect($sports2));
var_dump(shortestCorrect($sports3));
var_dump(shortestCorrect($sports4));

@bumperbox

  1. Your basic code needed some correction to work in ALL scenarios!

    • Your loop only compares until one character before the last character!
    • The mismatch can possibly occur 1 loop cycle after the latest common character.
    • Hence you have to at least check until 1 character after your first string's last character.
    • Hence your comparison operator must be "<= 1" or "< 2".
  2. Currently your algorithm fails

    • if the first string is completely included in all other strings,
    • or completely included in all other strings except the last character.

In my next answer/post, I will attach iteration optimized code!

Original Bumperbox code PLUS correction (PHP):

function shortest($sports) {
 $i = 1;

 // loop to the length of the first string
 while ($i < strlen($sports[0])) {

  // grab the left most part up to i in length
  // REMARK: Culturally biased towards LTR writing systems. Better say: Grab frombeginning...
  $match = substr($sports[0], 0, $i);

  // loop through all the values in array, and compare if they match
  foreach ($sports as $sport) {
   if ($match != substr($sport, 0, $i)) {
    // didn't match, return the part that did match
    return substr($sport, 0, $i-1);
   }
  }
 $i++; // increase string length
 }
}

function shortestCorrect($sports) {
 $i = 1;
 while ($i <= strlen($sports[0]) + 1) {
  // Grab the string from its beginning with length $i
  $match = substr($sports[0], 0, $i);
  foreach ($sports as $sport) {
   if ($match != substr($sport, 0, $i)) {
    return substr($sport, 0, $i-1);
   }
  }
  $i++;
 }
 // Special case: No mismatch happened until loop end! Thus entire str1 is common prefix!
 return $sports[0];
}

$sports1 = array(
'Softball',
'Softball - Eastern',
'Softball - North Harbour');

$sports2 = array(
'Softball - Wester',
'Softball - Western',
);

$sports3 = array(
'Softball - Western',
'Softball - Western',
);

$sports4 = array(
'Softball - Westerner',
'Softball - Western',
);

echo("Output of the original function:\n"); // Failure scenarios

var_dump(shortest($sports1)); // NULL rather than the correct 'Softball'
var_dump(shortest($sports2)); // NULL rather than the correct 'Softball - Wester'
var_dump(shortest($sports3)); // NULL rather than the correct 'Softball - Western'
var_dump(shortest($sports4)); // Only works if the second string is at least one character longer!

echo("\nOutput of the corrected function:\n"); // All scenarios work
var_dump(shortestCorrect($sports1));
var_dump(shortestCorrect($sports2));
var_dump(shortestCorrect($sports3));
var_dump(shortestCorrect($sports4));
甜`诱少女 2024-08-10 14:07:37

我会使用这样的递归算法:

1 - 获取数组中的第一个字符串
2 - 使用第一个字符串作为参数调用递归前缀方法
3 - 如果前缀为空则返回无前缀
4 - 循环遍历数组中的所有字符串
4.1 - 如果任何字符串不以前缀开头
4.1.1 - 使用前缀 - 1 作为参数调用递归前缀方法
4.2 返回前缀

I would use a recursive algorithm like this:

1 - get the first string in the array
2 - call the recursive prefix method with the first string as a param
3 - if prefix is empty return no prefix
4 - loop through all the strings in the array
4.1 - if any of the strings does not start with the prefix
4.1.1 - call recursive prefix method with prefix - 1 as a param
4.2 return prefix

盛夏已如深秋| 2024-08-10 14:07:37

    // Common prefix
    $common = '';

    $sports = array(
    'Softball T - Counties',
    'Softball T - Eastern',
    'Softball T - North Harbour',
    'Softball T - South',
    'Softball T - Western'
    );

    // find mini string
    $minLen = strlen($sports[0]);
    foreach ($sports as $s){
        if($minLen > strlen($s))
            $minLen = strlen($s);
    }


    // flag to break out of inner loop
    $flag = false;

    // The possible common string length does not exceed the minimum string length.
    // The following solution is O(n^2), this can be improve.
    for ($i = 0 ; $i < $minLen; $i++){
        $tmp = $sports[0][$i];

        foreach ($sports as $s){
            if($s[$i] != $tmp)
                $flag = true;
        }
        if($flag)
            break;
        else
            $common .= $sports[0][$i];
    }

    print $common;

    // Common prefix
    $common = '';

    $sports = array(
    'Softball T - Counties',
    'Softball T - Eastern',
    'Softball T - North Harbour',
    'Softball T - South',
    'Softball T - Western'
    );

    // find mini string
    $minLen = strlen($sports[0]);
    foreach ($sports as $s){
        if($minLen > strlen($s))
            $minLen = strlen($s);
    }


    // flag to break out of inner loop
    $flag = false;

    // The possible common string length does not exceed the minimum string length.
    // The following solution is O(n^2), this can be improve.
    for ($i = 0 ; $i < $minLen; $i++){
        $tmp = $sports[0][$i];

        foreach ($sports as $s){
            if($s[$i] != $tmp)
                $flag = true;
        }
        if($flag)
            break;
        else
            $common .= $sports[0][$i];
    }

    print $common;
没有伤那来痛 2024-08-10 14:07:37

最上面的答案似乎有点长,所以这里有一个运行时间为 O(n2) 的简洁解决方案。

function findLongestPrefix($arr) {
  return array_reduce($arr, function($prefix, $item) {
    $length = min(strlen($prefix), strlen($item));
    while (substr($prefix, 0, $length) !== substr($item, 0, $length)) {
      $length--;
    }
    return substr($prefix, 0, $length);
  }, $arr[0]);
}

print findLongestPrefix($sports); // Softball -

The top answer seemed a bit long, so here's a concise solution with a runtime of O(n2).

function findLongestPrefix($arr) {
  return array_reduce($arr, function($prefix, $item) {
    $length = min(strlen($prefix), strlen($item));
    while (substr($prefix, 0, $length) !== substr($item, 0, $length)) {
      $length--;
    }
    return substr($prefix, 0, $length);
  }, $arr[0]);
}

print findLongestPrefix($sports); // Softball -
宫墨修音 2024-08-10 14:07:37

无论如何,这是我想出的另一种选择。

我用它来查找产品代码列表的公共前缀(即,有多个产品 SKU 开头有一系列公共字符):

/**
 * Try to find a common prefix for a list of strings
 * 
 * @param array $strings
 * @return string
 */
function findCommonPrefix(array $strings)
{
    $prefix = '';
    $chars = array_map("str_split", $strings);
    $matches = call_user_func_array("array_intersect_assoc", $chars);
    if ($matches) {
        $i = 0;
        foreach ($matches as $key => $value) {
            if ($key != $i) {
                unset($matches[$key]);
            }
            $i++;
        }
        $prefix = join('', $matches);
    }

    return $prefix;
}

For what it's worth, here's another alternative I came up with.

I used this for finding the common prefix for a list of products codes (ie. where there are multiple product SKUs that have a common series of characters at the start):

/**
 * Try to find a common prefix for a list of strings
 * 
 * @param array $strings
 * @return string
 */
function findCommonPrefix(array $strings)
{
    $prefix = '';
    $chars = array_map("str_split", $strings);
    $matches = call_user_func_array("array_intersect_assoc", $chars);
    if ($matches) {
        $i = 0;
        foreach ($matches as $key => $value) {
            if ($key != $i) {
                unset($matches[$key]);
            }
            $i++;
        }
        $prefix = join('', $matches);
    }

    return $prefix;
}
一紙繁鸢 2024-08-10 14:07:37

这是对@Gumbo 答案的补充。如果您想确保所选的公共前缀不会断词,请使用此前缀。我只是让它在所选字符串末尾查找空格。如果存在,我们知道所有短语都有更多内容,因此我们将其截断。

function product_name_intersection($array){

    $pl = 0; // common prefix length
    $n = count($array);
    $l = strlen($array[0]);
    $first = current($array);

    while ($pl < $l) {
        $c = $array[0][$pl];
        for ($i=1; $i<$n; $i++) {
            if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2;
        }
        $pl++;
    }
    $prefix = substr($array[0], 0, $pl);

    if ($pl < strlen($first) && substr($prefix, -1, 1) != ' ') {

        $prefix = preg_replace('/\W\w+\s*(\W*)$/', '$1', $prefix);
    }

    $prefix =  preg_replace('/^\W*(.+?)\W*$/', '$1', $prefix);

    return $prefix;
}

This is an addition to the @Gumbo answer. If you want to ensure that the chosen, common prefix does not break words, use this. I am just having it look for a blank space at the end of the chosen string. If that exists we know that there was more to all of the phrases, so we truncate it.

function product_name_intersection($array){

    $pl = 0; // common prefix length
    $n = count($array);
    $l = strlen($array[0]);
    $first = current($array);

    while ($pl < $l) {
        $c = $array[0][$pl];
        for ($i=1; $i<$n; $i++) {
            if (!isset($array[$i][$pl]) || $array[$i][$pl] !== $c) break 2;
        }
        $pl++;
    }
    $prefix = substr($array[0], 0, $pl);

    if ($pl < strlen($first) && substr($prefix, -1, 1) != ' ') {

        $prefix = preg_replace('/\W\w+\s*(\W*)$/', '$1', $prefix);
    }

    $prefix =  preg_replace('/^\W*(.+?)\W*$/', '$1', $prefix);

    return $prefix;
}
寂寞花火° 2024-08-10 14:07:37

(“如果我要做几千行数组”)

这个数据结构是 Trie(前缀树):一种树状数据结构,用于存储动态字符串集,其中每个节点代表单个字符。

  • 时间效率:搜索、插入和插入删除:时间复杂度为O(m),其中m是字符串的长度。

  • 空间效率:存储字符串中存在的唯一字符。与存储整个字符串相比,这可能具有空间优势。

--

输入的 Trie 可视化(绿色节点标记为:endOfString):

在此处输入图像描述

--

PHP 实现:

<?php

class TrieNode 
{
    public $childNode = []; // Associative array to store child nodes
    public $endOfString = false; // Flag to indicate end of a string
}

class Trie 
{
    private $root;

    public function __construct() 
    {
        $this->root = new TrieNode();
    }

    public function insert($string) 
    {
        if (!empty($string)) {
            $this->insertRecursive($this->root, $string);
        }
    }

    private function insertRecursive(&$node, $string) 
    {
        if (empty($string)) {
            $node->endOfString = true;
            return;
        }

        $firstChar = $string[0];
        $remainingString = substr($string, 1);

        if (!isset($node->childNode[$firstChar])) {
            $node->childNode[$firstChar] = new TrieNode();
        }

        $this->insertRecursive($node->childNode[$firstChar], $remainingString);
    }

    public function commonPrefix() 
    {
        $commonPrefix = '';
        $this->commonPrefixRecursive($this->root, $commonPrefix);
        return $commonPrefix;
    }

    private function commonPrefixRecursive($node, &$commonPrefix) 
    {
        if (count($node->childNode) !== 1 || $node->endOfString) {
            return;
        }

        $firstChar = array_key_first($node->childNode);
        $commonPrefix .= $firstChar;
        $this->commonPrefixRecursive($node->childNode[$firstChar], $commonPrefix);
    }
}

// Example usage
$trie = new Trie();
$trie->insert("Softball - Counties");
$trie->insert("Softball - Eastern");
$trie->insert("Softball - North Harbour");
$trie->insert("Softball - South");
$trie->insert("Softball - Western");

echo "Common prefix: " . $trie->commonPrefix() . PHP_EOL;

?>

输出:

Common prefix: Softball - 

演示

("if I were to do several thousand lines arrays")

The data structure for this is a Trie (Prefix Tree): A tree-like data structure used to store a dynamic set of strings where each node represents a single character.

  • Time Efficiency: Search, Insertion & Deletion: With a time-complexity of O(m) where m is the length of the string.

  • Space Efficiency: Store the unique characters present in the strings. This can be a space advantage compared to storing the entire strings.

--

Trie Visualization for your input (Green nodes are marked: endOfString):

enter image description here

--

PHP implementation:

<?php

class TrieNode 
{
    public $childNode = []; // Associative array to store child nodes
    public $endOfString = false; // Flag to indicate end of a string
}

class Trie 
{
    private $root;

    public function __construct() 
    {
        $this->root = new TrieNode();
    }

    public function insert($string) 
    {
        if (!empty($string)) {
            $this->insertRecursive($this->root, $string);
        }
    }

    private function insertRecursive(&$node, $string) 
    {
        if (empty($string)) {
            $node->endOfString = true;
            return;
        }

        $firstChar = $string[0];
        $remainingString = substr($string, 1);

        if (!isset($node->childNode[$firstChar])) {
            $node->childNode[$firstChar] = new TrieNode();
        }

        $this->insertRecursive($node->childNode[$firstChar], $remainingString);
    }

    public function commonPrefix() 
    {
        $commonPrefix = '';
        $this->commonPrefixRecursive($this->root, $commonPrefix);
        return $commonPrefix;
    }

    private function commonPrefixRecursive($node, &$commonPrefix) 
    {
        if (count($node->childNode) !== 1 || $node->endOfString) {
            return;
        }

        $firstChar = array_key_first($node->childNode);
        $commonPrefix .= $firstChar;
        $this->commonPrefixRecursive($node->childNode[$firstChar], $commonPrefix);
    }
}

// Example usage
$trie = new Trie();
$trie->insert("Softball - Counties");
$trie->insert("Softball - Eastern");
$trie->insert("Softball - North Harbour");
$trie->insert("Softball - South");
$trie->insert("Softball - Western");

echo "Common prefix: " . $trie->commonPrefix() . PHP_EOL;

?>

Output:

Common prefix: Softball - 

Demo

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文