Finding the matching portion of two strings in PHP

Posted 2024-10-04 15:39:40 · 470 characters · 0 views · 0 comments

I'm looking for a simple way to find the matching portions of two strings in PHP (specifically in the context of a URI).

For example, consider the two strings:

http://2.2.2.2/~machinehost/deployment_folder/

and

/~machinehost/deployment_folder/users/bob/settings

What I need is to chop off the matching portion of these two strings from the second string, resulting in:

users/bob/settings

before appending the first string as a prefix, forming an absolute URI.

Is there some simple way (in PHP) to compare two arbitrary strings for matching substrings within them?

EDIT: as pointed out, I meant the longest matching substring common to both strings

Comments (5)

沫尐诺 2024-10-11 15:39:40

Assuming your strings are $a and $b, respectively, you can use this:

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';

$len_a = strlen($a);
$len_b = strlen($b);

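// Start with the largest overlap that fits in both strings, then shrink it.
for ($p = max(0, $len_b - $len_a); $p < $len_b; $p++) {
    // Compare the last ($len_b - $p) bytes of $a with the first ($len_b - $p) bytes of $b.
    if (substr($a, $len_a - ($len_b - $p)) === substr($b, 0, $len_b - $p)) {
        break;
    }
}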

$result = $a.substr($b, $len_b - $p);

echo $result;

This prints http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings.

青瓷清茶倾城歌 2024-10-11 15:39:40

Finding the longest common match can also be done using regex.

The function below takes two strings, builds a regex from one of them, and runs it against the other.

/**
 * Determine the longest common match within two strings
 *
 * @param string $str1
 * @param string $str2 Two strings in any order.
 * @param boolean $case_sensitive Set to true to force
 * case sensitivity. Default: false (case insensitive).
 * @return string The longest string - first match.
 */
function get_longest_common_subsequence( $str1, $str2, $case_sensitive = false ) {
    // First check to see if one string is the same as the other.
    if ( $str1 === $str2 ) return $str1;
    if ( ! $case_sensitive && strtolower( $str1 ) === strtolower( $str2 ) ) return $str1;

    // We'll use '#' as our regex delimiter. Any character can be used, as we'll quote the string anyway.
    $delimiter = '#';

    // We'll find the shortest string and use that to check substrings and create our regex.
    $l1 = strlen( $str1 );
    $l2 = strlen( $str2 );
    $str = $l1 <= $l2 ? $str1 : $str2;
    $str2 = $l1 <= $l2 ? $str2 : $str1;
    $l = min( $l1, $l2 );

    // Next check to see if one string is a substring of the other.
    if ( $case_sensitive ) {
        if ( strpos( $str2, $str ) !== false ) {
            return $str;
        }
    }
    else {
        if ( stripos( $str2, $str ) !== false ) {
            return $str;
        }
    }

    // Regex for each character will be of the format (?:a(?=b))?
    // We also need to capture the last character, but this prevents us from matching strings with a single character. (?:.|c)?
    $reg = $delimiter;
    for ( $i = 0; $i < $l; $i++ ) {
        $a = preg_quote( $str[ $i ], $delimiter );
        $b = $i + 1 < $l ? preg_quote( $str[ $i + 1 ], $delimiter ) : false;
        $reg .= sprintf( $b !== false ? '(?:%s(?=%s))?' : '(?:.|%s)?', $a, $b );
    }
    $reg .= $delimiter;
    if ( ! $case_sensitive ) {
        $reg .= 'i';
    }
    // Resulting example regex from a string 'abbc':
    // '#(?:a(?=b))?(?:b(?=b))?(?:b(?=c))?(?:.|c)?#i';

    // Perform our regex on the remaining string
    $str = $l1 <= $l2 ? $str2 : $str1;
    if ( preg_match_all( $reg, $str, $matches ) ) {
        // $matches is an array with a single array with all the matches.
        return array_reduce( $matches[0], function( $a, $b ) {
            $al = strlen( $a );
            $bl = strlen( $b );
            // Return the longest string, as long as it's not a single character.
            return $al >= $bl || $bl <= 1 ? $a : $b;
        }, '' );
    }

    // No match - Return an empty string.
    return '';
}

It generates the regex from the shorter of the two strings, although performance will most likely be the same either way. It may incorrectly match strings with recurring substrings, and it is limited to matches of two or more characters, unless the strings are equal or one is a substring of the other. For instance:

// Works as intended.
get_longest_common_subsequence( 'abbc', 'abc' ) === 'ab';

// Returns incorrect substring based on string length and recurring substrings.
get_longest_common_subsequence( 'abbc', 'abcdef' ) === 'abc';

// Does not return any matches, as all recurring strings are only a single character long.
get_longest_common_subsequence( 'abc', 'ace' ) === '';

// One of the strings is a substring of the other.
get_longest_common_subsequence( 'abc', 'a' ) === 'a';

Regardless, it works by an alternative method, and the regex can be refined to handle additional cases.

若水般的淡然安静女子 2024-10-11 15:39:40

I'm not sure I fully understand your request, but the idea is:

Let A be your URL and B be your "/~machinehost/deployment_folder/users/bob/settings".

  • Search A for the start of B; this gives you an index i (the position in A of B's first /).
  • Let l = length(A).
  • Cut B from offset (l - i) to length(B) to get the last part of B (users/bob/settings).
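The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the function name strip_overlap is made up for this example, and it treats the "search" as looking for a position in A from which the rest of A is a prefix of B.

```php
<?php
// Hypothetical helper (not from the original answer): returns the part of $b
// that follows the overlap between the end of $a and the start of $b.
function strip_overlap(string $a, string $b): string
{
    if ($b === '') {
        return '';
    }
    $l = strlen($a);
    $i = strpos($a, $b[0]);              // first candidate start of the overlap in A
    while ($i !== false) {
        $overlap = $l - $i;              // length of A's tail starting at $i
        // The tail of A must equal the first $overlap bytes of B.
        if ($overlap <= strlen($b) && strncmp(substr($a, $i), $b, $overlap) === 0) {
            return substr($b, $overlap);
        }
        $i = strpos($a, $b[0], $i + 1);  // otherwise try the next occurrence
    }
    return $b;                           // no overlap found
}

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';
echo $a . strip_overlap($a, $b), "\n";
// prints http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings
```

Because candidates are tried from left to right, the first hit is also the longest overlap. The comparison is byte-wise, so it does not know about path segments.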

I haven't tested these steps myself, but if you really need it, I can help you get this brilliant (said ironically) solution working.

Note that it may also be possible with a regular expression, such as:

$pattern = '#' . preg_quote($B, '#') . '(.*?)#'; // quote $B and add the delimiters PCRE requires
$res = array();
preg_match_all($pattern, $A, $res);

Edit: I think your last comment invalidates my response. What you want is to find common substrings. So you could start with a brute-force algorithm that tries to find B[1:i] in A for i in {2, ..., length(B)}, and then move to a dynamic programming approach.
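For the dynamic programming idea, here is a minimal sketch of the usual table-based longest common substring approach. The function name longest_common_substring is an assumption made for this example, and the comparison is byte-wise (no multibyte handling).

```php
<?php
// Hypothetical helper (not from the original answer): longest substring
// common to $a and $b, found with a rolling two-row table.
function longest_common_substring(string $a, string $b): string
{
    $la = strlen($a);
    $lb = strlen($b);
    $prev = array_fill(0, $lb + 1, 0);  // run lengths for the previous row
    $best = 0;                          // length of the best match so far
    $end = 0;                           // end offset (exclusive) of that match in $a

    for ($i = 1; $i <= $la; $i++) {
        $curr = array_fill(0, $lb + 1, 0);
        for ($j = 1; $j <= $lb; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the common run that ended at the previous characters.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > $best) {
                    $best = $curr[$j];
                    $end = $i;
                }
            }
        }
        $prev = $curr;
    }
    return substr($a, $end - $best, $best);
}

$a = 'http://2.2.2.2/~machinehost/deployment_folder/';
$b = '/~machinehost/deployment_folder/users/bob/settings';
echo longest_common_substring($a, $b), "\n";
// prints /~machinehost/deployment_folder/
```

This takes time proportional to length(A) × length(B), which should be fine for URI-sized strings. When several matches share the maximum length, the one that ends first in $a is returned.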

将军与妓 2024-10-11 15:39:40

There doesn't seem to be any out-of-the-box code for your requirement, so let's look for a simple way.

For this exercise I used two functions: one to find the longest match, and another to chop off the matching portion.

FindLongestMatch() takes one path apart and looks for a match in the other path piece by piece, keeping just one match, the longest one (no arrays, no sorting).
RemoveLongestMatch() returns the suffix, or 'remainder', that follows the position of the longest match.

Here is the full source code:

<?php

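function FindLongestMatch($relativePath, $absolutePath)
{
    static $_separator = '/';
    // Initialise both accumulators so the first iteration does not read undefined variables.
    $match = '';
    $longestMatch = '';
    $splitted = array_reverse(explode($_separator, $absolutePath));

    // Walk the absolute path from its last segment backwards, growing the candidate match.
    foreach ($splitted as $value)
    {
        $matchTest = $value.$_separator.$match;
        if(IsSubstring($relativePath, $matchTest))
            $match = $matchTest;

        if (!empty($value) && IsNewMatchLonger($match, $longestMatch))
            $longestMatch = $match;
    }

    return $longestMatch;
}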

// Removes the longest match from the first string and returns what follows it.
function RemoveLongestMatch($relativePath, $absolutePath)
{
    $match = findLongestMatch($relativePath, $absolutePath);
    $positionFound = strpos($relativePath, $match);     
    $suffix = substr($relativePath, $positionFound + strlen($match));

    return $suffix;
}

function IsNewMatchLonger($match, $longestMatch)
{
    return strlen($match) > strlen($longestMatch);
}

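// Note: strpos() returns 0 for a match at the very start of $string,
// so this check treats a match at offset 0 as "not found".
function IsSubstring($string, $subString)
{
    return strpos($string, $subString) > 0;
}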

This is a representative subset of the test cases:

//TEST CASES
echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://1.1.1.1/root/~machinehost/deployment_folder/';
echo "<br>".$relativePath = '/root/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://2.2.2.2/~machinehost/deployment_folder/users/';
echo "<br>".$relativePath = '/~machinehost/deployment_folder/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

echo "<br>-----------------------------------------------------------"; 
echo "<br>".$absolutePath = 'http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/';
echo "<br>".$relativePath = '/~machinehost/subDirectory/deployment_folderX/users/bob/settings';
echo "<br>Longest match: ".findLongestMatch($relativePath, $absolutePath);
echo "<br>Suffix: ".removeLongestMatch($relativePath, $absolutePath);

Running the test cases above produces the following output:

http://2.2.2.2/~machinehost/deployment_folder/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/
Suffix: users/bob/settings

http://1.1.1.1/root/~machinehost/deployment_folder/
/root/~machinehost/deployment_folder/users/bob/settings
Longest match: root/~machinehost/deployment_folder/
Suffix: users/bob/settings

http://2.2.2.2/~machinehost/deployment_folder/users/
/~machinehost/deployment_folder/users/bob/settings
Longest match: ~machinehost/deployment_folder/users/
Suffix: bob/settings

http://3.3.3.3/~machinehost/~machinehost/subDirectory/deployment_folder/
/~machinehost/subDirectory/deployment_folderX/users/bob/settings
Longest match: ~machinehost/subDirectory/
Suffix: deployment_folderX/users/bob/settings

Maybe you can take the idea behind this code and turn it into something useful for your current project.
Let me know whether it works for you. By the way, Mr. oreX's answer looks good too.
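As a variation on the segment-by-segment idea, the sketch below compares whole path segments and only accepts an overlap between the end of the base URI and the start of the relative path. It is not the code from this answer: the function name join_on_segments is made up for illustration, and unlike FindLongestMatch() it will not match segments in the middle of the relative path.

```php
<?php
// Hypothetical helper (not from the original answer): joins a base URI and a
// path whose leading segments repeat the trailing segments of the base.
function join_on_segments(string $base, string $path): string
{
    $baseSegs = explode('/', trim($base, '/'));
    $pathSegs = explode('/', trim($path, '/'));
    $max = min(count($baseSegs), count($pathSegs));

    // Try the longest possible overlap first, then shrink it.
    for ($n = $max; $n > 0; $n--) {
        if (array_slice($baseSegs, -$n) === array_slice($pathSegs, 0, $n)) {
            $rest = array_slice($pathSegs, $n);
            return rtrim($base, '/') . '/' . implode('/', $rest);
        }
    }
    // No shared segments: simply concatenate.
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

echo join_on_segments(
    'http://2.2.2.2/~machinehost/deployment_folder/',
    '/~machinehost/deployment_folder/users/bob/settings'
), "\n";
// prints http://2.2.2.2/~machinehost/deployment_folder/users/bob/settings
```

Comparing segments rather than bytes means a partial name such as "folder" is not mistaken for "deployment_folder".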
