How do I search for part of a string in an array?

Posted on 2024-11-24 19:28:08

I want to search whether the complete string or a part of the string is a part of the array. How can this be achieved in PHP?

Also, how can I use metaphone in it as well?

Example:

array1={'India','USA','China'};
array2={'India is in east','United States of America is USA','Made in China'}

If I search for array1 in array2, then:

'India' should match 'India is in east' and similarly for USA & China.

凌乱心跳 2024-12-01 19:28:08

$array1 = array('India','USA','China');
$array2 = array('India is in east','United States of America is USA','Made in China');
$found = array();

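foreach ($array1 as $key => $value) {
    // Thanks to @Andrea for this suggestion:
    // preg_quote() escapes any regex metacharacters in the search term.
    $found[$value] = preg_grep('/' . preg_quote($value, '/') . '/', $array2);
    // Alternative (one flat list instead of one list per search term):
    //$found = $found + preg_grep('/' . preg_quote($value, '/') . '/', $array2);
}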

print_r($found);

Result:

Array
(
    [India] => Array
        (
            [0] => India is in east
        )

    [USA] => Array
        (
            [1] => United States of America is USA
        )

    [China] => Array
        (
            [2] => Made in China
        )

)

Using Metaphone is trickier. You will have to determine what constitutes a match. One way to do that is to use the Levenshtein distance between the Metaphone results for the two values being compared.

Update: See @Andrea's solution for a more sensible per-word Metaphone comparison.

Here's a rough example:

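// Pair each string with its Metaphone key: array(key => original).
// create_function() was removed in PHP 8, so a closure is used instead.
$toMeta = function ($v) {
    return array(metaphone($v) => $v);
};

$meta1 = array_map($toMeta, $array1);
$meta2 = array_map($toMeta, $array2);

$threshold = 3;
$found = array(); // start fresh; the first example left per-term results here

foreach ($meta2 as $value2) {

    $k2 = key($value2);   // Metaphone key of the sentence
    $v2 = $value2[$k2];   // original sentence

    foreach ($meta1 as $value1) {

        $k1  = key($value1);
        $v1  = $value1[$k1];
        $lev = levenshtein($k2, $k1);

        // Match on a plain substring or on a small edit distance between keys.
        if (strpos($v2, $v1) !== false || $lev <= $threshold) {
            $found[] = $v2;
        }
    }
}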

...but it needs work. It produces duplicates if the threshold is too high. You may prefer to run the match in two passes. One to find simple matches, as in my first code example, and then another to match with Metaphone if the first returns no matches.
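The two-pass idea could be sketched roughly as follows. This is only one possible reading of it: the helper names `metaphone_words()` and `two_pass_search()` are made up for this illustration, the fallback compares the search term's key against each word's key rather than the whole sentence's key, and the sample data uses the misspelling 'Indiuh' so that the second pass has something to do.

```php
<?php
// Hypothetical helper: Metaphone key of every word in a sentence.
function metaphone_words(string $sentence): array
{
    $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('metaphone', $words);
}

// Hypothetical helper: pass 1 is a plain substring test; pass 2 (Metaphone
// plus Levenshtein) runs only for terms that pass 1 could not find.
function two_pass_search(array $needles, array $haystack, int $threshold = 0): array
{
    $found = array();
    foreach ($needles as $needle) {
        // Pass 1: case-insensitive substring match; array_filter keeps the keys.
        $hits = array_filter($haystack, function ($s) use ($needle) {
            return stripos($s, $needle) !== false;
        });

        // Pass 2: phonetic fallback, only when pass 1 found nothing.
        if (!$hits) {
            $key = metaphone($needle);
            $hits = array_filter($haystack, function ($s) use ($key, $threshold) {
                foreach (metaphone_words($s) as $wordKey) {
                    if (levenshtein($key, $wordKey) <= $threshold) {
                        return true;
                    }
                }
                return false;
            });
        }
        $found[$needle] = $hits;
    }
    return $found;
}

$array1 = array('India', 'USA', 'China');
$array2 = array('Indiuh is in east', 'United States of America is USA', 'Made in China');

$found = two_pass_search($array1, $array2, 0);
print_r($found);
```

Because a term that already has a plain match never reaches the second pass, it cannot be added twice. A threshold of 0 requires identical keys; raising it loosens the match, but Metaphone keys are short, so even a threshold of 1 may start matching common words such as 'in' or 'is'.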

遇见了你 2024-12-01 19:28:08

The metaphone case could also follow the same structure proposed by Mike for the strict case.

I do not think that an additional similarity function is needed, because the purpose of the metaphone should be to give us a key that is common to words that sound the same.

$array1 = array('India','USA','China');
$array2 = array(
    'Indiuh is in east',
    'United States of America is USA',
    'Gandhi was born in India',
    'Made in China'
);
$found = array();
foreach ($array1 as $key => $value) {
    // Strict case: whole-word, case-insensitive match on the original text.
    $found[$value] = preg_grep('/\b'.preg_quote($value, '/').'\b/i', $array2);
}

var_export($found);

echo "\n\n";

function meta( $sentence )
{
    return implode(' ', array_map('metaphone', explode(' ', $sentence)));
}

$array2meta = array_map('meta', $array2);
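$foundmeta = array();
foreach ($array1 as $key => $value) {
    $valuemeta = meta($value);
    // Match on the Metaphone keys, then map the hits back to the original sentences.
    $foundmeta[$value] = preg_grep('/\b'.$valuemeta.'\b/', $array2meta);
    $foundmeta[$value] = array_intersect_key($array2, $foundmeta[$value]);
}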

var_export($foundmeta);

The above code prints out:

array (
  'India' => 
  array (
    2 => 'Gandhi was born in India',
  ),
  'USA' => 
  array (
    1 => 'United States of America is USA',
  ),
  'China' => 
  array (
    3 => 'Made in China',
  ),
)

array (
  'India' => 
  array (
    0 => 'Indiuh is in east',
    2 => 'Gandhi was born in India',
  ),
  'USA' => 
  array (
    1 => 'United States of America is USA',
  ),
  'China' => 
  array (
    3 => 'Made in China',
  ),
)
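To see why the second listing picks up 'Indiuh is in east' while the first does not, it may help to print the per-word keys directly. The snippet below is a small illustration rather than part of the answer's code; `meta_keys()` is just the `meta()` helper under a different, made-up name so that the two can coexist.

```php
<?php
// Same idea as meta() above: replace every word with its Metaphone key.
function meta_keys(string $sentence): string
{
    return implode(' ', array_map('metaphone', explode(' ', $sentence)));
}

// Print each text next to its per-word keys.
foreach (array('India', 'Indiuh is in east', 'Gandhi was born in India') as $text) {
    echo $text, ' => ', meta_keys($text), "\n";
}

// The misspelling and the correct spelling share one key, which is what
// lets the whole-word regex on the keys match both sentences.
var_dump(metaphone('Indiuh') === metaphone('India'));
```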

山田美奈子 2024-12-01 19:28:08

$a1 = array('India','USA','China');
$a2 = array('India is in east','United States of America is USA','Made in China');


foreach ( $a2 as $a )
{
  foreach( $a1 as $b  )
  {
    if ( strpos( $a, $b ) !== false ) // strpos() returns false, not -1, when there is no match
    {
      echo $a . " contains " . $b . "\n";
    }
  }
}
