strpos() with multiple needles?

Posted on 2024-11-27 01:40:10

I am looking for a function like strpos() with two significant differences:

  1. To be able to accept multiple needles, meaning thousands of needles at once.
  2. To search for all occurrences of the needles in the haystack and to return an array of starting positions.

Of course, it has to be an efficient solution, not just a loop over every needle. I have searched this forum and found similar questions, but none of them was what I am looking for. I am using strpos only to illustrate my question; probably something entirely different has to be used for this purpose.

I am aware of Zend_Search_Lucene, and I would like to know whether it can be used to achieve this, and how (just the general idea).

Thanks a lot for your help and time!

Comments (6)

清风疏影 2024-12-04 01:40:11

Try preg_match with an alternation of the needles:

if (preg_match('/word|word2/i', $str)) {
    // At least one of the needles occurs in $str (case-insensitive).
}

Checking for multiple strpos values
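
preg_match only reports whether some needle occurs. To get the starting positions of every occurrence, as the question asks, preg_match_all with PREG_OFFSET_CAPTURE can be used. The sketch below is one possible way to do it, not the answer's own code: the helper name preg_positions is hypothetical, offsets are byte offsets, matches do not overlap, and a pattern built from thousands of needles may run into PCRE limits, so it should be profiled first.

```php
<?php
// Hypothetical helper (not part of the original answer): collect the byte
// offsets of every non-overlapping occurrence of any needle in one regex pass.
function preg_positions(string $haystack, array $needles): array
{
    // Cast to strings, then drop empty needles and duplicates.
    $needles = array_unique(array_filter(array_map('strval', $needles), 'strlen'));
    if ($needles === []) {
        return [];
    }
    // Longest needle first, so the longest alternative wins at a given offset.
    usort($needles, fn($a, $b) => strlen($b) <=> strlen($a));
    // Escape regex metacharacters, including the '/' delimiter.
    $quoted  = array_map(fn($n) => preg_quote($n, '/'), $needles);
    $pattern = '/' . implode('|', $quoted) . '/';

    $positions = [];
    if (preg_match_all($pattern, $haystack, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as $hit) {
            $positions[] = $hit[1]; // $hit is [matched text, byte offset]
        }
    }
    return $positions;
}
```

For example, preg_positions('one two three two', array('two', 'three')) yields the offsets 4, 8 and 14.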

没有你我更好 2024-12-04 01:40:11

Here's some sample code for my strategy:

function strpos_array($haystack, $needles, $offset=0) {
    $matches = array();

    //Avoid the obvious: when haystack or needles are empty, return no matches
    if(empty($needles) || empty($haystack)) {
        return $matches;
    }

    $haystack = (string)$haystack; //Pre-cast non-string haystacks
    $haylen = strlen($haystack);

    //Allow negative (from end of haystack) offsets
    if($offset < 0) {
        $offset += $haylen;
    }

    //Wrap a single needle in an array so the loop below handles both cases
    if(!is_array($needles)) {
        $needles = array($needles);
    }

    $needles = array_unique($needles); //Not necessary if you are sure all needles are unique

    //Precalculate needle lengths to save time
    foreach($needles as &$origNeedle) {
        $origNeedle = array((string)$origNeedle, strlen($origNeedle));
    }
    unset($origNeedle); //Break the reference so later code cannot overwrite the last needle

    //Find matches
    for(; $offset < $haylen; $offset++) {
        foreach($needles as $needle) {
            list($needle, $length) = $needle;
            if($needle === substr($haystack, $offset, $length)) {
                $matches[] = $offset;
                break;
            }
        }
    }

    return($matches);
}

I've implemented a simple brute-force method above that works with any combination of needles and haystacks (not just words). Dedicated multi-pattern string-search algorithms may be faster.
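
One well-known algorithm for searching many needles in a single pass over the haystack is Aho-Corasick. Whether the answer had it in mind is an assumption. The sketch below is a minimal, byte-based illustration rather than a tuned implementation, and the function name aho_corasick_positions is hypothetical.

```php
<?php
// Hypothetical sketch of Aho-Corasick multi-pattern search (byte-based).
// Returns [needle => [start positions]] for every occurrence, overlaps included.
function aho_corasick_positions(string $haystack, array $needles): array
{
    // $goto[state][char] = next state, $fail[state] = fallback state,
    // $out[state] = needles that end when this state is reached.
    $goto = [[]];
    $fail = [0];
    $out  = [[]];

    // Phase 1: build a trie of all needles.
    foreach (array_unique(array_map('strval', $needles)) as $needle) {
        if ($needle === '') {
            continue; // an empty needle would match at every position
        }
        $state = 0;
        $len = strlen($needle);
        for ($i = 0; $i < $len; $i++) {
            $ch = $needle[$i];
            if (!isset($goto[$state][$ch])) {
                $goto[] = [];
                $fail[] = 0;
                $out[]  = [];
                $goto[$state][$ch] = count($goto) - 1;
            }
            $state = $goto[$state][$ch];
        }
        $out[$state][] = $needle;
    }

    // Phase 2: breadth-first pass to compute failure links and merge outputs.
    $queue = array_values($goto[0]); // depth-1 states fall back to the root
    for ($head = 0; $head < count($queue); $head++) {
        $state = $queue[$head];
        foreach ($goto[$state] as $ch => $next) {
            $queue[] = $next;
            $f = $fail[$state];
            while ($f !== 0 && !isset($goto[$f][$ch])) {
                $f = $fail[$f];
            }
            $fail[$next] = $goto[$f][$ch] ?? 0;
            $out[$next]  = array_merge($out[$next], $out[$fail[$next]]);
        }
    }

    // Phase 3: scan the haystack once, following failure links on mismatch.
    $matches = [];
    $state = 0;
    $len = strlen($haystack);
    for ($i = 0; $i < $len; $i++) {
        $ch = $haystack[$i];
        while ($state !== 0 && !isset($goto[$state][$ch])) {
            $state = $fail[$state];
        }
        $state = $goto[$state][$ch] ?? 0;
        foreach ($out[$state] as $needle) {
            $matches[$needle][] = $i - strlen($needle) + 1;
        }
    }
    return $matches;
}
```

The scan visits each haystack byte once, plus the work of reporting matches, so the cost no longer grows with the number of needles the way the nested loops above do. If a flat list of start positions is wanted, the per-needle arrays can be merged afterwards.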


Other Solution

function strpos_array($haystack, $needles, $theOffset=0) {
    $matches = array();

    if(empty($haystack) || empty($needles)) {
        return $matches;
    }

    $haylen = strlen($haystack);

    if($theOffset < 0) {  // Support negative offsets
        $theOffset += $haylen;
    }

    foreach((array)$needles as $needle) {
        $needlelen = strlen($needle);
        if($needlelen === 0) {
            continue; // An empty needle would never advance $offset
        }
        $offset = $theOffset;

        while(($match = strpos($haystack, $needle, $offset)) !== false) {
            $matches[] = $match;
            $offset = $match + $needlelen;
            if($offset >= $haylen) {
                break;
            }
        }
    }

    return $matches;
}

沫离伤花 2024-12-04 01:40:11

I know this doesn't answer the OP's question but wanted to comment since this page is at the top of Google for strpos with multiple needles. Here's a simple solution to do so (again, this isn't specific to the OP's question - sorry):

    $img_formats = array('.jpg','.png');
    $missing = array();
    foreach ( $img_formats as $format )
        if ( stripos($post['timer_background_image'], $format) === false ) $missing[] = $format;
    if (count($missing) == 2)
        return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));

If 2 items are added to the $missing array that means that the input doesn't satisfy any of the image formats in the $img_formats array. At that point you know that you can return an error, etc. This could easily be turned into a little function:

    function m_stripos( $haystack = null, $needles = array() ){
        //return early if missing arguments 
        if ( !$needles || !$haystack ) return false; 
        // create an array to evaluate at the end
        $missing = array(); 
        //Loop through needles array, and add to $missing array if not satisfied
        foreach ( $needles as $needle )
            if ( stripos($haystack, $needle) === false ) $missing[] = $needle;
        //If the count of $missing and $needles is equal, we know there were no matches, return false..
        if (count($missing) == count($needles)) return false; 
        //If we're here, be happy, return true...
        return true;
    }

Back to our first example, this time using the function:

    $needles = array('.jpg','.png');
    if ( !m_stripos( $post['timer_background_image'], $needles ) )
        return array("save_data"=>$post,"error"=>array("message"=>"The background image must be in a .jpg or .png format.","field"=>"timer_background_image"));

Of course, what you do after the function returns true or false is up to you.

随波逐流 2024-12-04 01:40:11

It seems you are searching for whole words. In this case, something like this might help. As it uses built-in functions, it should be faster than custom code, but you have to profile it:

$words = str_word_count($str, 2);

$word_position_map = array();

foreach($words as $position => $word) {
    if(!isset($word_position_map[$word])) {
        $word_position_map[$word] = array();
    }
    $word_position_map[$word][] = $position;
}

// assuming $needles is an array of words
$result = array_intersect_key($word_position_map, array_flip($needles));

Storing the information (such as the needles) in the right format will improve the runtime, e.g. because you no longer have to call array_flip.

Note from the str_word_count documentation:

For the purpose of this function, 'word' is defined as a locale dependent string containing alphabetic characters, which also may contain, but not start with "'" and "-" characters.

So make sure you set the locale right.
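
As a self-contained illustration, the snippet above can be wrapped in a function. The name word_positions is hypothetical, matching is case-sensitive, and the positions are the byte offsets that str_word_count reports in format 2.

```php
<?php
// Hypothetical wrapper around the snippet above: map each needle that occurs
// as a whole word to the list of byte offsets where it starts.
function word_positions(string $str, array $needles): array
{
    $map = [];
    // Format 2 returns an array of [byte position => word].
    foreach (str_word_count($str, 2) as $position => $word) {
        $map[$word][] = $position;
    }
    // Keep only the words that are also needles; needles must be strings or ints.
    return array_intersect_key($map, array_flip($needles));
}
```

With the input 'one two three two' and the needles 'two' and 'five', the result contains only the key 'two', with the positions 4 and 14.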

吝吻 2024-12-04 01:40:11

You could use a regular expression; regular expressions support alternation (OR). This would, however, be fairly slow compared to strpos.

时光是把杀猪刀 2024-12-04 01:40:11

How about a simple solution using array_map()?

$string = 'one two three four';
$needles = array( 'five' , 'three' );
$strpos_arr = array_map( function ( $check ) use ( $string ) {
    return strpos( $string, $check );
}, $needles );

The result is an array whose keys are the indexes of the needles in $needles and whose values are the starting positions of their first occurrence, or false if a needle was not found.

//print_r( $strpos_arr );
Array
(
    [0] => 
    [1] => 8
)
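
A small variation, sketched here as an assumption about what may be more convenient to consume, keys the result by the needle itself and drops needles that were not found. The function name first_positions is hypothetical, and like the code above it reports only the first occurrence of each needle.

```php
<?php
// Hypothetical variation of the array_map() approach: return
// [needle => first position] and omit needles that do not occur.
function first_positions(string $haystack, array $needles): array
{
    $positions = array_map(fn($needle) => strpos($haystack, $needle), $needles);
    $byNeedle  = array_combine($needles, $positions);
    // Compare strictly against false so a legitimate position of 0 is kept.
    return array_filter($byNeedle, fn($pos) => $pos !== false);
}
```

For the string 'one two three four' and the needles 'five', 'three' and 'one', this returns 'three' => 8 and 'one' => 0.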