Get the first N words of a string

Posted 2024-11-05 23:33:24

How do I only get the first 10 words from a string?

Answers (13)

野却迷人 2024-11-12 23:33:24
implode(' ', array_slice(explode(' ', $sentence), 0, 10));

To also handle other word breaks, such as commas and dashes, preg_match offers a quick way that doesn't require splitting the string:

function get_words($sentence, $count = 10) {
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
  return $matches[0];
}

As Pebbl mentions, \w and \W don't always cover UTF-8 or Unicode characters in PHP. If that is a concern, you can replace \w with [^\s,\.;\?\!] and \W with [\s,\.;\?\!].
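Here is a minimal sketch of that substitution, assuming the same separator set (whitespace plus , . ; ? !). The name get_words_utf8 is hypothetical. Beyond the swap, it makes three changes:

- It adds the /u modifier so the pattern works on UTF-8 code points rather than bytes.
- It anchors at the start and skips any leading separators. With the pattern above, a sentence that begins with a space or punctuation produces an empty match at offset 0, so an empty string comes back.
- It trims trailing whitespace from the result.

```php
<?php
// Hypothetical helper: first $count words, where a "word" is any run of
// characters that is not whitespace or one of , . ; ? !
function get_words_utf8(string $sentence, int $count = 10): string
{
    if ($count <= 0) {
        return '';
    }
    // ^[sep]*       skip leading separators
    // ( ... ){0,N}  up to N repetitions of "word, then separators or end of string"
    // /u            treat pattern and subject as UTF-8
    $pattern = '/^[\s,.;?!]*((?:[^\s,.;?!]+(?:[\s,.;?!]+|$)){0,' . $count . '})/u';
    if (preg_match($pattern, $sentence, $matches) !== 1) {
        return ''; // e.g. preg_match() returns false for invalid UTF-8 under /u
    }
    return rtrim($matches[1]);
}

echo get_words_utf8('Это не те дроиды, которые вы ищете', 5); // Это не те дроиды, которые
```

The trade-off is that the separator set is now explicit. Characters such as : or quotes count as part of a word unless you add them to all three character classes.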

薄凉少年不暖心 2024-11-12 23:33:24

Simply splitting on spaces will fail if the sentence uses some other character in place of a space, or if it contains several consecutive spaces.
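Here is a quick illustration of both failure modes with explode(' '), plus a split on runs of whitespace for contrast. The sample strings are made up.

```php
<?php
// Two consecutive spaces: explode() yields an empty element between them,
// which a "take the first N elements" approach would count as a word.
$doubleSpace = explode(' ', 'one  two three');
// ['one', '', 'two', 'three']

// A tab instead of a space: no split happens, so two words stay in one element.
$tabbed = explode(' ', "one two\tthree");
// ['one', "two\tthree"]

// For contrast: splitting on runs of any whitespace and dropping empty
// pieces avoids both problems.
$robust = preg_split('/\s+/', "one  two\tthree", -1, PREG_SPLIT_NO_EMPTY);
// ['one', 'two', 'three']
```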

The following version works no matter what kind of "space" you use between words, and it can easily be extended to handle other characters. It currently supports any whitespace character plus , . ; ? !

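function get_snippet( $str, $wordCount = 10 ) {
  // Drop leading separators so that the first array element is always a word
  $str = preg_replace( '/^[\s,\.;\?\!]+/', '', $str );
  return implode(
    '',
    array_slice(
      // Split on runs of whitespace or , . ; ? ! and keep the separators
      // themselves (PREG_SPLIT_DELIM_CAPTURE) so they can be glued back in
      preg_split(
        '/([\s,\.;\?\!]+)/',
        $str,
        $wordCount*2+1,
        PREG_SPLIT_DELIM_CAPTURE
      ),
      0,
      // N words plus the N-1 separators between them
      $wordCount*2-1
    )
  );
}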
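The slice length of $wordCount*2-1 makes sense once you look at the intermediate array. Here is a small illustration with a made-up sample string:

```php
<?php
// With a capturing group and PREG_SPLIT_DELIM_CAPTURE, preg_split() keeps the
// separators: words land at even indexes, separators at odd indexes.
$parts = preg_split('/([\s,\.;\?\!]+)/', 'one, two  three', -1, PREG_SPLIT_DELIM_CAPTURE);
// ['one', ', ', 'two', '  ', 'three']

// The first 2 words plus the 1 separator between them are 2*2-1 = 3 elements.
$firstTwo = implode('', array_slice($parts, 0, 2 * 2 - 1));
// 'one, two'
```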

Regular expressions are perfect for this problem, because you can make the code as flexible or as strict as you like. You do have to be careful, however. I deliberately targeted the gaps between words, rather than the words themselves, because it is rather difficult to state unequivocally what defines a word.

Take the \w word-character class, or its inverse \W. I rarely rely on these, mainly because, depending on the software you are using (such as certain versions of PHP), they don't always include UTF-8 or Unicode characters.

In regular expressions it is better to be specific at all times, so that your expressions can handle input like the following no matter where they run:

echo get_snippet('Это не те дроиды, которые вы ищете', 5);

/// outputs: Это не те дроиды, которые

Avoiding the split could still be worthwhile for performance, though. In that case you could use Kelly's updated approach, but replace \w+ with [^\s,\.;\?\!]+ and \W+ with [\s,\.;\?\!]+. Personally, I like the simplicity of the splitting expression used above: it is easier to read and therefore easier to modify. The stack of PHP functions, however, is a bit ugly :)

鯉魚旗 2024-11-12 23:33:24

http://snipplr.com/view/8480/a-php-function-to-return-the-first-n-words-from-a-string/

function shorten_string($string, $wordsreturned)
{
    $retval = $string;  //  Just in case of a problem
    $array = explode(" ", $string);
    /*  Already short enough, return the whole thing*/
    if (count($array)<=$wordsreturned)
    {
        $retval = $string;
    }
    /*  Need to chop off some words*/
    else
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array)." ...";
    }
    return $retval;
}
半暖夏伤 2024-11-12 23:33:24

I suggest using str_word_count:

<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
print_r(str_word_count($str, 1));
?>

The above example will output:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
)

Then use a loop to get the words you want.
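Below is a minimal sketch of that loop. The function name first_words_loop is hypothetical. Keep in mind that str_word_count() treats only locale-dependent alphabetic characters as word characters (plus ' and -, which may not start a word). Digits are therefore dropped, and non-ASCII letters may not count as letters in the default locale. The optional third argument lets you list extra characters to treat as part of a word.

```php
<?php
// Hypothetical helper: take at most $count words from the array that
// str_word_count($str, 1) returns (format 1 = list of words).
function first_words_loop(string $str, int $count = 10): array
{
    $result = [];
    foreach (str_word_count($str, 1) as $word) {
        if (count($result) >= $count) {
            break; // stop as soon as we have enough words
        }
        $result[] = $word;
    }
    return $result;
}

echo implode(' ', first_words_loop("Lorem ipsum       dolor sit    amet,\n consectetur adipiscing elit", 3));
// Lorem ipsum dolor

// Digits are not letters, so "3" is skipped...
$noDigits = first_words_loop('I have 3 apples');                  // ['I', 'have', 'apples']
// ...unless the extra characters are passed as the third argument.
$withDigits = str_word_count('I have 3 apples', 1, '0123456789'); // ['I', 'have', '3', 'apples']
```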

Source: http://php.net/str_word_count

蒗幽 2024-11-12 23:33:24

To select the first 10 words of a given text, you can implement the following function:

function first_words($text, $count=10)
{
    $words = explode(' ', $text);

    $result = [];
    for ($i = 0; $i < $count && isset($words[$i]); $i++) {
        $result[] = $words[$i];
    }

    // Re-join with the spaces that explode() removed
    return implode(' ', $result);
}
兔小萌 2024-11-12 23:33:24

This can easily be done using str_word_count()

$first10words = implode(' ', array_slice(str_word_count($sentence,1), 0, 10));
饮惑 2024-11-12 23:33:24

This might help you: a function that returns the first N words ($numberOfWords, 6 by default), appending "..." when the text is longer.

// A class method; drop "public" to use it as a standalone function
public function getNWordsFromString($text, $numberOfWords = 6)
{
    if ($text != null)
    {
        $textArray = explode(" ", $text);
        if (count($textArray) > $numberOfWords)
        {
            return implode(" ", array_slice($textArray, 0, $numberOfWords))."...";
        }
        return $text;
    }
    return "";
}
夏末 2024-11-12 23:33:24

Try this

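$str = 'Lorem ipsum dolor sit amet,consectetur adipiscing elit. Mauris ornare luctus diam sit amet mollis.';
// Add a space after each comma so "amet,consectetur" is split into two words
$arr = explode(" ", str_replace(",", ", ", $str));
// Stop after 10 words, or earlier if the string is shorter
for ($index = 0; $index < 10 && isset($arr[$index]); $index++) {
    echo $arr[$index] . " ";
}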

I know this is a late answer, but let newcomers choose the answer that suits them.

画▽骨i 2024-11-12 23:33:24
    function get_first_num_of_words($string, $num_of_words)
    {
        $string = preg_replace('/\s+/', ' ', trim($string));
        $words = explode(" ", $string); // an array

        // if number of words you want to get is greater than number of words in the string
        if ($num_of_words > count($words)) {
            // then use number of words in the string
            $num_of_words = count($words);
        }

        $new_string = "";
        for ($i = 0; $i < $num_of_words; $i++) {
            $new_string .= $words[$i] . " ";
        }

        return trim($new_string);
    }

Use it like this:

echo get_first_num_of_words("Lorem ipsum dolor sit amet consectetur adipisicing elit. Aliquid, illo?", 5);

Output: Lorem ipsum dolor sit amet

This function also works well with Unicode text, such as Arabic:

echo get_first_num_of_words("نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.", 100);

Output: نموذج لنص عربي الغرض منه توضيح كيف يمكن استخلاص أول عدد معين من الكلمات الموجودة فى نص معين.
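For comparison, you can get roughly the same behaviour in fewer lines by letting preg_split() drop the empty pieces instead of normalising the whitespace first. Here is a sketch; the name first_words_compact is hypothetical, and it treats a negative word count as zero.

```php
<?php
// Split on runs of whitespace, discarding empty pieces (e.g. from leading or
// trailing whitespace), then keep the first $num_of_words and re-join.
function first_words_compact(string $string, int $num_of_words): string
{
    $words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return implode(' ', array_slice($words, 0, max(0, $num_of_words)));
}

echo first_words_compact("Lorem ipsum dolor sit amet consectetur adipisicing elit. Aliquid, illo?", 5);
// Lorem ipsum dolor sit amet
```

As in the original, each run of whitespace inside the kept words comes out as a single space.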

余生共白头 2024-11-12 23:33:24

It is exactly what we were looking for.
Just cut and pasted it into the program and ran it.

/*  Returns the first $wordsreturned words of $string, followed by " ...".
    If $string contains no more than $wordsreturned words, the entire
    string is returned unchanged.
*/
function shorten_string($string, $wordsreturned)
{
    $retval = $string;      //  Just in case of a problem

    $array = explode(" ", $string);
    if (count($array) <= $wordsreturned)
    {
        /*  Already short enough, return the whole thing */
        $retval = $string;
    }
    else
    {
        /*  Need to chop off some words */
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array)." ...";
    }
    return $retval;
}

Then just call the function in your code like this:

$data_itr = shorten_string($Itinerary,25);
很酷又爱笑 2024-11-12 23:33:24

I do it this way:

function trim_by_words($string, $word_count = 10) {
    $string = explode(' ', $string);
    if (empty($string) == false) {
        $string = array_chunk($string, $word_count);
        $string = $string[0];
    }
    $string = implode(' ', $string);
    return $string;
}

It's UTF-8 compatible, since explode() only splits on the ASCII space character.
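This is safe because explode() compares raw bytes. In UTF-8 the byte 0x20 only ever encodes the ASCII space, and every byte of a multibyte character is 0x80 or above, so no character gets cut in half. Here is a quick check using a Russian sample string:

```php
<?php
// explode() works on bytes; UTF-8 multibyte sequences never contain 0x20,
// so splitting on ' ' keeps each Cyrillic word intact.
$words = explode(' ', 'Это не те дроиды');
// ['Это', 'не', 'те', 'дроиды']

// Same array_chunk() trick as trim_by_words(): the first chunk holds the kept words.
$firstTwo = implode(' ', array_chunk($words, 2)[0]);
// 'Это не'
```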

笑红尘 2024-11-12 23:33:24

This might help you: a function that returns the first $numb words (10 in the call below), appending " ..." when the text was cut.

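function num_of_word($text, $numb) {
    $wordsArray = explode(" ", $text);
    // Split the words into chunks of $numb; the first chunk holds the words we keep
    $parts = array_chunk($wordsArray, $numb);

    $final = implode(" ", $parts[0]);

    // A second chunk means the text was longer, so mark the cut
    if (isset($parts[1])) {
        $final = $final . " ...";
    }
    return $final;
}
echo num_of_word($text, 10);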
北斗星光 2024-11-12 23:33:24

Instead of generating an array of N words, truncating the array, and re-imploding the words, just truncate the input string after the Nth word.

echo preg_replace('/(?:\s*\S+){10}\K.*/s', '', $string);

The pattern matches N sequences of zero or more whitespace characters followed by one or more non-whitespace characters. \K then restarts the full-string match, effectively "releasing" the matched characters. Finally, .* matches the rest of the string; the s modifier lets it continue past newlines. Whatever is matched is replaced with an empty string.

This solution ensures that the output string has no more than N words. If the string has fewer than N words, nothing is replaced, so be aware that any trailing whitespace will not be removed.

To also remove leading and trailing whitespace, adjust the pattern to capture zero to N words delimited by whitespace. Here N = 10: one word plus up to nine more.

$string = '    I would like to know   ';

var_export(
    preg_replace('/\s*(\S*(?:\s+\S+){0,9}).*/s', '$1', $string)
);
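Both patterns hard-code N = 10. Below is a minimal sketch of a wrapper that builds the quantifier from a parameter; the name first_n_words is hypothetical. It uses the s modifier so .* also consumes newlines, and a replacement limit of 1 so only the first match is rewritten.

```php
<?php
// Keep at most $n whitespace-delimited words, dropping leading/trailing whitespace.
// Group 1 captures the first word plus up to $n - 1 "whitespace + word" pairs;
// the trailing .* (with the s modifier) swallows everything else, newlines included.
function first_n_words(string $string, int $n): string
{
    if ($n < 1) {
        return '';
    }
    $pattern = '/\s*(\S*(?:\s+\S+){0,' . ($n - 1) . '}).*/s';
    // Limit of 1: rewrite only the first match; preg_replace() returns null on error.
    return preg_replace($pattern, '$1', $string, 1) ?? $string;
}

var_export(first_n_words('    I would like to know   ', 10)); // 'I would like to know'
```

Whitespace between the kept words is preserved as-is. For example, first_n_words("one two\nthree four five", 3) keeps the newline between "two" and "three".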