Automatically generating a summary from strings

Posted on 2024-10-20 04:42:50

Given an array of strings, we need to generate a very simple form of summary by trimming the end of each string so that the joined result fits within a given length.

Here is a first version of the function:

// Take an array of strings and generate a summary within a given length
function stringSummaryFromMetadata($inArray,$len=80,$sep='§'){

    // Filter out 'false' values
    $inputs=array_filter($inArray);

    // First try just imploding array
    $res=implode($sep,$inputs);

    // Check for length
    if(mb_strlen($res, 'utf8')>$len){

        // Calculate 'z' the fixed width constant
        $x=count($inputs);
        $z=round(($len-$x)/$x);

        // Snip all strings to 'z'
        $t1=array();
        foreach($inputs as $i) $t1[]=mb_substr($i,0,$z);

        // Final answer
        $res=implode($sep,$t1);
    }

    return $res;
}

A test:

$test=array(
    'Ligula diam risus tempus lorem sit',
    'Cursus metus commodo enim odio orci',
    'Metus sapien porta sapien fusce sodales',
    'king queen'
);
$out=stringSummaryFromMetadata($test);
print $out;

Which gives:

Ligula diam risus t§Cursus metus commod§Metus sapien porta §king queen
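
The arithmetic behind this output can be checked by hand. The snippet below is only an illustration using the numbers from the test above; the `$lengths` array is counted from the four test strings and is not part of the original function.

```php
<?php
// Numbers taken from the test above: 4 non-empty inputs, default $len of 80.
$len = 80;
$x   = 4;
$z   = (int) round(($len - $x) / $x);   // fixed width per string: 19

// Character counts of the four test strings (counted by hand).
$lengths = [34, 35, 39, 10];

// Each string contributes at most $z characters; 4 strings need 3 separators.
$total = array_sum(array_map(fn($l) => min($l, $z), $lengths)) + ($x - 1);

echo $z, ' ', $total, "\n";             // prints "19 70"
```

The last string is only 10 characters long, so 9 characters of its share go unused, and the formula reserves 4 characters for 3 separators.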

That's good enough, but I'm sure it could be much better. For example, the test output is only 70 characters long rather than the 80 allowed, a trimmed string can end in whitespace, words are cut in the middle, and so on.

Before I go off on a tangent and roll my own, I would like to ask the community whether this has been asked before and/or whether an algorithm for this already exists.
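
One possible direction, as a rough sketch rather than an established algorithm: share the length budget so that short strings keep their full text and the longer ones split what is left, cut at word boundaries, then hand any remaining slack back to the shortened strings. The names `cutAtWord` and `summarizeWithinBudget` are hypothetical and do not come from the original post. The sketch assumes UTF-8 input and a space as the only word separator.

```php
<?php
// Hypothetical helper: cut $s to at most $max characters, preferring to
// stop at the last space inside the limit so that no word is chopped.
function cutAtWord(string $s, int $max): string {
    if (mb_strlen($s, 'UTF-8') <= $max) {
        return $s;
    }
    $cut = mb_substr($s, 0, $max, 'UTF-8');
    // If the character after the cut is not a space, the cut is mid-word:
    // back up to the last space, if there is one.
    if (mb_substr($s, $max, 1, 'UTF-8') !== ' ') {
        $pos = mb_strrpos($cut, ' ', 0, 'UTF-8');
        if ($pos !== false && $pos > 0) {
            $cut = mb_substr($cut, 0, $pos, 'UTF-8');
        }
    }
    return rtrim($cut);
}

// Hypothetical replacement for stringSummaryFromMetadata().
function summarizeWithinBudget(array $parts, int $len = 80, string $sep = '§'): string {
    // Trim every part and drop the empty ones.
    $parts = array_map(fn($p) => trim((string) $p), $parts);
    $parts = array_values(array_filter($parts, fn($p) => $p !== ''));
    $n = count($parts);
    if ($n === 0) {
        return '';
    }

    // Characters left for text once the n - 1 separators are paid for.
    $budget = $len - ($n - 1) * mb_strlen($sep, 'UTF-8');

    // First pass: visit parts shortest first. A short part takes only what
    // it needs, and the unused share rolls over to the longer parts.
    $lengths = array_map(fn($p) => mb_strlen($p, 'UTF-8'), $parts);
    asort($lengths);                    // keeps the original keys
    $allow = [];
    $remaining = max(0, $budget);
    $left = $n;
    foreach ($lengths as $i => $l) {
        $allow[$i] = min($l, intdiv($remaining, $left));
        $remaining -= $allow[$i];
        $left--;
    }

    $out = [];
    foreach ($parts as $i => $p) {
        $out[$i] = cutAtWord($p, $allow[$i]);
    }

    // Second pass: word-boundary cuts leave slack, so offer it to each
    // shortened part in turn.
    $used  = array_sum(array_map(fn($s) => mb_strlen($s, 'UTF-8'), $out));
    $slack = $budget - $used;
    foreach ($parts as $i => $p) {
        if ($slack <= 0) {
            break;
        }
        $cur = mb_strlen($out[$i], 'UTF-8');
        if ($cur === mb_strlen($p, 'UTF-8')) {
            continue;                   // this part was not shortened
        }
        $longer = cutAtWord($p, $cur + $slack);
        $slack -= mb_strlen($longer, 'UTF-8') - $cur;
        $out[$i] = $longer;
    }

    return implode($sep, $out);
}
```

With the test array above and the default arguments, this should produce `Ligula diam risus tempus§Cursus metus commodo enim§Metus sapien porta§king queen`, which is exactly 80 characters. The sketch does not handle a `$len` smaller than the separators alone, and the second pass favours earlier strings over later ones.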
