Truncate the content before the search keyword in the text
I am using the code below to truncate my content before and after the first search keyword in my text (this is for my search page). Everything works as it should, except that the code cuts words in half at the beginning of the excerpt; it doesn't cut words at the end.
Example:
lients at the centre of the relationship and to offer a first class service to them, which includes tax planning, investment management and estate planning. We believe that our customer focused and...
(Edit: sometimes more than one character is missing from the word.)
You will see that it has chopped the 'c' off 'clients'. It only happens at the beginning of the text, not the end. How can I fix this? I believe I am halfway there. Code so far:
function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace(' ', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}

$results[] = Array(
    'text' => neatest_trim($row->content, 200, $searchquery, 120, 80)
);
Comments (2)
The 120 characters that you keep at the start are cut at a fixed offset: the code doesn't check whether the character at that position is a space or a letter, it just cuts the string there no matter what.
I would change this to search for the closest space to the position we are starting from. This way $start is the position of a space, and not a letter from a word. Your function would become:
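The function that followed is not reproduced here. As a rough sketch of the change described above, and not necessarily the answerer's exact code, the start offset can be moved back to the nearest preceding space. The names `snap_start_to_space` and `neatest_trim_words` are hypothetical. The sketch also assumes that tags are stripped before any positions are computed, that byte-based string functions are acceptable, and it leaves the highlighting step out for brevity.

```php
<?php
// Hypothetical helper (not from the original answer): move a cut position
// back to the start of the word it falls in, so the excerpt never begins
// in the middle of a word.
function snap_start_to_space(string $content, int $start): int
{
    if ($start <= 0) {
        return 0;
    }
    // Position of the last space before $start, if there is one.
    $space = strrpos(substr($content, 0, $start), ' ');
    // Begin just after that space, or at the very start when none exists.
    return $space === false ? 0 : $space + 1;
}

// Sketch of the question's function with the snapped start offset.
function neatest_trim_words(string $content, int $chars, string $searchquery, int $before, int $after): string
{
    // Assumption: clean the text first so offsets refer to the text being cut.
    // Newlines become spaces here (the question removes them outright).
    $content = trim(strip_tags(str_replace("\n", ' ', $content)));
    if (strlen($content) <= $chars) {
        return $content;
    }
    // Case-insensitive lookup; fall back to the beginning if nothing is found.
    $pos = stripos($content, $searchquery);
    if ($pos === false) {
        $pos = 0;
    }
    $start = snap_start_to_space($content, max(0, $pos - $before));
    $len = $pos + strlen($searchquery) + $after - $start;
    // Same trailing-word cleanup as in the question's code.
    $excerpt = (string) preg_replace('/\s+?(\S+)?$/', '', substr($content, $start, $len));
    return trim($excerpt) . '...';
}

echo neatest_trim_words('We aim to put clients at the centre of the relationship', 10, 'centre', 12, 10), "\n";
```

Searching backwards means the excerpt may contain slightly more than the requested number of characters before the keyword; searching forwards for the next space instead would keep it at or under that limit.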
Why not just use a replace regex?
This keeps the keyword 'word' plus the 10 characters before and after it, and trims everything else.
Explanation:
What this regex does is find the word that you specify (and only that word alone, because it is enclosed in \b word boundaries), and it also finds and stores (including the word) the 10 characters before the word as well as the 10 characters after it. You could construct the regex yourself with variables for the characters before and after and, of course, the keyword. The regex also matches everything else, but the replacement only uses the backreference $1, which is what you want as the output.
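The regex itself is not shown above. One pattern that matches the explanation, offered as an assumption rather than the answerer's exact expression, captures up to N characters on each side of the keyword and lets the rest of the match fall outside the group. The function name `regex_excerpt` and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: keep the keyword plus up to $before / $after
// characters around it, discarding the rest of the text.
function regex_excerpt(string $text, string $keyword, int $before, int $after): string
{
    // Group 1 holds the context and the keyword; the lazy .*? and the
    // trailing .* consume everything else so the replacement drops it.
    // Flags: i = case-insensitive, s = let . match newlines too.
    $pattern = sprintf(
        '/^.*?(.{0,%d}\b%s\b.{0,%d}).*$/is',
        $before,
        preg_quote($keyword, '/'),
        $after
    );
    $result = preg_replace($pattern, '$1', $text);
    // preg_replace() returns null on error; fall back to the input then.
    return $result === null ? $text : $result;
}

echo regex_excerpt('The quick brown fox finds a word in the long sentence.', 'word', 10, 10), "\n";
// prints "x finds a word in the lo"
```

If the keyword does not occur, the text is returned unchanged. Note that the context is still counted in characters, so an excerpt built this way can begin or end in the middle of a word, as the sample output shows.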