Truncating content before a search keyword in text

Posted on 2024-12-13 01:08:09

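I am using the code below to truncate my content before and after the first search keyword in my text (this is for my search page). Everything works as it should, except that the code cuts words in half at the beginning of the truncated text; it doesn't cut words at the end.

Example:

lients at the centre of the relationship and to offer a first class service to them, which includes tax planning, investment management and estate planning. We believe that our customer focused and...

(Edit: sometimes more than one character is missing from the word.)

You will see that it has chopped the 'c' off 'clients'. It only happens at the beginning of the text, not the end. How can I fix this? I believe I am halfway there. Code so far: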

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}

$results[] = Array(
    'text' => neatest_trim($row->content, 200, $searchquery, 120, 80)
);



兮颜 2024-12-20 01:08:10

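The 120 characters you keep before the keyword are cut at a fixed offset: the code doesn't check whether the character at that offset is a space or a letter, it just cuts the string there no matter what.

I would make this change, so that the code searches for the next space at or after the position we are starting from.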

$start = $characters_before < $pos ? $pos - $characters_before : 0;
// add this line:
$start = strpos($content, ' ', $start);
$len = $pos + strlen($searchquery) + $characters_after - $start;

This way $start is the position of a space, and not a letter in the middle of a word.

Your function would become:

function neatest_trim($content, $chars, $searchquery, $characters_before, $characters_after) {
    if (strlen($content) > $chars) {
        $pos = strpos($content, $searchquery);
        $start = $characters_before < $pos ? $pos - $characters_before : 0;
        // Move $start forward to the next space so the excerpt does not begin mid-word.
        $start = strpos($content, " ", $start);
        $len = $pos + strlen($searchquery) + $characters_after - $start;
        $content = str_replace('&nbsp;', ' ', $content);
        $content = str_replace("\n", '', $content);
        $content = strip_tags(trim($content));
        // Cut the excerpt, then drop a trailing partial word.
        $content = preg_replace('/\s+?(\S+)?$/', '', mb_substr($content, $start, $len));
        $content = trim($content) . '...';
        $content = strip_tags($content);
        $content = str_ireplace($searchquery, '<span class="highlight" style="background: #E6E6E6;">' . $searchquery . '</span>', $content);
    }
    return $content;
}
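
Two edge cases may be worth guarding against in the added line: strpos() returns false when no space follows $start, and when $start is already 0 (or already at the start of a word) the search skips a whole word that did not need skipping. The following is only a sketch of one way to handle that; snap_start_to_word is a hypothetical helper name, not part of the original code.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the original post):
// move $start forward to the beginning of the next whole word, unless it
// already sits at the start of a word.
function snap_start_to_word(string $content, int $start): int {
    // Keep the offset inside the string so strpos() cannot throw on PHP 8.
    $start = max(0, min($start, strlen($content)));

    // Position 0, or a position right after a space, is already a word start.
    if ($start === 0 || $content[$start - 1] === ' ') {
        return $start;
    }

    $space = strpos($content, ' ', $start);

    // strpos() returns false when no space follows; keep the original start then.
    return $space === false ? $start : $space + 1;
}

$text  = 'We put clients at the centre of the relationship';
$pos   = strpos($text, 'centre');
// An offset 14 bytes before the keyword lands inside "clients".
$start = snap_start_to_word($text, $pos - 14);
echo substr($text, $start), "\n"; // prints "at the centre of the relationship"
```

Inside neatest_trim this would replace the added strpos() line with $start = snap_start_to_word($content, $start);. Like the original code, it works on byte offsets, so it assumes single-byte text or at least plain ASCII spaces as separators.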


眼藏柔 2024-12-20 01:08:10

Why not just use a replace regex?

$result = preg_replace('/.*(.{10}\bword\b.{10}).*/s', '$1', $subject);

This trims everything beyond 10 characters before and after the keyword 'word'.

Explanation:

# .*(.{10}\bword\b.{10}).*
# 
# Options: dot matches newline
# 
# Match any single character «.*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
# Match the regular expression below and capture its match into backreference number 1 «(.{10}\bword\b.{10})»
#    Match any single character «.{10}»
#       Exactly 10 times «{10}»
#    Assert position at a word boundary «\b»
#    Match the characters “word” literally «word»
#    Assert position at a word boundary «\b»
#    Match any single character «.{10}»
#       Exactly 10 times «{10}»
# Match any single character «.*»
#    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

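So what this regex does is find the word that you specify (and only that word on its own, because it is enclosed in \b word boundaries), and it also finds and stores, together with the word, the 10 characters before it and the 10 characters after it. You could construct the regex yourself with variables for the number of characters before and after and, of course, the keyword. The regex also matches everything else, but the replacement only uses backreference $1, which is what you want as the output.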
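
As a rough sketch of building that regex from variables: excerpt_around is a hypothetical helper name, and two details differ from the pattern above and are assumptions of this sketch. The leading .* is made lazy so the match is anchored on the first occurrence of the keyword, and the counts use {0,n} so the pattern still matches when fewer characters are available on either side.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the original answer):
// build the pattern from variables instead of hard-coding 'word' and 10.
function excerpt_around(string $subject, string $keyword, int $before, int $after): string {
    $pattern = sprintf(
        '/.*?(.{0,%d}\b%s\b.{0,%d}).*/s',
        $before,
        preg_quote($keyword, '/'), // escape regex metacharacters in the keyword
        $after
    );

    $result = preg_replace($pattern, '$1', $subject, 1);

    // preg_replace() returns null on error; fall back to the untouched subject.
    return $result ?? $subject;
}

$text = 'We put clients at the centre of the relationship';
echo excerpt_around($text, 'centre', 10, 10), "\n"; // prints "ts at the centre of the re"
```

Because the window is counted in characters, this approach can still split words at both ends of the excerpt, as the "ts" and "re" fragments show. When the keyword is not found, the subject is returned unchanged.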
