Parsing and Formatting Search Results

Posted on 2024-07-14 04:52:19

Search:

Scripting+Language Web+Pages Applications

Results:

...scripting language originally...producing dynamic web pages. It has...graphical applications....purpose scripting language that is...d creating web pages as output...
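The highlighted phrases in the results suggest one way to read the search box: whitespace separates terms, and `+` joins the words of a single phrase. Below is a minimal Python sketch under that assumption. The name `parse_query` is hypothetical, and lower-casing is an assumption made so that later matching can be case-insensitive.

```python
def parse_query(query: str) -> list[str]:
    """Split a raw query into lower-cased search phrases.

    Assumption: whitespace separates terms and '+' joins the words of one phrase.
    """
    return [term.replace("+", " ").lower() for term in query.split()]


print(parse_query("Scripting+Language Web+Pages Applications"))
# prints ['scripting language', 'web pages', 'applications']
```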

Suppose I want one value for the number of characters of padding to allow on either side of each matched term, and another value for how many matches are shown in the result (i.e., I want to see only the first 5 matches and nothing more).
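The two values map onto a simple loop: find each occurrence, slice `pad` characters on either side, and stop after `max_matches` hits. Below is a minimal Python sketch for a single term, matching case-insensitively on ASCII text. The names `find_all` and `char_excerpt` are hypothetical, and overlapping snippets are not merged.

```python
def find_all(text: str, term: str) -> list[int]:
    """Return the start index of every case-insensitive occurrence of term."""
    haystack, needle = text.lower(), term.lower()
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i)
        i = haystack.find(needle, i + 1)
    return positions


def char_excerpt(text: str, term: str, pad: int, max_matches: int) -> str:
    """Join up to max_matches hits, each with `pad` characters of context per side."""
    snippets = []
    for start in find_all(text, term)[:max_matches]:
        left = max(0, start - pad)                        # clamp at the start of the text
        right = min(len(text), start + len(term) + pad)   # clamp at the end of the text
        snippets.append(text[left:right])
    return "..." + "...".join(snippets) + "..." if snippets else ""


print(char_excerpt("aaa cat bbb cat ccc cat", "cat", 2, 2))
# prints ...a cat b...b cat c...
```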

How exactly would you go about doing this?

This is pretty language-agnostic, but I will be implementing the solution in a PHP environment, so please restrict answers to options that do not require a specific language or framework.

Here's my thought process: create an array from the search words. Determine which search word occurs earliest in the article body. Copy that portion of the body into another variable, then remove that section from the article body. Return to step 1. You might even add a counter to each word and skip the word once its counter reaches 3 or so.
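You can sketch that loop without actually deleting text: advancing a cursor past each match has the same effect as cutting the processed part off the body. Below is a minimal Python sketch that assumes case-insensitive matching on ASCII text and character padding. `greedy_excerpt` and its parameters are hypothetical names.

```python
def greedy_excerpt(body: str, terms: list[str], pad: int,
                   per_term: int = 3) -> list[str]:
    """Repeatedly take whichever term occurs earliest after a moving cursor."""
    lower = body.lower()
    counts = {term: 0 for term in terms}
    cursor, snippets = 0, []
    while True:
        best = None  # (position, term) of the earliest eligible match
        for term in terms:
            if counts[term] >= per_term:
                continue  # skip words whose counter has reached the limit
            i = lower.find(term.lower(), cursor)
            if i != -1 and (best is None or i < best[0]):
                best = (i, term)
        if best is None:
            break  # no eligible term occurs after the cursor
        i, term = best
        counts[term] += 1
        end = i + len(term)
        snippets.append(body[max(0, i - pad):end + pad])
        cursor = end  # equivalent to removing everything up to the match
    return snippets


print(greedy_excerpt("cat dog cat dog cat dog cat", ["dog", "cat"], 0, 2))
# prints ['cat', 'dog', 'cat', 'dog']
```

Each pass rescans every remaining term from the cursor, so this can get slow on long articles with many matches.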

Important:

The solution must match all search terms independently of the order in which they were typed. That is, if term 1 occurs after term 2 in the text, it should be reported after term 2. Likewise, if it occurs after term 3, it should be reported after term 3. If term 3 happens to occur before terms 1 and 2, it should be reported before them.
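One way to get this order-independence is to search for every term separately from the start of the text, collect (position, term) pairs, and sort them by position. Below is a minimal Python sketch of that approach, case-insensitive for ASCII text. `ordered_matches` is a hypothetical name.

```python
def ordered_matches(text: str, terms: list[str]) -> list[tuple[int, str]]:
    """Return (position, term) for every occurrence of every term, in text order.

    Each term is searched from the start independently, so the order of the
    terms in the query has no effect on the order of the results.
    """
    lower = text.lower()
    matches = []
    for term in terms:
        needle = term.lower()
        i = lower.find(needle)
        while i != -1:
            matches.append((i, term))
            i = lower.find(needle, i + 1)
    matches.sort()  # tuples sort by position first, then by term
    return matches


print(ordered_matches("c b a c", ["a", "b", "c"]))
# prints [(0, 'c'), (2, 'b'), (4, 'a'), (6, 'c')]
```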

The solution should allow me to declare "Only allow up to three matches for each term, then terminate the summary."
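If the matches are already sorted by position, the per-term cap is just a counter per term. Terminating the summary is an early exit once an overall limit is hit. Below is a minimal Python sketch. The names `limit_matches`, `per_term` and `max_total` are hypothetical, and treating "terminate" as a total cap is one interpretation of the requirement.

```python
from collections import Counter
from typing import Optional


def limit_matches(matches: list[tuple[int, str]], per_term: int = 3,
                  max_total: Optional[int] = None) -> list[tuple[int, str]]:
    """Keep at most per_term hits per term and stop once max_total hits are kept.

    `matches` is assumed to be (position, term) pairs already sorted by position.
    """
    counts: Counter = Counter()
    kept = []
    for pos, term in matches:
        if counts[term] >= per_term:
            continue  # this term has used up its allowance; other terms may still appear
        counts[term] += 1
        kept.append((pos, term))
        if max_total is not None and len(kept) >= max_total:
            break  # terminate the summary
    return kept


hits = [(0, "a"), (1, "b"), (2, "a"), (3, "a"), (4, "b")]
print(limit_matches(hits, per_term=2))               # prints [(0, 'a'), (1, 'b'), (2, 'a'), (4, 'b')]
print(limit_matches(hits, per_term=2, max_total=3))  # prints [(0, 'a'), (1, 'b'), (2, 'a')]
```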

Extra Credit:

Make the padding variable optionally count words rather than characters.
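For word padding, one option is to split the text on each side of the match into words and keep only the nearest N. Below is a minimal Python sketch. `word_window` is a hypothetical name, and the sketch assumes the match starts and ends on word boundaries. Splitting on whitespace also collapses runs of spaces and newlines in the output.

```python
def word_window(text: str, start: int, end: int, words: int) -> str:
    """Return text[start:end] with up to `words` whole words on each side."""
    if words <= 0:
        return text[start:end]
    before = text[:start].split()[-words:]  # nearest words to the left
    after = text[end:].split()[:words]      # nearest words to the right
    return " ".join(before + [text[start:end]] + after)


sentence = "one two three four five six"
s = sentence.find("four")
print(word_window(sentence, s, s + len("four"), 2))  # prints two three four five six
```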

Comments (2)

止于盛夏 2024-07-21 04:52:20

My thought process:

  1. Create a results list that can hold the same term more than once. Store each hit as a (character position, search term) pair, because the keys of a standard PHP array are unique and cannot repeat
  2. Loop through each search term and find the starting character position of every occurrence in the search text
  3. For each occurrence, add a pair to the results list holding the character position just found and the actual search term
  4. When you've found all the search terms, sort the list ascending by character position
  5. Now the search results will be in the order in which they appear in the search text
  6. Loop through the results list and use the specified word padding to get words on each side of each search term, while keeping a separate per-term count of how many times the term has been output so that no term exceeds the limit

Pseudocode, or my best attempt at it:

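function string GetSearchExcerpt(searchText, searchTerms, wordPadding = 0, searchLimit = 3)
{
  results = new array()  // list of (charIndex, searchTerm) pairs, so the same term can appear many times
  foreach (searchTerm in searchTerms)
  {
    startIndex = 0  // search each term from the beginning, independently of the other terms
    charIndex = searchText.FindByIndex(searchTerm, startIndex) // finds 1st position of searchTerm starting at startIndex, or -1 if none
    while (charIndex != -1)
    {
      results.Add(charIndex, searchTerm)
      startIndex = charIndex + 1
      charIndex = searchText.FindByIndex(searchTerm, startIndex)
    }
  }
  results = results.SortByCharIndex()  // ascending, i.e. the order in which the terms appear in the text
  searchTermCount = new array()  // how many times each term has been output so far (missing entries count as 0)
  outputText = ""
  foreach ((charIndex, searchTerm) in results)
  {
    searchTermCount[searchTerm]++
    if (searchTermCount[searchTerm] <= searchLimit)
    {
      // WordPadding is a simple function that moves left (negative count) or right (positive count) a given number of words starting at a specified character index and returns those words
      termEnd = charIndex + length(searchTerm)
      outputText += "..." + WordPadding(-wordPadding, charIndex) + "<strong>" + searchTerm + "</strong>" + WordPadding(wordPadding, termEnd)
    }
  }

  return outputText
}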
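Here is a runnable Python version of GetSearchExcerpt, offered as a sketch rather than a drop-in implementation. It assumes three things. Matching is case-insensitive via lower(), which keeps positions aligned for ASCII text. Every occurrence of each term is collected. The WordPadding helper is replaced by splitting on whitespace. In PHP, the search step would correspond to stripos with an offset.

```python
def get_search_excerpt(search_text: str, search_terms: list[str],
                       word_padding: int = 0, search_limit: int = 3) -> str:
    """Sketch of GetSearchExcerpt: ordered, per-term-limited, word-padded excerpt."""
    lower = search_text.lower()  # lower() keeps positions aligned for ASCII text
    # Steps 1-3: collect a (position, term) pair for every occurrence of every term.
    results = []
    for term in search_terms:
        needle = term.lower()
        i = lower.find(needle)
        while i != -1:
            results.append((i, term))
            i = lower.find(needle, i + 1)
    # Steps 4-5: sort by character position so the output follows the text.
    results.sort()
    counts = {}
    output = ""
    # Step 6: pad each match with whole words, skipping terms over the limit.
    for pos, term in results:
        counts[term] = counts.get(term, 0) + 1
        if counts[term] > search_limit:
            continue
        end = pos + len(term)
        before = search_text[:pos].split()[-word_padding:] if word_padding > 0 else []
        after = search_text[end:].split()[:word_padding] if word_padding > 0 else []
        match = "<strong>" + search_text[pos:end] + "</strong>"
        output += "..." + " ".join(before + [match] + after)
    return output + "..." if output else ""


print(get_search_excerpt("the cat sat on the mat with a cat", ["cat", "mat"], 1, 1))
# prints ...the <strong>cat</strong> sat...the <strong>mat</strong> with...
```

Wrapping the matched slice of search_text, rather than the query term, keeps the article's original capitalization inside the <strong> tags.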
但可醉心 2024-07-21 04:52:20

Personally, I would convert the search terms into regular expressions and then use a regex find-and-replace to wrap the matches in <strong> tags for the formatting.

The regex route is most likely your best bet. In your example, you would end up with three separate regular expressions, one each for "Scripting Language", "Web Pages" and "Applications".

Since you want a language-independent solution, I will not put the actual expressions here, as the exact syntax varies by language.
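For reference, the regex route could look like this in Python. This is a sketch with a few assumptions. `+` joins the words of one phrase. Terms consist of word characters, so `\b` word boundaries apply. The three per-term patterns are joined with `|` into one pattern, so a single pass does the wrapping and no pattern can match inside tags inserted by another. `re.escape` keeps user input from being treated as regex syntax; PHP's preg_quote and preg_replace_callback play the same roles. The names `term_patterns` and `highlight` are hypothetical.

```python
import re


def term_patterns(query: str) -> list[str]:
    """Build one regex per search term; '+' is assumed to join the words of a phrase."""
    patterns = []
    for term in query.split():
        words = [re.escape(w) for w in term.split("+") if w]
        if words:
            patterns.append(r"\b" + r"\s+".join(words) + r"\b")
    return patterns


def highlight(text: str, query: str) -> str:
    """Wrap every case-insensitive match of any term in <strong> tags, in one pass."""
    patterns = term_patterns(query)
    if not patterns:
        return text  # an empty alternation would match everywhere
    combined = re.compile("|".join(patterns), re.IGNORECASE)
    return combined.sub(lambda m: "<strong>" + m.group(0) + "</strong>", text)


print(highlight("It builds dynamic web pages and graphical applications.",
                "Scripting+Language Web+Pages Applications"))
# prints It builds dynamic <strong>web pages</strong> and graphical <strong>applications</strong>.
```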
