Parsing and formatting search results
Search:
Scripting+Language Web+Pages Applications
Results:
...scripting language originally...producing dynamic web pages. It has...graphical applications....purpose scripting language that is...d creating web pages as output...
Suppose I want one value that represents the number of characters to allow as padding on either side of each matched term, and another value that represents how many matches will be shown in the result (i.e., I want to see only the first 5 matches, nothing more).
How exactly would you go about doing this?
This is pretty language-agnostic, but I will be implementing the solution in a PHP environment, so please restrict answers to options that do not require a specific language or framework.
Here's my thought process:
1. Create an array from the search words.
2. Determine which search word has the lowest index in the article body.
3. Gather that portion of the body into another variable, and then remove that section from the article body.
4. Return to step 1.
You might even add a counter to each word, skipping it when the counter reaches 3 or so.
Important:
The solution must match all search terms in a non-linear fashion, that is, in the order they occur in the text rather than the order they were typed. Term 1 should be found after term 2 if it occurs after term 2. Likewise, it should be found after term 3 if it occurs after term 3. Term 3 should be found before terms 1 and 2 if it happens to occur before them.
The solution should allow me to declare "Only allow up to three matches for each term, then terminate the summary."
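One way to read the requirements above is: collect every occurrence of every term, sort the occurrences by position in the body, cap them per term and overall, then cut a padded window around each survivor. The following is only a minimal sketch of that reading, written in Python for illustration (the same steps carry over to PHP). The function names `parse_query`, `find_matches`, and `build_summary`, the parameter names, and the assumption that "+" joins the words of one phrase in the query are all hypothetical, not taken from the question.

```python
import re

def parse_query(query):
    # Assumption: spaces separate terms, "+" joins the words of one phrase.
    return [part.replace("+", " ") for part in query.split()]

def find_matches(body, terms, max_per_term=3, max_total=5):
    """Return (start, end) spans in document order, capped per term and overall."""
    found = []
    for index, term in enumerate(terms):
        for m in re.finditer(re.escape(term), body, re.IGNORECASE):
            found.append((m.start(), m.end(), index))
    found.sort()  # document order, regardless of the order the terms were typed

    counts = [0] * len(terms)
    spans = []
    for start, end, index in found:
        if len(spans) >= max_total:
            break  # enough matches for the summary
        if counts[index] >= max_per_term:
            continue  # this term has used up its quota
        counts[index] += 1
        spans.append((start, end))
    return spans

def build_summary(body, terms, padding=20, max_per_term=3, max_total=5):
    """Join character-padded windows around each match with '...'."""
    windows = []
    for start, end in find_matches(body, terms, max_per_term, max_total):
        lo = max(0, start - padding)
        hi = min(len(body), end + padding)
        if windows and lo <= windows[-1][1]:
            # Overlaps the previous window, so extend it instead of repeating text.
            windows[-1][1] = max(windows[-1][1], hi)
        else:
            windows.append([lo, hi])
    if not windows:
        return ""
    return "..." + "...".join(body[lo:hi] for lo, hi in windows) + "..."
```

Sorting all occurrences up front replaces the "find the lowest index, cut it out, start over" loop from the thought process, and it leaves the article body unmodified. Here the per-term cap skips further occurrences of that term rather than ending the summary; if "terminate" is meant literally, the `continue` would become a `break`.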
Extra Credit:
Get the padding-variable to optionally pad words, rather than chars.
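For the word-based padding, one possible approach is to widen each match to the nearest whole-word boundaries instead of counting characters. The sketch below is an assumption about how that could work, not something from the question: `word_window` is a hypothetical helper, and a "word" is taken to be any run of non-whitespace characters.

```python
import re

def word_window(body, start, end, words=3):
    """Expand the span (start, end) by up to `words` whole words on each side."""
    if words <= 0:
        return start, end
    # Runs of non-whitespace before and after the match count as words.
    before = list(re.finditer(r"\S+", body[:start]))
    after = list(re.finditer(r"\S+", body[end:]))
    # Fall back to the edge of the body when fewer words are available.
    lo = before[-words].start() if len(before) >= words else 0
    hi = end + after[words - 1].end() if len(after) >= words else len(body)
    return lo, hi
```

If a match starts or ends in the middle of a word, the partial word next to it is counted as one of the padding words, which keeps the snippet from cutting words in half.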
Comments (2)
My thought process:
array
object) Pseudocode, or my best attempt at it:
Personally, I would convert the search terms into regular expressions and then use a regex find-and-replace to wrap the matches in strong tags for the formatting.
Most likely the regex route would be your best bet. So in your example, you would end up with three separate regex values.
Since you want a language-independent solution, I will not put the actual expressions here, as the exact syntax varies by language.
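To make the regex idea concrete, here is a minimal sketch in Python. It is an illustration, not the answerer's own code. It combines the terms into a single alternation rather than keeping three separate expressions, and the function name `highlight` is hypothetical. It also assumes the text has already been HTML-escaped.

```python
import re

def highlight(text, terms):
    """Wrap every case-insensitive occurrence of any term in <strong> tags."""
    # Longest terms first, so "web pages" wins over "web" at the same position.
    ordered = sorted((t for t in terms if t), key=len, reverse=True)
    if not ordered:
        return text  # an empty pattern would match between every character
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    # Reuse the matched text so the original capitalization is preserved.
    return pattern.sub(lambda m: "<strong>" + m.group(0) + "</strong>", text)
```

Escaping each term with `re.escape` keeps characters such as "+" or "." in a user's query from being interpreted as regex syntax.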