Lucene.net - How do I search alternate index keys or alternate word combinations?

Posted on 2024-10-06 10:01:30

In my Lucene index I have the following keys:

id
fullText
user
date

I've got the fullText search working pretty well using the following method.

Public Function ReadIndex(ByVal q As String, ByVal page As Integer?) As Domain.Pocos.LuceneResults Implements ILuceneService.ReadIndex
    ''# A timer variable to determine how long the method executes for
    Dim tStart As DateTime = DateTime.Now

    ''# Creates a container that we use to store all of the result ID's
    Dim IDList As List(Of Integer) = New List(Of Integer)

    ''# First we set the initial page number. 
    ''# If it's null, it means it's zero
    If page Is Nothing Then page = 0

    ''# [i] is the variable we use to extract the appropriate (needed)
    ''# documents from the results. Its initial value is the page number
    ''# multiplied by the number of results we want to return (in our
    ''# case 10). The [last] variable is used to stop the while loop at
    ''# the 10th record by simply adding 9 to the [i] variable.
    Dim i As Integer = page.Value * 10
    Dim last As Integer = i + 9

    ''# Variables used by Lucene
    Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
    Dim searcher As IndexSearcher = New IndexSearcher(reader)
    Dim query As Query = New TermQuery(New Term("fullText", q.ToLower))

    ''# We're using 10,000 as the maximum number of results to return
    ''# because I have a feeling that we'll never reach that full amount
    ''# anyways.  And if we do, who in their right mind is going to page
    ''# through all of the results?
    Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
    Dim doc As Document = Nothing

    ''# loop through the topDocs and grab the appropriate 10 results based
    ''# on the submitted page number
    While i <= last AndAlso i < topDocs.totalHits
        doc = searcher.Doc(topDocs.scoreDocs(i).doc)
        IDList.Add(CInt(doc.[Get]("id")))
        i += 1
    End While

    ''# Release the searcher, and the reader it wraps, once the IDs are collected
    searcher.Close()
    reader.Close()
    Dim EventList As List(Of Domain.Event) = EventService.QueryEvents().Where(Function(e) (IDList.Contains(e.ID))).ToList()

    Dim tStop As DateTime = DateTime.Now
    Dim LucienResults As New Domain.Pocos.LuceneResults With {.EventList = EventList,
                                                              .ExecuteTime = (tStop - tStart),
                                                              .TotalResults = topDocs.totalHits}

    Return LucienResults
End Function

Now the problem I'm having is figuring out how to add user and date search to the method.

Basically, if I search for "some event", the results display perfectly. However, if I search for user:joe or date:12/07/2100, I don't get any results.

Also, if the indexed phrase is "the quick brown fox jumped over the lazy dogs" and I search for brown fox, I get the result, but if I search for quick fox, I don't. Basically, I'd like to split the string on whitespace and search for each word individually.

What do I need to add to this method to enable searching on specific keys and alternate word combinations?

Comments (1)

又怨 2024-10-13 10:01:30

You're basically searching for "brown fox" and "quick fox" as one single token. You probably want to either split on whitespace and build a BooleanQuery containing several TermQuery clauses, or just throw your string at the QueryParser.
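The split-and-combine option could look roughly like the sketch below. It assumes the Lucene.Net 2.9-style API that the question's code appears to use (in 3.0.x the enum is `Occur.SHOULD` rather than `BooleanClause.Occur.SHOULD`), and that terms were lowercased at index time, as the question's `q.ToLower` suggests. The module and function names are hypothetical.

```vbnet
Imports System
Imports Lucene.Net.Index
Imports Lucene.Net.Search

Module QueryBuilding
    ' Builds one optional (SHOULD) clause per whitespace-separated word,
    ' so a document matches when the field contains any of the words.
    ' Hypothetical helper, not part of Lucene.Net.
    Public Function BuildAnyWordQuery(ByVal field As String, ByVal q As String) As Query
        Dim result As New BooleanQuery()
        Dim words As String() = q.ToLower().Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)

        For Each word As String In words
            result.Add(New TermQuery(New Term(field, word)), BooleanClause.Occur.SHOULD)
        Next

        Return result
    End Function
End Module
```

With this, `BuildAnyWordQuery("fullText", "quick fox")` would replace the single `TermQuery` in `ReadIndex`. Switching `SHOULD` to `MUST` would require every word to be present instead of any word.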

The "user:joe" syntax you describe is what the default QueryParser parses into a new TermQuery(new Term("user", "joe")), which is what you want. Your current solution searches the fullText field for a single "user:joe" token, which most analyzers will have split into two tokens at index time, so you will never get a match with those analyzers.
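A minimal sketch of handing the raw string to the QueryParser, again assuming a Lucene.Net 2.9-style API. The choice of `StandardAnalyzer` is an assumption: the parser should be given whichever analyzer was used to build the index. The module and function names are hypothetical.

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.QueryParsers
Imports Lucene.Net.Search

Module QueryParsing
    ' Parses user input such as "quick fox" or "user:joe".
    ' Words without a field prefix are searched in "fullText".
    Public Function ParseUserQuery(ByVal q As String) As Query
        ' Assumption: the index was built with StandardAnalyzer.
        Dim analyzer As New StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29)
        Dim parser As New QueryParser(Lucene.Net.Util.Version.LUCENE_29, "fullText", analyzer)

        ' Throws ParseException when the input is not valid query syntax.
        Return parser.Parse(q)
    End Function
End Module
```

Whether a query like date:12/07/2100 then matches still depends on how the date field was written to the index, which the question does not show.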

Also, can't you tell your IndexSearcher.Search to stop at the last index you'll be reading, instead of 10000?
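One way to do that, sketched under the same Lucene.Net 2.9-style assumptions, is to request only as many hits as the current page needs. `PageSize`, `SearchPage` and the module name are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports Lucene.Net.Search

Module Paging
    Public Const PageSize As Integer = 10

    ' Returns the hits for one zero-based page, asking Lucene for no more
    ' results than are needed to reach the end of that page.
    Public Function SearchPage(ByVal searcher As IndexSearcher, ByVal query As Query, ByVal page As Integer) As List(Of ScoreDoc)
        Dim first As Integer = page * PageSize
        Dim needed As Integer = first + PageSize
        Dim top As TopDocs = searcher.Search(query, Nothing, needed)

        Dim hits As New List(Of ScoreDoc)
        Dim i As Integer = first
        ' scoreDocs holds at most [needed] entries, so its length bounds the loop.
        While i < top.scoreDocs.Length
            hits.Add(top.scoreDocs(i))
            i += 1
        End While

        Return hits
    End Function
End Module
```

`totalHits` on the returned `TopDocs` still reports the full number of matches, so a total result count can be shown without retrieving 10,000 entries.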

And while you're at it, don't read Document instances using IndexSearcher.Doc if you're only interested in one field. Use the FieldCache, which keeps an in-memory cache (keyed by weakly referenced index segment readers) and allows quick lookups of single-term fields.
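A rough sketch of the FieldCache lookup follows. It assumes the Lucene.Net 2.9/3.0 naming, where the cache is reached through `FieldCache_Fields.DEFAULT`, and that "id" was indexed as a single un-analyzed term per document. The module and function names are hypothetical.

```vbnet
Imports System.Collections.Generic
Imports Lucene.Net.Index
Imports Lucene.Net.Search

Module IdLookup
    ' Resolves hit document numbers to "id" values without loading
    ' whole Document instances.
    Public Function ReadIds(ByVal reader As IndexReader, ByVal hits As IEnumerable(Of ScoreDoc)) As List(Of Integer)
        ' One array slot per document number; loaded on first use for
        ' this reader and served from the cache afterwards.
        Dim idsByDoc As String() = FieldCache_Fields.[DEFAULT].GetStrings(reader, "id")

        Dim ids As New List(Of Integer)
        For Each hit As ScoreDoc In hits
            ids.Add(Integer.Parse(idsByDoc(hit.doc)))
        Next

        Return ids
    End Function
End Module
```

The first call pays the cost of loading every id for the reader, so this tends to help when the same reader is reused across searches rather than reopened on each request as in the question's method.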

And finally, look into which analyzer you're using. Some are specific to other languages, some have synonym or stemming support, etc.: things that [usually] make a search easier to work with.
