Lucene.Net: How can I add a date filter to my search results?
I've got my searcher working really well; however, it tends to return obsolete results. My site is much like NerdDinner, in that events in the past become irrelevant.
I'm currently indexing like this:
Note: my example is in VB.NET, but I don't mind if examples are given in C#.
Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex
Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)
Dim doc As Document = New Document
doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing,
"User" & searchableEvent.User.ID,
searchableEvent.User.UserName),
Field.Store.YES,
Field.Index.TOKENIZED))
doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))
writer.AddDocument(doc)
writer.Optimize()
writer.Close()
Return True
End Function
Notice that I have a "date" field that stores the event date.
My search then looks like this:
''# code omitted
Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
Dim searcher As IndexSearcher = New IndexSearcher(reader)
Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
Dim query As Query = parser.Parse(q.ToLower)
''# We're using 10,000 as the maximum number of results to return
''# because I have a feeling that we'll never reach that full amount
''# anyways. And if we do, who in their right mind is going to page
''# through all of the results?
Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
Dim doc As Document = Nothing
''# loop through the topDocs and grab the appropriate 10 results based
''# on the submitted page number
While i <= last AndAlso i < topDocs.totalHits
doc = searcher.Doc(topDocs.scoreDocs(i).doc)
IDList.Add(doc.[Get]("id"))
i += 1
End While
''# code omitted
I did try the following, but to no avail (it threw a NullReferenceException):
While i <= last AndAlso i < topDocs.totalHits
If Date.Parse(doc.[Get]("date")) >= Date.Today Then
doc = searcher.Doc(topDocs.scoreDocs(i).doc)
IDList.Add(doc.[Get]("id"))
i += 1
End If
End While
I also found the following documentation, but I can't make heads or tails of it:
http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html
You're linking to the API documentation of Lucene 1.4.3. Lucene.Net is currently at 2.9.2. I think an upgrade is due.
First, you're using Field.Store.YES a lot. Stored fields will make your index larger, which may be a performance issue. Your date problem can easily be solved by storing dates as strings in the format "yyyyMMddHHmmssfff" (that's really high resolution, down to milliseconds). You may want to reduce the resolution to create fewer tokens and so reduce your index size.
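As a minimal sketch of why such a format works, the snippet below formats dates at day resolution and shows that plain string ordering then matches chronological ordering. The helper name ToIndexDate and the choice of "yyyyMMdd" are assumptions for illustration, not part of the question's code.

```vbnet
Imports System
Imports System.Globalization

Module SortableDateDemo
    ' Hypothetical helper: formats a date at day resolution so that
    ' ordinal string order matches chronological order.
    Function ToIndexDate(ByVal value As DateTime) As String
        Return value.ToString("yyyyMMdd", CultureInfo.InvariantCulture)
    End Function

    Sub Main()
        Dim past As String = ToIndexDate(New DateTime(2010, 9, 30))
        Dim future As String = ToIndexDate(New DateTime(2010, 11, 2))
        ' Ordinal comparison: "20100930" sorts before "20101102".
        Console.WriteLine(String.CompareOrdinal(past, future) < 0) ' prints True
    End Sub
End Module
```

With a helper like this, the "date" field would be indexed as ToIndexDate(searchableEvent.EventDate) rather than the raw date value, so that range comparisons on the field behave predictably.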
Then you apply a filter to your search (the second parameter, where you currently pass in Nothing/null).
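A rough sketch of such a filter, assuming Lucene.Net 2.9.x and a "date" field indexed as an untokenized "yyyyMMdd" string. The function name SearchUpcoming is hypothetical; TermRangeFilter is the 2.9 replacement for the deprecated RangeFilter.

```vbnet
Imports System
Imports System.Globalization
Imports Lucene.Net.Search

Module DateFilterSketch
    ' Assumption: "date" was indexed untokenized in "yyyyMMdd" format.
    Function SearchUpcoming(ByVal searcher As IndexSearcher, ByVal query As Query) As TopDocs
        Dim today As String = DateTime.Today.ToString("yyyyMMdd", CultureInfo.InvariantCulture)
        ' Lower bound is today (inclusive); Nothing means no upper bound.
        Dim upcoming As Filter = New TermRangeFilter("date", today, Nothing, True, False)
        ' The filter goes where Nothing was passed before; the query is unchanged.
        Return searcher.Search(query, upcoming, 10000)
    End Function
End Module
```

Because the restriction is applied as a filter, past events are dropped from the hits without changing how the remaining documents are scored.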
You could instead combine your normal query with a RangeQuery in a BooleanQuery, but that would also affect scoring (which is calculated on the query, not the filter). Leaving the query unmodified is also simpler, since you always know exactly which query is executed.
You can combine multiple queries with a BooleanQuery. Since Lucene only searches text, note that the date field in your index must be ordered from the most significant to the least significant part of the date, i.e. in ISO 8601 format ("2010-11-02T20:49:16.000000+00:00"). Example:
Alternatively, if a wildcard is not precise enough, you can add a RangeQuery instead:
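A minimal sketch of the range variant, assuming Lucene.Net 2.9.x, where TermRangeQuery is the non-deprecated successor of RangeQuery. The helper name WithDateRange and its parameters are illustrative and not taken from the original answer; the lower bound must use the same string format as the indexed "date" field.

```vbnet
Imports Lucene.Net.Search

Module DateRangeQuerySketch
    ' Hypothetical helper: matches only documents that satisfy the user's
    ' query AND have a "date" term at or after fromDate.
    Function WithDateRange(ByVal userQuery As Query, ByVal fromDate As String) As Query
        ' Inclusive lower bound, open upper bound (Nothing).
        Dim dateRange As New TermRangeQuery("date", fromDate, Nothing, True, False)
        Dim combined As New BooleanQuery()
        combined.Add(userQuery, BooleanClause.Occur.MUST)
        combined.Add(dateRange, BooleanClause.Occur.MUST)
        Return combined
    End Function
End Module
```

The combined query can be passed to searcher.Search in place of the parsed query. Unlike a filter, the range clause is part of the query itself, so it can influence scoring.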