Searching a url field in a Lucene.Net index

Posted 2025-01-04 16:05:46

I want to search a Lucene.net index for a stored url field. My code is given below:

// Store the lower-cased URL and index it as a tokenized (analyzed) field.
Field urlField = new Field("Url", url.ToLower(), Field.Store.YES, Field.Index.TOKENIZED);
document.Add(urlField);
indexWriter.AddDocument(document);

I use the code above to write to the index.

The code below searches the index for the Url:

Lucene.Net.Store.Directory _directory = FSDirectory.GetDirectory(Host, false);
IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + downloadDoc.Uri.ToString() + "\"");
TopDocs hits = indexSearcher.Search(query, null, 10);
if (hits.totalHits > 0)
{
    //statements....
}

But whenever I search for a URL, for example http://www.xyz.com/, I get no hits.

I did find an alternative, but it only works when the index contains a single document. With more documents, the code below does not return the correct result. Any ideas? Please help.

When writing the index, use KeywordAnalyzer():

KeywordAnalyzer _analyzer = new KeywordAnalyzer();    
indexWriter = new IndexWriter(_directory, _analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);

Then use KeywordAnalyzer() when searching as well:

IndexReader reader = IndexReader.Open(_directory);
KeywordAnalyzer _analyzer = new KeywordAnalyzer();
IndexSearcher indexSearcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("Url", _analyzer);
Query query = parser.Parse("\"" + url.ToString() + "\"");                    
TopDocs hits = indexSearcher.Search(query, null, 1);

This is because the KeywordAnalyzer "tokenizes" the entire stream as a single token.

Please help. It's urgent.

Cheers
Sunil...

Comments (3)

将军与妓 2025-01-11 16:05:46

This worked for me:

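IndexReader reader = IndexReader.Open(_directory);
IndexSearcher indexSearcher = new IndexSearcher(reader);
// A TermQuery is not analyzed: it matches the lower-cased URL as one exact term.
TermQuery tq = new TermQuery(new Term("Url", downloadDoc.Uri.ToString().ToLower()));
BooleanQuery bq = new BooleanQuery();
bq.Add(tq, BooleanClause.Occur.SHOULD);
TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
// Run the query and read the collected hits.
indexSearcher.Search(bq, collector);
ScoreDoc[] hits = collector.TopDocs().scoreDocs;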

Use StandardAnalyzer when writing to the index.

This answer helped me: Lucene search by URL
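
For readers who want to try the exact-match approach end to end, here is a minimal sketch. It assumes a Lucene.Net 2.9.x-style API (inferred from calls in this thread such as hits.totalHits, not stated by anyone here), and it assumes the Url field is indexed with Field.Index.NOT_ANALYZED so that the whole lower-cased URL becomes a single term. The RAMDirectory, the class name and the sample URLs are illustrative only.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical sketch, assuming Lucene.Net 2.9.x.
class UrlExactMatchSketch
{
    static void Main()
    {
        // In-memory index, so the sketch leaves nothing on disk.
        RAMDirectory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);

        // Sample data (assumption): more than one document in the index.
        string[] urls = { "http://www.xyz.com/", "http://www.abc.com/page" };
        foreach (string url in urls)
        {
            Document doc = new Document();
            // NOT_ANALYZED: the lower-cased URL is indexed as one term, untouched by any analyzer.
            doc.Add(new Field("Url", url.ToLower(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Close();

        // Read-only searcher over the same directory.
        IndexSearcher searcher = new IndexSearcher(directory, true);

        // The term must equal the indexed term exactly, so lower-case it the same way.
        Query query = new TermQuery(new Term("Url", "http://www.xyz.com/".ToLower()));
        TopDocs hits = searcher.Search(query, null, 10);

        Console.WriteLine("Total hits: " + hits.totalHits);
        foreach (ScoreDoc scoreDoc in hits.scoreDocs)
        {
            // Print the stored URL of each matching document.
            Console.WriteLine(searcher.Doc(scoreDoc.doc).Get("Url"));
        }
        searcher.Close();
    }
}
```

The design point is that the indexed term and the query term have to be produced the same way. If the field is run through a tokenizing analyzer at index time, the full URL may not exist as a single term, and an exact TermQuery on it would then find nothing.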

与风相奔跑 2025-01-11 16:05:46

Try putting quotes around the query, e.g. like this:

"http://www.google.com/"

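A rough sketch of why quoting matters when the URL goes through QueryParser: in the query syntax an unquoted ':' is normally read as a field separator, so the URL has to be either wrapped in quotes or escaped. The sketch assumes a Lucene.Net 2.9.x-style API and a KeywordAnalyzer at both index and query time; the class name and sample URL are illustrative. Note also that KeywordAnalyzer does not lower-case, so the casing of the query string has to match what was indexed.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical sketch, assuming Lucene.Net 2.9.x.
class QuotedUrlQuerySketch
{
    static void Main()
    {
        RAMDirectory directory = new RAMDirectory();
        KeywordAnalyzer analyzer = new KeywordAnalyzer();

        // Index one URL; KeywordAnalyzer emits the whole value as a single token.
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.Add(new Field("Url", "http://www.google.com/", Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        writer.Close();

        IndexSearcher searcher = new IndexSearcher(directory, true);
        QueryParser parser = new QueryParser("Url", analyzer);
        string url = "http://www.google.com/";

        // Option 1: wrap the URL in double quotes so the parser treats it as one unit.
        Query quoted = parser.Parse("\"" + url + "\"");

        // Option 2: escape the query-syntax characters instead of quoting.
        Query escaped = parser.Parse(QueryParser.Escape(url));

        Console.WriteLine("quoted hits:  " + searcher.Search(quoted, null, 10).totalHits);
        Console.WriteLine("escaped hits: " + searcher.Search(escaped, null, 10).totalHits);
        searcher.Close();
    }
}
```

Both options only help if the index was written with the same analyzer; quoting alone does not fix a mismatch between index-time and query-time tokenization.
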
她比我温柔 2025-01-11 16:05:46

Using the whitespace or keyword analyzer should work.

Would anyone actually search for "http://www.Google.com"? Seems more likely that a user would search for "Google" instead.

You can always return the entire URL if there is a partial match. I think the standard analyzer should be more appropriate for searching and retrieving a URL.
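
A minimal sketch of the partial-match idea, under some assumptions. It targets a Lucene.Net 2.9.x-style API and indexes the URL twice: once untouched for retrieval, once tokenized for searching. It deliberately swaps in SimpleAnalyzer rather than the standard analyzer suggested above, because as far as I know the classic StandardAnalyzer can keep a host name like www.google.com as a single token, in which case a search for "Google" alone would not match. SimpleAnalyzer splits on non-letters and lower-cases. The UrlText field name, the class name and the sample URLs are hypothetical.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical sketch, assuming Lucene.Net 2.9.x.
class PartialUrlMatchSketch
{
    static void Main()
    {
        RAMDirectory directory = new RAMDirectory();
        SimpleAnalyzer analyzer = new SimpleAnalyzer();
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);

        string[] urls = { "http://www.Google.com", "http://www.xyz.com/" };
        foreach (string url in urls)
        {
            Document doc = new Document();
            // Stored, untokenized copy: returned to the caller as the full URL.
            doc.Add(new Field("Url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Tokenized copy (hypothetical field name): split into lower-cased letter runs.
            doc.Add(new Field("UrlText", url, Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
        writer.Close();

        IndexSearcher searcher = new IndexSearcher(directory, true);

        // The same analyzer at query time lower-cases "Google" to match the indexed token.
        QueryParser parser = new QueryParser("UrlText", analyzer);
        Query query = parser.Parse("Google");

        TopDocs hits = searcher.Search(query, null, 10);
        foreach (ScoreDoc scoreDoc in hits.scoreDocs)
        {
            // A partial match on UrlText returns the entire stored URL.
            Console.WriteLine(searcher.Doc(scoreDoc.doc).Get("Url"));
        }
        searcher.Close();
    }
}
```

Keeping both fields lets exact lookups (a TermQuery on Url) and keyword-style searches (a parsed query on UrlText) coexist in one index, at the cost of indexing each URL twice.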
