I am currently working on a search application that uses Lucene.Net to index data from a database into an index file. I have a product catalog with Name, short and long description, SKU, and other fields. The data is indexed using StandardAnalyzer. I am trying to add auto-suggestion for a text field, and I am using TermEnum to get all the keyword terms and their scores from the index. But the terms returned are single words. For example, if I type co, the suggestions returned are costume, count, collection, cowboy, combination, etc. But I want the suggestions to be phrases. For example, if I search for co, the suggestions should be cowboy costume, costume for adults, combination locks, etc.
The following is the code used to get the suggestions:
public string[] GetKeywords(string strSearchExp)
{
    IndexReader rd = IndexReader.Open(mIndexLoc);
    // Position the enumerator at the first term >= strSearchExp in the "Name" field.
    TermEnum tenum = rd.Terms(new Term("Name", strSearchExp));
    string[] strResult = new string[10];
    int i = 0;
    Dictionary<string, double> KeywordList = new Dictionary<string, double>();
    do
    {
        Term term = tenum.Term();
        // The starting term is not guaranteed to match the prefix, so check it too.
        if (term != null && term.text.StartsWith(strSearchExp))
        {
            // Map each matching term to its document frequency.
            KeywordList.Add(term.text.ToString(), tenum.DocFreq());
        }
    } while (tenum.Next() && tenum.Term().text.StartsWith(strSearchExp) && tenum.Term().text.Length > 1);
    // Most frequent terms first.
    var sortedDict = (from entry in KeywordList orderby entry.Value descending select entry);
    foreach (KeyValuePair<string, double> data in sortedDict)
    {
        if (data.Key.Length > 1)
        {
            strResult[i] = data.Key;
            i++;
        }
        if (i >= 10) // Exit the loop once 10 suggestions are collected
            break;
    }
    tenum.Close();
    rd.Close();
    return strResult;
}
Can anyone please give me directions to achieve this? Thanks for looking into this.
Comments (2)
You could simply index your product name in a different field using the Field.Index.NOT_ANALYZED parameter or the KeywordAnalyzer, and then run either a wildcard query or a prefix query on it.
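To illustrate the whole-name idea without depending on a particular Lucene.Net version, here is a minimal library-free sketch. It is not Lucene code: `WholeNameSuggester` and `Suggest` are hypothetical names, and the plain `StartsWith` filter only stands in for what a prefix query over an un-analyzed field would do. It also assumes case-insensitive matching is wanted; an un-analyzed field keeps the original casing, so in Lucene you would probably have to lowercase the value yourself before indexing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WholeNameSuggester
{
    // Returns up to `max` full product names that start with `prefix`.
    // Each name is treated as one un-tokenized term, mirroring what a
    // NOT_ANALYZED field (or KeywordAnalyzer) would keep in the index.
    public static List<string> Suggest(IEnumerable<string> productNames, string prefix, int max)
    {
        return productNames
            .Where(name => name.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .Take(max)
            .ToList();
    }
}
```

With the names "Cowboy costume", "Costume for adults", "Combination locks" and "Adult cowboy hat", calling `Suggest(names, "co", 10)` gives the first three names in alphabetical order. "Adult cowboy hat" is left out because this approach only matches names that begin with the typed text, which may or may not be what you want for suggestions.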
As you said, "the terms returned are of single term". So you need to create terms that consist of phrases.
You can use the built-in ShingleFilter token filter to create your phrase terms:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/shingle/ShingleFilter.html
You may want to use a separate field for this, as I'm not sure whether ShingleFilter actually produces single terms; you'll probably want to experiment with this.
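To make the shingle idea concrete, here is a rough library-free sketch of what "phrase terms" look like and how they could feed the suggestions. It does not use ShingleFilter or any Lucene.Net API: `ShingleSuggester`, `Shingles` and `Suggest` are hypothetical names, splitting on spaces is a simplification of real analysis, and the shingle sizes of 2 to 3 words are an assumption chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShingleSuggester
{
    // Builds word n-grams ("shingles") of minSize..maxSize words from one name.
    public static IEnumerable<string> Shingles(string text, int minSize, int maxSize)
    {
        string[] words = text.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int start = 0; start < words.Length; start++)
        {
            for (int size = minSize; size <= maxSize && start + size <= words.Length; size++)
            {
                yield return string.Join(" ", words, start, size);
            }
        }
    }

    // Counts in how many names each shingle occurs, then returns the most
    // frequent shingles that start with the typed prefix (ties sorted by text).
    public static List<string> Suggest(IEnumerable<string> names, string prefix, int max)
    {
        var docFreq = new Dictionary<string, int>();
        foreach (string name in names)
        {
            foreach (string shingle in Shingles(name, 2, 3).Distinct())
            {
                int count;
                docFreq.TryGetValue(shingle, out count); // count stays 0 if absent
                docFreq[shingle] = count + 1;
            }
        }
        string lowered = prefix.ToLowerInvariant();
        return docFreq
            .Where(e => e.Key.StartsWith(lowered, StringComparison.Ordinal))
            .OrderByDescending(e => e.Value)
            .ThenBy(e => e.Key, StringComparer.Ordinal)
            .Take(max)
            .Select(e => e.Key)
            .ToList();
    }
}
```

For the names "Cowboy costume", "Adult cowboy costume", "Costume for adults" and "Combination locks", `Suggest(names, "co", 10)` returns "cowboy costume" first (it occurs in two names), followed by "combination locks", "costume for" and "costume for adults". Unlike matching on the whole name, a phrase in the middle of a name ("cowboy costume" inside "Adult cowboy costume") can still be suggested.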