Multi-word auto-suggest with Lucene.Net

Posted on 2024-09-04 15:04:01

I am currently working on a search application that uses Lucene.Net to index data from a database into an index file. I have a product catalog with Name, short and long description, SKU, and other fields. The data is indexed using StandardAnalyzer. I am trying to add auto-suggestion for a text field, using TermEnum to get all the keyword terms and their scores from the index. However, every term returned is a single word. For example, if I type co, the suggestions returned are costume, count, collection, cowboy, combination, etc. I want the suggestions to be phrases instead. For example, if I search for co, the suggestions should be cowboy costume, costume for adults, combination locks, etc.

The following is the code used to get the suggestions:

// Requires: using System; using System.Linq; using System.Collections.Generic; using Lucene.Net.Index;
public string[] GetKeywords(string strSearchExp)
{
    // StandardAnalyzer lowercases terms at index time, so lowercase the prefix too.
    string prefix = strSearchExp.ToLowerInvariant();
    Dictionary<string, int> keywordList = new Dictionary<string, int>();

    IndexReader rd = IndexReader.Open(mIndexLoc);
    // Terms() positions the enumerator at the first term >= ("Name", prefix).
    TermEnum tenum = rd.Terms(new Term("Name", prefix));
    try
    {
        do
        {
            Term term = tenum.Term();
            // Stop at the end of the enumeration, when the enumerator leaves the
            // "Name" field, or when terms no longer start with the prefix.
            if (term == null || term.Field() != "Name"
                || !term.Text().StartsWith(prefix, StringComparison.Ordinal))
                break;
            if (term.Text().Length > 1)
                keywordList.Add(term.Text(), tenum.DocFreq());
        } while (tenum.Next());
    }
    finally
    {
        tenum.Close();
        rd.Close();
    }

    // Return at most 10 terms, highest document frequency first.
    return (from entry in keywordList
            orderby entry.Value descending
            select entry.Key).Take(10).ToArray();
}

Can anyone please give me directions to achieve this? Thanks for looking into this.

Comments (2)

岁月苍老的讽刺 2024-09-11 15:04:01

You could simply index your product name in a different field using the Field.Index.NOT_ANALYZED parameter or the KeywordAnalyzer, and then run either a wildcard query or a prefix query on it.
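
To make the effect of this suggestion concrete, here is a minimal sketch in plain C#, deliberately not using Lucene.Net itself. It assumes each whole product name is treated as one untokenized value, which is roughly what a NOT_ANALYZED field stores, and that a prefix query amounts to a starts-with test on that value. The class name WholeNameSuggester and its Suggest method are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WholeNameSuggester
{
    // Hypothetical helper: returns up to `max` whole product names that start
    // with `prefix`, mimicking a prefix match on an untokenized field.
    public static List<string> Suggest(IEnumerable<string> productNames, string prefix, int max)
    {
        // An untokenized value is not lowercased by an analyzer, so normalize
        // both sides explicitly to keep the match case-insensitive.
        string normalizedPrefix = prefix.Trim().ToLowerInvariant();

        return productNames
            .Select(name => name.Trim().ToLowerInvariant())
            .Where(name => name.StartsWith(normalizedPrefix, StringComparison.Ordinal))
            .Distinct()
            .OrderBy(name => name, StringComparer.Ordinal)
            .Take(max)
            .ToList();
    }
}
```

With the names "Cowboy Costume", "Costume for Adults", "Combination Locks" and "Adult Cowboy Costume", the prefix co yields the first three but not "Adult Cowboy Costume", because the match is anchored at the start of the whole name. That is the main trade-off of this approach: suggestions are always complete product names, never phrases from the middle of a name.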

魄砕の薆 2024-09-11 15:04:01

As you said, "the terms returned are of single term". So you need to create terms that consist of phrases.

You can use the built-in ShingleFilter token filter to create your phrase terms:

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/shingle/ShingleFilter.html

You may want to use a separate field for this, as I'm not sure whether ShingleFilter also produces the single-word terms; you'll probably want to experiment with this.
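
As an illustration of what shingles are, the following plain C# sketch builds word n-grams from product names and ranks them by how many names contain them. It does not call ShingleFilter or any Lucene.Net API; the class ShingleSketch, its methods, the maximum shingle size of 3 and the choice to skip single words are all assumptions made for the example. In the Java ShingleFilter linked above, single-word tokens are emitted alongside the shingles by default and can be switched off with setOutputUnigrams; whether the Lucene.Net port behaves identically is worth checking against your version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShingleSketch
{
    // Hypothetical helper: emits every run of 1..maxSize consecutive words
    // (2..maxSize when includeUnigrams is false), joined by single spaces.
    public static IEnumerable<string> Shingles(string text, int maxSize, bool includeUnigrams)
    {
        string[] words = text.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int minSize = includeUnigrams ? 1 : 2;

        for (int start = 0; start < words.Length; start++)
        {
            for (int size = minSize; size <= maxSize && start + size <= words.Length; size++)
            {
                yield return string.Join(" ", words, start, size);
            }
        }
    }

    // Hypothetical helper: counts how many names contain each shingle (a
    // stand-in for document frequency) and returns the most frequent
    // shingles that start with the prefix.
    public static List<string> Suggest(IEnumerable<string> productNames, string prefix, int max)
    {
        string normalizedPrefix = prefix.ToLowerInvariant();
        var docFreq = new Dictionary<string, int>();

        foreach (string name in productNames)
        {
            // Distinct() so a shingle repeated within one name counts once.
            foreach (string shingle in Shingles(name, 3, false).Distinct())
            {
                int count;
                docFreq.TryGetValue(shingle, out count);
                docFreq[shingle] = count + 1;
            }
        }

        return docFreq
            .Where(pair => pair.Key.StartsWith(normalizedPrefix, StringComparison.Ordinal))
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key, StringComparer.Ordinal)
            .Select(pair => pair.Key)
            .Take(max)
            .ToList();
    }
}
```

For the names "Cowboy Costume", "Costume for Adults", "Combination Locks" and "Adult Cowboy Costume", the prefix co gives cowboy costume first (it occurs in two names), followed by combination locks, costume for and costume for adults. Unlike a prefix match on whole names, this picks up "cowboy costume" from the middle of "Adult Cowboy Costume", but it also surfaces fragments such as "costume for", so some filtering of the shingles is likely needed in practice.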
