API for retrieving similar words from a keyword?
I'm writing a search engine in C#, retrieving rows from a SQL database. I'd like the search to also include similar words - for example, if a user searches for "investing", the search will also return matches for "investment", or if the user searches for "financial", the search will also return matches for "finance".
How can I retrieve similar words such as these from a search keyword?
Comments (4)
What you're trying to accomplish is known as "Stemming". Read the Wikipedia article for more info:
http://en.wikipedia.org/wiki/Stemming
What you're looking for is stemming. You may want to look at what's available in Lucene.net, although SQL Server may also support this natively through full-text indexing. Indeed, it appears to, judging by this article.
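One way to check what the SQL Server full-text stemmer does with a word is to ask the engine to expand it. This is only a sketch: it assumes English (LCID 1033), the system stoplist, and a login with the elevated permissions that `sys.dm_fts_parser` requires.

```sql
-- List the inflectional forms the full-text engine generates for "invest".
-- Arguments: query string, LCID (1033 = English),
-- stoplist id (0 = system stoplist), accent sensitivity (0 = insensitive).
SELECT display_term, source_term, expansion_type
FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, "invest")', 1033, 0, 0);
```

Each returned row is one term the engine would match for the source word, which makes it easy to see whether the built-in stemming covers the variants you care about.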
If you're using SQL Server you can take advantage of the FREETEXT search, which supports stemming:
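The query that belonged here did not survive on this page. A minimal sketch of a FREETEXT query of this kind, assuming a hypothetical table named `Documents` that already has a full-text index, might look like this:

```sql
-- Documents is a hypothetical table; FREETEXT requires a full-text index on it.
SELECT *
FROM Documents
WHERE FREETEXT(*, 'invest');  -- * = every full-text indexed column
```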
The above searches all columns for all forms of the word invest. It's equivalent to:
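The second query is also missing from this page. One form that fits the description, again assuming the hypothetical `Documents` table, uses CONTAINS with FORMSOF. Note that FREETEXT additionally applies thesaurus expansion, so the two are close rather than strictly identical:

```sql
-- Match any inflectional form of "invest" (invests, invested, investing, ...)
-- in every full-text indexed column of the hypothetical Documents table.
SELECT *
FROM Documents
WHERE CONTAINS(*, 'FORMSOF(INFLECTIONAL, invest)');
```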
Here's an MSDN article with more examples and documentation.
Additionally, Soundex searching can help find matches that sound similar. SQL Server supports this through its SOUNDEX() function. .NET doesn't appear to have it built in, but CodeProject has several implementations.
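As a rough illustration of phonetic matching in SQL Server, the sketch below compares two spellings and then filters a table by sound. The `Customers` table and `LastName` column are hypothetical names used only for this example.

```sql
-- Both spellings map to the same Soundex code (S530).
-- DIFFERENCE scores similarity from 0 (weak) to 4 (strong).
SELECT SOUNDEX('Smith')              AS smith_code,
       SOUNDEX('Smythe')             AS smythe_code,
       DIFFERENCE('Smith', 'Smythe') AS similarity;

-- Find rows whose last name sounds like the search term.
-- Customers and LastName are hypothetical.
SELECT *
FROM Customers
WHERE SOUNDEX(LastName) = SOUNDEX('Smythe');
```

Soundex works on how a word sounds rather than on its grammatical root, so it complements stemming instead of replacing it.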