Lucene.Net, SQL Server, NHibernate, ASP.NET MVC
I am using these technologies: SQL Server 2005, ASP.NET MVC, and NHibernate/Sharp Architecture. I would like to mine some text, with the final aim of presenting some web-based stats. I have several million keywords and several million documents, and I would like to run queries against these documents, indexed by the keywords. I have played a bit with SQL Server's full-text indexing, but I am not too impressed, so I am wondering whether Lucene.Net might be an alternative.
I have never used Lucene.Net, but I understand that it is a 1:1 port of the Java version. So my first question is: provided that Lucene is the right "technology", is it worth studying the book "Lucene in Action"?
Thanks.
Best wishes,
Christian
Comments (1)
Well,
First: update SQL Server. You are running a version that is two generations out of date; its full-text search is an early implementation with many shortcomings that are known and have since been fixed.
Second: Lucene may really be better suited. SQL Server is primarily a database server; its full-text search does a lot of things, but it also has a lot of limitations.
But bringing in Lucene DOES add a significant complication: distributed transactions and backup handling get a lot more complicated because you now have two systems. SQL 2008 R2 does a much better job here (the full-text index is stored in the database file).
That said, also be careful with performance. You may need a QUITE HIGH-END SERVER if you want to run a lot of queries in parallel (which can happen easily with a web application). This may require multiple database servers running read-only replicas, something SQL Server does a lot more easily than Lucene (as in: out of the box).
I suggest you just get Lucene and play with it ;) Not a lot more needed.
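To get a feel for what "documents indexed by keywords" means before you install anything, here is a rough sketch of an inverted index, the general data structure that full-text engines such as Lucene are built around. This is not Lucene's or Lucene.Net's API; the class `InvertedIndex`, its methods, and the sample documents are all made up for illustration, and a real engine adds analyzers, scoring, and on-disk storage on top of this idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy in-memory inverted index (hypothetical, not the Lucene API).

    Maps each term to {doc_id: number of occurrences in that document}.
    """

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        """Index one document under every term it contains."""
        for term in tokenize(text):
            counts = self.postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query):
        """Return sorted ids of documents containing ALL query terms."""
        terms = tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            docs = set(self.postings.get(term, {}))
            result = docs if result is None else result & docs
        return sorted(result)

    def doc_count(self, term):
        """Number of documents containing the term (a simple statistic)."""
        return len(self.postings.get(term.lower(), {}))


# Made-up sample documents, purely for illustration.
index = InvertedIndex()
index.add(1, "Lucene is a full text search library")
index.add(2, "SQL Server has full text indexing")
index.add(3, "NHibernate maps objects to SQL tables")

print(index.search("full text"))  # [1, 2]
print(index.doc_count("sql"))     # 2
```

Per-keyword document counts like `doc_count` are the kind of number you would feed into web stats. With millions of documents you would want a real engine to handle storage and concurrency rather than a dictionary in memory.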