I faced the same decision in November 2010. I'm a fan of MySQL, so I first tried to build a search application on MySQL, which worked well... and fast (or so I thought): searching 200,000 documents took no more than 2-3 seconds.
I avoided spending time on Lucene/Solr because I wanted to use that time for developing the application. Besides, Lucene was new to me: I didn't know what it was or whether it was good enough. And, after all, you can't change the habits of a lifetime.
However, I ran into problems with fuzzy search (which is difficult to implement in MySQL) and "more like this" (which has to be coded from scratch in an application using MySQL, whereas Solr provides a "more like this" feature out of the box). Eventually the number of documents grew to a million, and MySQL needed more than 15 seconds per search.
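To make the fuzzy search point concrete, here is a minimal sketch of matching terms by edit distance, which is roughly the idea behind fuzzy term queries in Lucene/Solr. It is plain Levenshtein distance written for illustration, not Lucene's actual implementation; the function names and the `max_edits` default are my own assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query: str, terms: List[str], max_edits: int = 2) -> List[str]:
    """Return the terms within max_edits edits of the query (case-insensitive)."""
    q = query.lower()
    return [t for t in terms if edit_distance(q, t.lower()) <= max_edits]
```

Doing this for every term in every row at query time is what makes fuzzy search painful in a plain SQL application; a search engine runs the same kind of comparison against its term dictionary instead.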
So I decided to start with Lucene, and it felt like opening a door to a new world. Lots of features that I had coded in the application with great effort are now provided by Solr and work out of the box. The full-text searches are much, much faster: less than 50 ms across 1 million documents, and less than 1 ms if the result is cached.
So the invested time has paid off.
So if you are thinking about building a full-text search: take Lucene if you have more than a small amount of data. By the way, I'm using a hybrid setup: the data lives in MySQL, and Lucene is only an index with (nearly) no stored data, which keeps the index small and fast.
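The hybrid setup can be sketched roughly like this. To keep the example self-contained, `sqlite3` stands in for MySQL and a plain dictionary stands in for the Lucene index; the table, the sample rows, and the function names are all made up for illustration. The point is only the division of labour: the index maps terms to row ids and stores no document text, and the database returns the actual rows.

```python
import re
import sqlite3
from collections import defaultdict

# Stand-in for MySQL (assumption): the system of record holding the full rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO documents (id, title, body) VALUES (?, ?, ?)",
    [
        (1, "Intro to search", "Full text search with an inverted index"),
        (2, "Database basics", "Rows, columns and primary keys"),
        (3, "Scaling search", "Keeping the search index small and fast"),
    ],
)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Stand-in for the Lucene index (assumption): term -> set of row ids.
# No titles or bodies are stored here, only ids.
index = defaultdict(set)
for doc_id, title, body in db.execute("SELECT id, title, body FROM documents"):
    for term in tokenize(title + " " + body):
        index[term].add(doc_id)


def search(query):
    """Find ids in the index (all terms must match), then load rows from the database."""
    terms = tokenize(query)
    if not terms:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in terms))
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        f"SELECT id, title FROM documents WHERE id IN ({placeholders}) ORDER BY id",
        sorted(ids),
    )
    return rows.fetchall()
```

Here `search("search index")` looks up both terms in the index, intersects the id sets, and only then touches the database to fetch the matching rows.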
Generally speaking, if you are going to run full-text searches, you will most likely need Lucene or Sphinx + MySQL (or Lucene + MySQL, storing the indexable fields in Lucene and returning an id for a MySQL row). Either is an excellent choice.
If you are only going to do "normal" searches (i.e. on integer, char, or date columns), MySQL partitioning will suffice.
You need to specify what you are going to search for, and how often you will be reindexing your database (if you are going to reindex a lot, I'd go with Sphinx).
You are asking whether to go with Lucene or MySQL. But Lucene is a library, and MySQL is a server. You should really be deciding between the Solr search engine and MySQL. In that case, the right answer is likely to be both. Manage all the data in MySQL. Run processes that regularly extract changed data, transform it into Solr's document format, and load it into the search engine. Using Solr is much more straightforward than using Lucene directly, and if you need to modify its behavior in some way, you can still write plugins for Solr, so there is no loss of flexibility.
But it would be the kiss of death to try to manage data with Solr. The read-edit-update cycle works great with SQL databases, but it is not what Solr is about. Solr is fast, flexible text search. You can store image URLs in Solr in a non-indexed field as a convenience when preparing search results.
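As a rough sketch of the extract/transform/load cycle described above: the example below uses `sqlite3` as a stand-in for MySQL and assumes a hypothetical `products` table with an `updated_at` column to detect changed rows. The table, column names, sample data, and function names are all assumptions. The code only builds the JSON array of documents; actually sending it to Solr's update handler is left out.

```python
import json
import sqlite3

# Stand-in for MySQL (assumption), with a hypothetical products table.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, description TEXT, "
    "image_url TEXT, updated_at TEXT)"
)
db.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Red lamp", "A small red desk lamp", "http://example.com/img/1.png", "2011-01-01 10:00:00"),
        (2, "Blue chair", "A comfortable blue chair", "http://example.com/img/2.png", "2011-01-02 09:30:00"),
        (3, "Green table", "A sturdy green table", "http://example.com/img/3.png", "2011-01-03 12:00:00"),
    ],
)


def extract_changed(conn, since):
    """Extract: rows modified after the last checkpoint, oldest first."""
    cur = conn.execute(
        "SELECT id, name, description, image_url, updated_at "
        "FROM products WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()


def to_search_docs(rows):
    """Transform: one flat document per row; image_url is carried along for display only."""
    return [
        {"id": str(r[0]), "name": r[1], "description": r[2], "image_url": r[3]}
        for r in rows
    ]


def run_sync(conn, last_sync):
    """Build the JSON payload for changed rows and return it with the new checkpoint."""
    rows = extract_changed(conn, last_sync)
    payload = json.dumps(to_search_docs(rows))
    new_checkpoint = rows[-1][4] if rows else last_sync
    return payload, new_checkpoint
```

A scheduled job would call `run_sync` with the checkpoint saved from the previous run, post the payload to the search engine, and persist the new checkpoint only after the load succeeds. MySQL stays the only place where data is edited.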