Catalog search architecture for an online store - valuable resources
I'm currently working on an online store and I'm curious whether there are any "best practices" I should consider to attain sub-second (or close to it) search operations. I'm using Full-Text Search in SQL Server 2008, which I'm sure I could optimize in various ways. Right now, searches within Management Studio alone take roughly 2-3 seconds. Furthermore, I'm curious whether client-side or server-side caching of some sort could be used. The database for the catalog contains millions of records. Does anyone know how Amazon.com or Borders.com return search results so quickly? Are there any books or articles that discuss search optimization and architecture? This isn't to be confused with search-engine optimization. Right now, I don't care how visible the site is to the public.
Comments (2)
Those websites use full-text search or information retrieval (IR) libraries. Apache Lucene is an open source framework that fits your needs well. These IR libraries use an inverted index to get better search performance, trading it against index creation time. Also look into facets and collaborative filtering (the suggestion lists you see on Amazon) using Taste.
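To make the inverted-index idea concrete, here is a minimal sketch in Python. It is not how Lucene is implemented; the function names (`tokenize`, `build_index`, `search`) and the tiny `catalog` are made up for illustration. The point is only that the cost is paid once at index time, so a query becomes a few set intersections instead of a scan over every record.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build the inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Look up each term's posting set; unknown terms yield an empty set.
    postings = [index.get(term, set()) for term in terms]
    # Intersect starting from the smallest set to keep intermediate results small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical three-item catalog, purely for illustration.
catalog = {
    1: "Red cotton t-shirt",
    2: "Blue cotton jeans",
    3: "Red leather wallet",
}
index = build_index(catalog)
print(sorted(search(index, "red cotton")))  # prints [1]
```

A real engine adds ranking, stemming, on-disk index segments, and incremental updates on top of this structure, which is where most of the index-creation cost goes.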
www.acm.org/dl
&computer.org
& searchenginewatch
& microsoft/enterprisesearch whitepapers
& lucidimagination
& autonomy
& endeca
All of these resources publish digestible information that is useful and, for the most part, neither too obscure nor too facile.
You can get the task done with MSSQL 2008, but it will take more time than a question on Stack Overflow can cover. |imho|
Note: It's fine to explore the implementation issues before you architect, but it's not always a good idea to bring those implementation details into the architecture.