How do you create a search like Indeed.com?
If you have used Indeed.com before, you may know that for the keywords you search for, it returns traditional search results along with multiple search refinement options on the left side of the screen.
For example, when searching for the keyword "designer", the refinement options are:
Salary Estimate
$40,000+ (45982)
$60,000+ (29795)
$80,000+ (15966)
$100,000+ (6896)
$120,000+ (2828)
Title
Floral Design Specialist (945)
Hair Stylist (817)
GRAPHIC DESIGNER (630)
Hourly Associates/Co-managers (589)
Web designer (584)
more »
Company
Kelly Services (1862)
Unlisted Company (1133)
CyberCoders Engineering (1058)
Michaels Arts & Crafts (947)
ULTA (818)
Elance (767)
Location
New York, NY (2960)
San Francisco, CA (1633)
Chicago, IL (1184)
Houston, TX (1057)
Seattle, WA (1025)
more »
Job Type
Full-time (45687)
Part-time (2196)
Contract (8204)
Internship (720)
Temporary (1093)
How does it gather statistics so quickly (e.g. the number of job offers in each salary range)? It looks like the refinement options are computed in real time, since even minor keywords load fast.
Is there a specific SQL technique for building such a feature? Or is there a guide on the web explaining the technology behind this?
Comments (3)
The technology used by Indeed.com and other search engines is known as inverted indexing, which is at the core of how search engines (e.g. Google) work. The filters you refer to ("refinement options") are known as facets.
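To make the two terms concrete, here is a minimal sketch in Python. It is not how Indeed or Lucene actually implement this; the job records, field names, and helper functions are all made up for illustration. It builds an inverted index over job titles and then counts facet values over the matching documents only.

```python
from collections import Counter, defaultdict

# Hypothetical job postings; the field names are illustrative only.
JOBS = [
    {"id": 0, "title": "Web designer", "company": "Elance", "job_type": "Contract"},
    {"id": 1, "title": "Graphic designer", "company": "Kelly Services", "job_type": "Full-time"},
    {"id": 2, "title": "Hair stylist", "company": "ULTA", "job_type": "Part-time"},
    {"id": 3, "title": "Floral designer", "company": "Kelly Services", "job_type": "Full-time"},
]


def build_inverted_index(jobs):
    """Map each lowercase title token to the set of job ids that contain it."""
    index = defaultdict(set)
    for job in jobs:
        for token in job["title"].lower().split():
            index[token].add(job["id"])
    return index


def search(index, keyword):
    """Look up a single keyword; a miss returns an empty set."""
    return index.get(keyword.lower(), set())


def facet_counts(jobs, matching_ids, field):
    """Count the values of one field across the matching jobs only."""
    return Counter(job[field] for job in jobs if job["id"] in matching_ids)


INDEX = build_inverted_index(JOBS)
hits = search(INDEX, "designer")
print(sorted(hits))
print(facet_counts(JOBS, hits, "company").most_common())
print(facet_counts(JOBS, hits, "job_type").most_common())
```

The keyword lookup never scans the documents, and each facet is a count over the hit set, which is why the refinement options can be produced together with the results.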
You can use Apache Solr, a full-fledged search server built on Lucene that is easy to integrate into your application through its RESTful API. It comes out of the box with features such as faceting, caching, scaling, and spell-checking. It is also used by sites such as Netflix, CNET, and AOL, so it is stable, scalable, and battle-tested.
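As a rough sketch of what a faceted request to Solr might look like, the snippet below only builds the query URL and parses a facet list; it does not contact a server. The host, the core name `jobs`, and the field names `company` and `job_type` are assumptions for illustration. The `facet`, `facet.field`, and `facet.mincount` parameters are standard Solr faceting parameters, and the flat `[value, count, ...]` layout is what Solr's JSON response writer uses by default as far as I know.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, keyword, facet_fields, rows=10):
    """Build a Solr select URL that asks for results plus field facets."""
    params = [
        ("q", keyword),
        ("rows", rows),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", 1),  # hide facet values with zero matches
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


def parse_facet_field(flat):
    """Turn a flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical local Solr core named "jobs".
URL = build_facet_query(
    "http://localhost:8983/solr/jobs/select",
    "designer",
    ["company", "job_type"],
)
print(URL)
print(parse_facet_field(["Kelly Services", 1862, "Unlisted Company", 1133]))
```

In a real response the flat lists would sit under `facet_counts` and then `facet_fields`, keyed by field name.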
If you want to dig deeper into how facet-based filtering works, look up bitsets/bit arrays; the approach is described in this article.
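The bitset idea can be sketched as follows, using plain Python integers as bit arrays. This is a toy illustration under my own assumptions (the document ids and the facet values are invented), not the implementation from the article or from Lucene. Each facet value owns a bitset with one bit per document, the query produces another bitset, and a facet count is the number of set bits in their intersection.

```python
def to_bitset(doc_ids):
    """Encode a collection of document ids as an integer with those bits set."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def popcount(bits):
    """Number of set bits, i.e. number of documents in the bitset."""
    return bin(bits).count("1")


def facet_counts(query_bits, facet_bitsets):
    """Intersect the query bitset with each facet value's bitset and count."""
    return {value: popcount(query_bits & bits) for value, bits in facet_bitsets.items()}


# Hypothetical data: documents 0, 1, 3 and 5 match the keyword.
QUERY_HITS = to_bitset([0, 1, 3, 5])

# Hypothetical precomputed bitsets, one per "Job Type" facet value.
FACET_BITSETS = {
    "Full-time": to_bitset([1, 3, 4]),
    "Part-time": to_bitset([2]),
    "Contract": to_bitset([0, 5]),
}

COUNTS = facet_counts(QUERY_HITS, FACET_BITSETS)
print(COUNTS)
```

The per-value bitsets can be built once at index time, so at query time each count costs one bitwise AND and one bit count, regardless of how the keyword was matched.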
Why do you think they load "too fast"? They certainly have a well-designed, scalable architecture, they surely use caching, and they may be using a denormalized datastore to speed up some computations and queries.
Look at Google and the number of web pages worldwide. Do you also think that Google works too fast?
In addition to what Mios said, and as Daimon mentioned, it does use a denormalized doc store. Here is a link to Indeed's tech talk about its docstore:
http://engineering.indeed.com/blog/2013/03/indeedeng-from-1-to-1-billion-video/
Another related article on their engineering blog:
http://engineering.indeed.com/blog/2013/10/serving-over-1-billion-documents-per-day-with-docstore-v2/