How do I create a search like Indeed.com?

Posted on 2024-11-14 08:20:19

If you have used Indeed.com before, you may know that for the keywords you search, it returns traditional search results along with multiple search refinement options on the left side of the screen.

For example, when searching for the keyword "designer", the refinement options are:

Salary Estimate
    $40,000+ (45982)
    $60,000+ (29795)
    $80,000+ (15966)
    $100,000+ (6896)
    $120,000+ (2828)
Title
    Floral Design Specialist (945)
    Hair Stylist (817)
    GRAPHIC DESIGNER (630)
    Hourly Associates/Co-managers (589)
    Web designer (584)
    more »
Company
    Kelly Services (1862)
    Unlisted Company (1133)
    CyberCoders Engineering (1058)
    Michaels Arts & Crafts (947)
    ULTA (818)
    Elance (767)
Location
    New York, NY (2960)
    San Francisco, CA (1633)
    Chicago, IL (1184)
    Houston, TX (1057)
    Seattle, WA (1025)
    more »
Job Type
    Full-time (45687)
    Part-time (2196)
    Contract (8204)
    Internship (720)
    Temporary (1093)

How does it gather these statistics so quickly (e.g. the number of job offers in each salary range)? It looks like the refinement options are computed in real time, since minor keywords load fast too.

Is there a specific SQL technique for building such a feature? Or is there a guide on the web explaining the technology behind it?

Comments (3)

盗琴音 2024-11-21 08:20:19

The technology used by Indeed.com and other search engines is known as inverted indexing, which is at the core of how search engines (e.g. Google) work. The filters you refer to ("refinement options") are known as facets.
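
To make the two terms concrete, here is a minimal sketch in plain Python. It is not how Indeed or Lucene implement it; the sample postings, the field names (`title`, `company`, `job_type`) and the helper names are all made up for illustration. The inverted index maps each term to the set of document ids containing it, and facet counts are just tallies of field values over the matching documents.

```python
from collections import Counter, defaultdict

# Hypothetical sample postings; the field names are illustrative only.
JOBS = [
    {"id": 0, "title": "Graphic Designer", "company": "Acme", "job_type": "Full-time"},
    {"id": 1, "title": "Web Designer", "company": "Acme", "job_type": "Contract"},
    {"id": 2, "title": "Hair Stylist", "company": "Globex", "job_type": "Part-time"},
    {"id": 3, "title": "Floral Designer", "company": "Globex", "job_type": "Full-time"},
]


def build_inverted_index(docs):
    """Map each lowercased title term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in doc["title"].lower().split():
            index[term].add(doc["id"])
    return index


def search_with_facets(index, docs, keyword, facet_fields):
    """Return the matching doc ids and, per facet field, a count of each value."""
    hits = sorted(index.get(keyword.lower(), set()))
    by_id = {doc["id"]: doc for doc in docs}
    facets = {
        field: Counter(by_id[doc_id][field] for doc_id in hits)
        for field in facet_fields
    }
    return hits, facets


INDEX = build_inverted_index(JOBS)
hits, facets = search_with_facets(INDEX, JOBS, "designer", ["company", "job_type"])
print(hits)                     # ids of postings whose title contains "designer"
print(dict(facets["company"]))  # per-company counts among those hits
```

The lookup never scans the full document list; it touches only the ids already stored under the keyword, which is why the counts can be produced at query time.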

You can use Apache Solr, a full-fledged search server built on Lucene that is easy to integrate into your application through its RESTful API. It comes out of the box with features such as faceting, caching, scaling, spell-checking, etc. It is also used by several sites such as Netflix, CNET, AOL, etc., so it is stable, scalable and battle-tested.
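
As a rough idea of what a faceted request to Solr looks like, the sketch below only builds the query URL and decodes a facet list; it does not contact a server. `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters, while the host, the core name `jobs` and the field names are assumptions for this example. By default Solr's JSON response lists each facet field as a flat `[value, count, value, count, ...]` array under `facet_counts.facet_fields`, which `parse_facet_field` turns into a dict.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, keyword, facet_fields, rows=10):
    """Build a Solr /select URL asking for results plus facet counts."""
    params = [("q", keyword), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to count.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "/select?" + urlencode(params)


def parse_facet_field(flat):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical local core named "jobs" with "company" and "job_type" fields.
url = build_facet_url("http://localhost:8983/solr/jobs", "designer", ["company", "job_type"])
print(url)

# Example of the flat list shape Solr returns for one facet field.
print(parse_facet_field(["Acme", 2, "Globex", 1]))
```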

If you want to dig deeper into how facet-based filtering works, look up bitsets/bit arrays; the approach is described in this article.
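
A minimal sketch of the bitset idea, assuming one precomputed bitset per facet value in which bit i is set when document i has that value. Python integers stand in for bit arrays here, and the facet values and document ids are invented for the example. Counting a facet for a query is then a bitwise AND followed by a population count.

```python
def make_bitset(doc_ids):
    """Return an int whose bit i is set for every doc id i in doc_ids."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


# Hypothetical precomputed bitsets, one per job-type value.
JOB_TYPE_BITS = {
    "Full-time": make_bitset([0, 3, 4]),
    "Part-time": make_bitset([2]),
    "Contract": make_bitset([1, 5]),
}


def facet_counts(query_bits, facet_bits):
    """Count, per facet value, how many query matches also carry that value."""
    return {
        value: bin(query_bits & bits).count("1")  # AND, then count set bits
        for value, bits in facet_bits.items()
    }


# Bitset of the documents matching the keyword (e.g. taken from the inverted index).
query_bits = make_bitset([0, 1, 3, 5])
counts = facet_counts(query_bits, JOB_TYPE_BITS)
print(counts)
```

Because the per-value bitsets are built at index time, each count at query time costs one AND and one bit count, regardless of how the matches were found.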

披肩女神 2024-11-21 08:20:19

Why do you think they load "too fast"? They certainly have a well-scaled architecture, they surely use caching, and they may be using a denormalized datastore to speed up some computations and queries.

Take a look at Google and the number of web pages worldwide: do you also think Google works too fast?
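
Just to illustrate the caching point (this is a guess at the general idea, not a description of Indeed's setup), the sketch below memoizes facet counts per keyword with `functools.lru_cache`, so a repeated search skips the recomputation. The sample data, the `job_type_counts` function and the `CALLS` counter are all hypothetical.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical (title, job type) pairs standing in for a real datastore.
JOBS = (
    ("Graphic Designer", "Full-time"),
    ("Web Designer", "Contract"),
    ("Hair Stylist", "Part-time"),
    ("Floral Designer", "Full-time"),
)

CALLS = {"count": 0}  # tracks how often the expensive path actually runs


@lru_cache(maxsize=1024)
def job_type_counts(keyword):
    """Compute job-type counts for a keyword; results are cached per keyword."""
    CALLS["count"] += 1
    matches = [job_type for title, job_type in JOBS if keyword in title.lower()]
    # Return an immutable value so cached results cannot be modified by callers.
    return tuple(sorted(Counter(matches).items()))


first = job_type_counts("designer")
second = job_type_counts("designer")  # served from the cache
print(first, CALLS["count"])
```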

遥远的她 2024-11-21 08:20:19

In addition to what Mios said, and as Daimon mentioned, Indeed does use a denormalized document store. Here is a link to Indeed's tech talk about its docstore:

http://engineering.indeed.com/blog/2013/03/indeedeng-from-1-to-1-billion-video/

Another related article on their engineering blog:
http://engineering.indeed.com/blog/2013/10/serving-over-1-billion-documents-per-day-with-docstore-v2/
