How to get the total count for each product attribute/filter, like Newegg does

Posted on 2024-10-03 16:15:40

If you go to newegg.com (just one example), you'll notice while browsing products that the left-hand sidebar shows the number of items next to each product attribute.

With so many attributes on some items, and so many different configurations of product filters, how do they calculate all of those totals so fast?

Comments (3)

何止钟意 2024-10-10 16:15:40

Newegg.com uses faceted navigation technology from Endeca.

In a nutshell, Endeca takes data supplied as XML/CSV, or pulls it directly from any database (not just MySQL), computes similarity, and groups the results into its own format.
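
The answer doesn't describe Endeca's internals, so this is not Endeca's algorithm. It is only a generic illustration of what "grouping the results" means for the sidebar: one pass over a hypothetical in-memory catalogue, tallying each value of each requested facet.

```python
from collections import Counter, defaultdict

# Hypothetical catalogue: each product maps attribute name -> value.
PRODUCTS = [
    {"id": 1, "brand": "ASUS", "chipset": "AMD B650", "form_factor": "ATX"},
    {"id": 2, "brand": "MSI", "chipset": "Intel Z790", "form_factor": "ATX"},
    {"id": 3, "brand": "ASUS", "chipset": "Intel Z790", "form_factor": "Micro ATX"},
    {"id": 4, "brand": "Gigabyte", "chipset": "AMD B650", "form_factor": "ATX"},
]


def facet_counts(products, facets):
    """Single pass over the products, tallying each value of each requested facet."""
    counts = defaultdict(Counter)
    for product in products:
        for facet in facets:
            value = product.get(facet)
            if value is not None:  # products may lack some attributes
                counts[facet][value] += 1
    return {facet: dict(counter) for facet, counter in counts.items()}


if __name__ == "__main__":
    for facet, values in facet_counts(PRODUCTS, ["brand", "form_factor"]).items():
        print(facet, values)
```

This costs O(products × facets) on every request, which is why the other answers turn to indexes and caching.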

Endeca is not free; open-source alternatives include Sphinx and Lucene/Solr.

不知在何时 2024-10-10 16:15:40

Newegg uses Endeca, and they were probably one of Endeca's earlier customers. In retrospect, Endeca might have been a big contributor to their success. Faceted navigation works very well for complex electronics like computer parts.

There are a few things to consider in faceted navigation:

1) Do you want faceted navigation only on category-driven queries, or do you also want it to work on search? In fact, categories are a hierarchical facet of sorts.

2) Does Solr's denormalized inverted-index model cause you problems?
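
Point 1 notes that categories are a hierarchical facet. A minimal sketch, assuming each product carries a hypothetical category path tuple (root to leaf): credit each product to every ancestor so a parent's count covers its whole subtree, then show only the direct children of the node the user is browsing.

```python
from collections import Counter

# Hypothetical products tagged with a category path (root -> leaf).
PRODUCTS = [
    {"id": 1, "category": ("Components", "Motherboards", "AMD")},
    {"id": 2, "category": ("Components", "Motherboards", "Intel")},
    {"id": 3, "category": ("Components", "Memory")},
    {"id": 4, "category": ("Components", "Motherboards", "AMD")},
]


def hierarchical_counts(products):
    """Count each product under every prefix of its path ("A", "A/B", "A/B/C")."""
    counts = Counter()
    for product in products:
        path = product["category"]
        for depth in range(1, len(path) + 1):
            counts["/".join(path[:depth])] += 1
    return counts


def children_of(counts, parent):
    """Direct children of `parent` with their rolled-up counts.

    Assumes category names never contain "/" themselves.
    """
    prefix = parent + "/"
    return {
        key: n
        for key, n in counts.items()
        if key.startswith(prefix) and "/" not in key[len(prefix):]
    }
```

Drilling into "Components" would then list "Components/Motherboards" and "Components/Memory" with their subtree totals.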

If the answer to 1) is yes (it probably is), you'll need some inverted indices. Inverted indices are pretty much the only way to do keyword search. They can also handle faceting, with some caveats.
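
For readers unfamiliar with the term, a toy inverted index maps each term to the set of documents containing it, and a keyword query with AND semantics becomes a set intersection. The product titles below are hypothetical and the tokenizer is deliberately crude; real engines add stemming, ranking and compressed postings.

```python
import re
from collections import defaultdict

# Hypothetical product titles keyed by product id.
TITLES = {
    1: "ASUS AMD B650 ATX Motherboard",
    2: "MSI Intel Z790 ATX Motherboard",
    3: "Corsair 32GB DDR5 Memory",
}


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index, query):
    """AND semantics: ids whose text contains every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]  # .get avoids inserting
    return set.intersection(*postings)
```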

Essentially, you can treat each facet as an inverted index (in fact, keyword search might be considered a special facet with ranking functions). Then, to get the counts, you intersect (AND) the current query and filters with every other facet value. However, this model can lead to problems if you need to represent sparse product sets (see 2).
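
A minimal sketch of the counting step described above, assuming a hypothetical in-memory catalogue. Each facet value's posting list is stored as a bitmap (a Python int, bit i set if product i has that value), the current result set is the AND of the selected filters, and each sidebar count is the popcount of the result AND that value's bitmap.

```python
from collections import defaultdict

# Hypothetical catalogue; a product's position in the list is its id.
PRODUCTS = [
    {"brand": "ASUS", "chipset": "AMD B650", "form_factor": "ATX"},
    {"brand": "MSI", "chipset": "Intel Z790", "form_factor": "ATX"},
    {"brand": "ASUS", "chipset": "Intel Z790", "form_factor": "Micro ATX"},
    {"brand": "Gigabyte", "chipset": "AMD B650", "form_factor": "ATX"},
]


def build_facet_index(products):
    """facet -> value -> bitmap (a Python int) with bit i set if product i has it."""
    index = defaultdict(lambda: defaultdict(int))
    for i, product in enumerate(products):
        for facet, value in product.items():
            index[facet][value] |= 1 << i
    return index


def popcount(bitmap):
    return bin(bitmap).count("1")


def facet_counts(index, n_products, filters):
    """Counts for every facet value under the selected filters ({facet: value})."""
    result = (1 << n_products) - 1  # start with all products selected
    for facet, value in filters.items():
        # AND in each selected filter; unknown facets/values match nothing.
        result &= index.get(facet, {}).get(value, 0)
    # Each value's count is the size of (result AND its posting bitmap).
    return {
        facet: {value: popcount(result & bitmap) for value, bitmap in values.items()}
        for facet, values in index.items()
    }
```

Values with a zero count are returned too, so the UI can decide whether to hide them. With multi-select sidebars it's common to compute a facet's counts with that facet's own selection left out, so sibling values stay visible; production engines also use compressed bitmaps rather than plain integers.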

If the answer to 2) is yes, it might help to think about facets more in terms of OLAP. I don't know whether inverted indices can handle complex relationships without some abstractions.
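
For the OLAP-style view, the counts become an aggregate (GROUP BY facet, value) over the products that satisfy the current filters. In this sketch sqlite3 merely stands in for an OLAP engine, and the normalized `product_attr` table is a hypothetical schema: a product only has rows for the attributes it actually carries, which is one way to keep sparse attributes out of a denormalized document.

```python
import sqlite3

# Hypothetical normalized schema: one row per (product, facet, value).
ROWS = [
    (1, "brand", "ASUS"), (1, "form_factor", "ATX"),
    (2, "brand", "MSI"), (2, "form_factor", "ATX"),
    (3, "brand", "ASUS"), (3, "form_factor", "Micro ATX"),
]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product_attr (product_id INTEGER, facet TEXT, value TEXT)")
    conn.executemany("INSERT INTO product_attr VALUES (?, ?, ?)", rows)
    return conn


def facet_counts(conn, filters):
    """Aggregate (GROUP BY facet, value) over the products matching all filters."""
    sql = "SELECT facet, value, COUNT(DISTINCT product_id) FROM product_attr"
    params = []
    if filters:
        # A product qualifies only if it matches a row for every filtered facet.
        clauses = " OR ".join("(facet = ? AND value = ?)" for _ in filters)
        sql += (
            " WHERE product_id IN (SELECT product_id FROM product_attr"
            f" WHERE {clauses} GROUP BY product_id HAVING COUNT(DISTINCT facet) = ?)"
        )
        for facet, value in filters.items():
            params += [facet, value]
        params.append(len(filters))
    sql += " GROUP BY facet, value"
    return {(facet, value): n for facet, value, n in conn.execute(sql, params)}
```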

It's fair to think of, and implement, faceted search/navigation as a blend of full-text search (typically implemented as an inverted index) and/or OLAP.

I'm pretty sure you can pull off faceting with a column store, but you'd still need an inverted index at your disposal to merge with if you want keyword search.
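
A rough sketch of that merge, assuming a hypothetical column-oriented layout (one aligned list per attribute) and a hypothetical keyword postings map that a full-text index would supply. Keyword hits narrow the row set first, attribute filters scan only the columns they touch, and each column is then counted over the surviving rows.

```python
from collections import Counter

# Hypothetical column-oriented layout: one list per attribute, aligned by row id.
COLUMNS = {
    "brand": ["ASUS", "MSI", "ASUS", "Gigabyte"],
    "form_factor": ["ATX", "ATX", "Micro ATX", "ATX"],
}
# Hypothetical keyword postings (term -> row ids), e.g. from a full-text index.
KEYWORD_INDEX = {
    "wifi": {0, 2, 3},
    "rgb": {1, 3},
}


def facet_counts(columns, keyword_index, terms, filters):
    """Merge keyword hits with column filters, then count values per column."""
    n_rows = len(next(iter(columns.values())))
    rows = set(range(n_rows))
    for term in terms:  # AND over keyword terms via the inverted index
        rows &= keyword_index.get(term, set())
    for facet, value in filters.items():  # scan only the filtered columns
        column = columns[facet]
        rows = {r for r in rows if column[r] == value}
    return {
        facet: dict(Counter(column[r] for r in rows))
        for facet, column in columns.items()
    }
```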

@Dan Grossman:

It might seem so, but:

Have you considered how many combinations of facets there are? You can't cache that many pages. There are probably more combinations on Newegg.com than stars in the sky.

Add in multiple selection and it's even worse. Game over.
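
To put rough numbers on this, take a hypothetical category with ten facets of ten values each. With single selection, each facet is either unset or set to one of its n values (n + 1 states), so the states multiply; with multi-select, every value is independently on or off, giving 2 to the power of the total number of values.

```python
from math import prod


def single_select_states(values_per_facet):
    """Each facet is unset or set to exactly one of its n values: n + 1 choices."""
    return prod(n + 1 for n in values_per_facet)


def multi_select_states(values_per_facet):
    """Each value is independently ticked or not, so a facet has 2**n states."""
    return 2 ** sum(values_per_facet)


if __name__ == "__main__":
    facets = [10] * 10  # hypothetical: 10 facets, 10 values each
    print(single_select_states(facets))  # 25937424601
    print(multi_select_states(facets))   # 2**100, about 1.27e30
```

That is 11^10 = 25,937,424,601 single-select states versus 2^100 multi-select states, before keyword search or sorting options multiply things further.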

You can only cache some cases, such as the unfiltered and commonly filtered ones. If you try to spider Newegg.com without limiting the recursion depth, you'll kill the spider. Faceted sites cause problems for search engines in general for this very reason. See http://www.searchmarketingstandard.com/facets-navigational-seo-powerhouse-part

微凉徒眸意 2024-10-10 16:15:40

You do not know that they calculate them fast. You only know that they render them fast. They could spend hours calculating those totals and rendering their pages, cache the results, and serve those static files until they want to refresh the data.
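
A minimal sketch of that precompute-and-serve idea, assuming a hypothetical `load_products` callable that returns the catalogue and a refresh interval chosen by the operator. The expensive rebuild happens at most once per window; in between, requests read the snapshot, even if the catalogue has changed. The injectable clock is only there to make the behaviour easy to test.

```python
import time
from collections import Counter


class PrecomputedFacetCounts:
    """Serve per-category facet counts from a snapshot rebuilt every `ttl_seconds`.

    `load_products` is a hypothetical callable returning dicts shaped like
    {"category": str, "attrs": {facet: value}}.
    """

    def __init__(self, load_products, ttl_seconds=3600, clock=time.monotonic):
        self._load = load_products
        self._ttl = ttl_seconds
        self._clock = clock
        self._snapshot = {}
        self._built_at = None

    def _rebuild(self):
        counts = {}
        for product in self._load():
            per_category = counts.setdefault(product["category"], Counter())
            for facet, value in product["attrs"].items():
                per_category[(facet, value)] += 1
        self._snapshot = counts
        self._built_at = self._clock()

    def counts_for(self, category):
        stale = self._built_at is None or self._clock() - self._built_at >= self._ttl
        if stale:
            self._rebuild()  # the expensive part, paid once per refresh window
        return dict(self._snapshot.get(category, Counter()))
```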
