Running a large number of count queries on the GAE NoSQL datastore
Before we start, let me give you some information about our environment:
- It is written entirely in Java/J2EE.
- It is developed to be deployed on GAE (Google App Engine).
- Its GUI is developed with GWT.
- Our problem is a core development issue.
Here is my problem:
- I am building a web application where online users can search for listings on the website.
- First, please open the website careerbuilder.com and search for any keyword, e.g. "Accounting".
- A results page opens. Its [Narrow Search] panel (let's call it a filter) makes it easier to get to your target job among the many jobs listed below it.
- The search filter includes sub-filters [Category, Company, City, State].
- Each sub-filter has many options; for example, State has (California, Iowa, Kansas, etc.). Beside each option is the number of jobs that match your current filter/sub-filter selection, shown in parentheses, e.g. (23).
Now we want to offer this filter functionality, and we want to make it fast.
Making a count query for each sub-filter option is going to be an effective idea.
Kindly keep in mind that:
- Users can add/remove listings.
- Listings can also expire.
- The number of sub-filters is higher for us; it can reach 20.
- Each sub-filter has between 2 and 200 options.
We are looking for a best practice, a suggested algorithm, or anything else that solves this problem.
Here are the 2 options we have reached so far:
1. Build a statistics table to store these counts, update it each time the number of listings changes, and also keep a nightly background job to recalculate the results. We can then show the number of results directly from this table.
2. Build a tree data structure that is loaded into memory and saved to a table each time it is updated. This tree contains the resulting number of listings for each option of each sub-filter.
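Option 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a GAE implementation: the class name `FacetCounters`, its methods, and the representation of a listing as a map from sub-filter name to option are all hypothetical, and a plain in-memory `HashMap` stands in for the statistics table. Note that it keeps one count per single option; it does not by itself give counts for combinations of selected options.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the "statistics table" of option 1.
// Keys are "subFilter=option" strings; values are listing counts.
class FacetCounters {
    private final Map<String, Long> counts = new HashMap<>();

    private static String key(String subFilter, String option) {
        return subFilter + "=" + option;
    }

    // Call when a listing is added: bump one counter per sub-filter value.
    void onListingAdded(Map<String, String> listing) {
        for (Map.Entry<String, String> e : listing.entrySet()) {
            counts.merge(key(e.getKey(), e.getValue()), 1L, Long::sum);
        }
    }

    // Call when a listing is removed or expires: decrement each counter,
    // dropping entries that would reach zero.
    void onListingRemoved(Map<String, String> listing) {
        for (Map.Entry<String, String> e : listing.entrySet()) {
            counts.computeIfPresent(key(e.getKey(), e.getValue()),
                    (k, v) -> v > 1 ? Long.valueOf(v - 1) : null);
        }
    }

    // Nightly job: rebuild every counter from the live listings to correct drift.
    void recompute(List<Map<String, String>> liveListings) {
        counts.clear();
        for (Map<String, String> listing : liveListings) {
            onListingAdded(listing);
        }
    }

    // Read path: a single lookup instead of a count query.
    long count(String subFilter, String option) {
        return counts.getOrDefault(key(subFilter, option), 0L);
    }
}
```

With this shape, rendering the number beside an option is one lookup such as `count("State", "California")`, while the cost of counting moves to the add/remove/expire path and to the nightly recompute.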
Even so, I still think this is not enough!
Can anyone suggest a better idea?
All comments, questions, and suggestions are very welcome.
Regards
Mohammad S.
Comments (1)
Have you noticed how Google applications rarely give exact counts on anything, especially when using filters? You always get guesstimates like "more than 1000", "tens of thousands", or "showing 20 of about 23123123 results". Well, now you see why. Welcome to the world of NoSQL
(although, frankly, counts with filters are bad in SQL land as well).
It's not a solution but a workaround, but it's common:
This can be pretty effective, and users do not seem to mind (or even notice).
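One way to produce a "more than 1000" style guesstimate is to stop counting once a cap is exceeded. The sketch below is only an illustration of that idea in plain Java; the class `CappedCount` and its methods are hypothetical names, and an in-memory `Iterable` stands in for a datastore query.

```java
import java.util.function.Predicate;

// Hypothetical helper: bounded counting plus a display label.
class CappedCount {
    // Counts matching items but stops as soon as the count exceeds cap,
    // so a return value of cap + 1 means "more than cap".
    static <T> int countUpTo(Iterable<T> items, Predicate<T> filter, int cap) {
        int n = 0;
        for (T item : items) {
            if (filter.test(item) && ++n > cap) {
                break;
            }
        }
        return n;
    }

    // Exact number when within the cap, otherwise a "more than" label.
    static String label(int count, int cap) {
        return count > cap ? "more than " + cap : String.valueOf(count);
    }
}
```

On the App Engine low-level datastore API, a comparable bound can be applied by passing `FetchOptions.Builder.withLimit(...)` to `PreparedQuery.countEntities(...)`; whether that is fast enough for up to 20 sub-filters with up to 200 options each would need to be measured.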