计算大型数据集中的不同值(40M 行): SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;
CREATE TABLE `names` ( `name` varchar(20) );
假设 names
表包含居住在加利福尼亚州的每个人的全部 4000 万个名字(例如)。
SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;
我该如何优化这个查询?
预期结果:
count | name
9999 | joe
9995 | mike
9990 | kate
.... | ....
2 | kal-el
CREATE TABLE `names` ( `name` varchar(20) );
Assume the names
table contains all 40 million first names of everyone living in California (for example).
SELECT count(*) as count, name FROM names GROUP BY name ORDER BY name;
How can I optimize this query?
Expected Result:
count | name
9999 | joe
9995 | mike
9990 | kate
.... | ....
2 | kal-el
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
您必须在表的名称列上创建索引。查询尽可能好。
You have to create an index on the name column of your table. The query is as good as it can be.
那么,是什么让您认为它尚未优化?这看起来像是一个好的数据库引擎应该能够相对轻松地处理的查询类型 - 特别是如果您的表上有适当的索引。
您是否确实遇到了瓶颈,或者您是否担心将来可能发生的事情?如果是后者,我建议您尝试使用 RDBMS(通过生成虚拟数据),看看会发生什么。
Well, what makes you think it's not already optimised? This looks like the sort of query a good database engine should be able to handle relatively easily - particularly if you've got an appropriate index on your table.
Do you actually have a bottleneck here, or are you worrying about something that might happen in the future? If it's the latter, I suggest you try it with your RDBMS (by generating dummy data), and see what happens.