Getting data for a histogram
Is there a way to specify bin sizes in MySQL? Right now, I am trying the following SQL query:
select total, count(total) from faults GROUP BY total;
The data that is being generated is good enough but there are just too many rows. What I need is a way to group the data into predefined bins. I can do this from a scripting language, but is there a way to do it directly in SQL?
Example:
+-------+--------------+
| total | count(total) |
+-------+--------------+
|    30 |            1 |
|    31 |            2 |
|    33 |            1 |
|    34 |            3 |
|    35 |            2 |
|    36 |            6 |
|    37 |            3 |
|    38 |            2 |
|    41 |            1 |
|    42 |            5 |
|    43 |            1 |
|    44 |            7 |
|    45 |            4 |
|    46 |            3 |
|    47 |            2 |
|    49 |            3 |
|    50 |            2 |
|    51 |            3 |
|    52 |            4 |
|    53 |            2 |
|    54 |            1 |
|    55 |            3 |
|    56 |            4 |
|    57 |            4 |
|    58 |            2 |
|    59 |            2 |
|    60 |            4 |
|    61 |            1 |
|    63 |            2 |
|    64 |            5 |
|    65 |            2 |
|    66 |            3 |
|    67 |            5 |
|    68 |            5 |
+-------+--------------+
What I am looking for:
+------------+---------------+
| total      | count(total)  |
+------------+---------------+
| 30 - 40    | 23            |
| 40 - 50    | 15            |
| 50 - 60    | 51            |
| 60 - 70    | 45            |
+------------+---------------+
I guess this cannot be achieved in a straightforward manner, but a reference to any related stored procedure would be fine as well.
Comments (10)
I found it here http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html
Mike DelGaudio's answer is the way I do it, but with a slight change: I bin on floor(mycol/N)*N, where N is the bin size.
The advantage? You can make the bins as large or as small as you want. Bins of size 100? floor(mycol/100)*100. Bins of size 5? floor(mycol/5)*5.
Bernardo.
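The query that went with this comment did not survive on this page, so what follows is only a minimal sketch of the floor-based idea, not the commenter's exact code. It assumes the faults table and total column from the question and a bin size of 10; the names bin_floor, cnt, total_range and binned are made up for illustration.

```sql
-- Sketch only: bin `total` into ranges of width 10 and label each range.
-- Assumes the `faults` table from the question; aliases are hypothetical.
SELECT CONCAT(bin_floor, ' - ', bin_floor + 10) AS total_range,
       cnt
FROM (
    -- FLOOR(total / 10) * 10 maps 30..39 to 30, 40..49 to 40, and so on.
    SELECT FLOOR(total / 10) * 10 AS bin_floor,
           COUNT(*) AS cnt
    FROM faults
    GROUP BY bin_floor
) AS binned
ORDER BY bin_floor;
```

The label is built in an outer query because MySQL with ONLY_FULL_GROUP_BY enabled does not accept a select expression that merely contains the grouped expression. Changing both occurrences of 10 in the inner query changes the bin size. Bins with no rows do not appear in the output.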
The table bins contains the columns min_value and max_value, which define the bins.
Note that the operator in "JOIN ... ON x BETWEEN y AND z" is inclusive at both ends.
table1 is the name of the data table.
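The query this comment describes is missing from this page. As a rough sketch of the bins-table approach, assuming table1 has a numeric column that is called val here (a hypothetical name) and that the bins are the four ranges from the question:

```sql
-- Sketch only: a lookup table of bins joined to the data with BETWEEN.
-- `val` is a hypothetical column name; the bin boundaries are illustrative.
CREATE TABLE bins (
    min_value INT NOT NULL,
    max_value INT NOT NULL
);

-- BETWEEN includes both endpoints, so adjacent bins must not share a value.
INSERT INTO bins (min_value, max_value) VALUES
    (30, 39), (40, 49), (50, 59), (60, 69);

-- Inner join: every output row is a bin that matched at least one data row.
SELECT b.min_value, b.max_value, COUNT(*) AS cnt
FROM bins AS b
JOIN table1 AS a ON a.val BETWEEN b.min_value AND b.max_value
GROUP BY b.min_value, b.max_value
ORDER BY b.min_value;
```

With an inner join, COUNT(*) is accurate for every bin that is returned, but empty bins are left out. If val is not an integer, ranges such as 30 to 39 leave gaps, and a half-open condition (a.val >= b.min_value AND a.val < b.max_value) may be the safer choice.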
Ofri Raviv's answer is very close but incorrect. The count(*) will be 1 even if there are zero results in a histogram interval. The query needs to be modified to use a conditional sum.
As long as there are not too many intervals, this is a pretty good solution.
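The corrected query itself is not preserved here. A minimal sketch of the conditional-sum idea, assuming a bins table with min_value and max_value columns and a data table table1 with a numeric column called val (a hypothetical name):

```sql
-- Sketch only: count per bin, reporting 0 for bins with no matching rows.
-- A LEFT JOIN emits one NULL-filled row for an unmatched bin, which
-- COUNT(*) would count as 1. The comparison `a.val IS NOT NULL` evaluates
-- to 1 or 0 in MySQL, so summing it counts only real matches.
SELECT b.min_value,
       b.max_value,
       SUM(a.val IS NOT NULL) AS cnt
FROM bins AS b
LEFT JOIN table1 AS a ON a.val BETWEEN b.min_value AND b.max_value
GROUP BY b.min_value, b.max_value
ORDER BY b.min_value;
```

COUNT(a.val) would give the same numbers, since COUNT over a column skips NULLs. Any extra filter on table1 should go into the ON clause; placed in WHERE, it discards the NULL-filled rows and the empty bins disappear again.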
I made a procedure that can be used to automatically generate a temporary table of bins according to a specified number or size, for later use with Ofri Raviv's solution.
This will generate the histogram count only for the bins that are populated. David West ought to be right in his correction, but for some reason unpopulated bins do not appear in the result for me (despite the use of a LEFT JOIN; I do not understand why).
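The procedure from this comment is not included on this page. The sketch below is not that procedure; it only illustrates how such a helper might look when bins are generated by size. The procedure name make_bins and its parameters are invented for this example.

```sql
-- Sketch only: fill a temporary `bins` table with fixed-width integer bins.
-- make_bins, p_min, p_max and p_width are hypothetical names.
DELIMITER //

CREATE PROCEDURE make_bins(IN p_min INT, IN p_max INT, IN p_width INT)
BEGIN
    DECLARE lo INT DEFAULT p_min;

    DROP TEMPORARY TABLE IF EXISTS bins;
    CREATE TEMPORARY TABLE bins (
        min_value INT NOT NULL,
        max_value INT NOT NULL
    );

    -- Guard against a non-positive width, which would loop forever.
    IF p_width > 0 THEN
        WHILE lo <= p_max DO
            -- Inclusive upper bound, so adjacent bins do not overlap.
            INSERT INTO bins (min_value, max_value)
            VALUES (lo, lo + p_width - 1);
            SET lo = lo + p_width;
        END WHILE;
    END IF;
END //

DELIMITER ;

-- Example call: bins 30-39, 40-49, 50-59 and 60-69.
CALL make_bins(30, 69, 10);
```

The temporary table lives for the rest of the session, so it can be joined to the data table after the call. The DELIMITER lines are a mysql client command rather than SQL, and other clients may need the procedure body submitted differently.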
That should work, though it is not so elegant.
Via Mike DelGaudio.
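The query for this comment is also missing. One inelegant but workable way to bin without a helper table is to subtract the remainder, shown here as a sketch against the faults table and total column from the question; it may or may not match what the commenter posted.

```sql
-- Sketch only: round each value down to a multiple of 10 using MOD.
-- Assumes the `faults` table from the question; aliases are hypothetical.
SELECT total - (total MOD 10) AS bin_floor,
       COUNT(*) AS cnt
FROM faults
GROUP BY bin_floor
ORDER BY bin_floor;
```

For non-negative values this gives the same bins as FLOOR(total / 10) * 10. For negative values the two differ, because MOD in MySQL takes the sign of the dividend, so the subtraction rounds toward zero rather than down.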
Equal-width binning into a given number of bins.
Note that the 0.0000001 is there to make sure that records with a value equal to max(col) do not end up in a bin of their own. The additive constant also makes sure the query does not fail with a division by zero when all the values in the column are identical.
Also note that the number of bins (10 in the example) should be written with a decimal point to avoid integer division (the unadjusted bin_width can be fractional).
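The example query referred to here is not preserved. The following is a sketch of equal-width binning under these assumptions: MySQL 8.0 or later (for the WITH clause), the faults table and total column from the question, and 10 bins. The names stats, bin_index and cnt are illustrative.

```sql
-- Sketch only: split the observed range of `total` into 10 equal-width bins.
WITH stats AS (
    SELECT MIN(total) AS min_value,
           -- 10.0 avoids integer division in engines where / truncates.
           -- The small constant keeps MAX(total) inside the last bin and
           -- keeps bin_width non-zero when all values are identical.
           (MAX(total) - MIN(total)) / 10.0 + 1e-7 AS bin_width
    FROM faults
)
SELECT FLOOR((f.total - s.min_value) / s.bin_width) AS bin_index,
       COUNT(*) AS cnt
FROM faults AS f
CROSS JOIN stats AS s
GROUP BY bin_index
ORDER BY bin_index;
```

The constant is written as 1e-7 rather than 0.0000001 so that MySQL treats it as an approximate-value (floating-point) number and the following division is not rounded to a fixed number of decimals; that is a choice made for this sketch. bin_index runs from 0 to 9, and the lower edge of a bin is min_value + bin_index * bin_width.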
In addition to the great answer at https://stackoverflow.com/a/10363145/916682, you can use the phpMyAdmin chart tool to get a nice-looking result.