How to normalize statistical data for a radar chart
I'm using raphaelJS to draw a "radar chart" to display statistical data. For each axis it should accept values between 0 and 10.
For example, the values of a polygon with its center point right in the center of the chart: [10, 10, 10, 10, 10]. Simple...
However, the data might look like this:
[26, 14, 48, 18, 1],
[ 3, 14, 8, 9, 5],
[10, 6, 4, 16, 3]
which leads to this (displaying the polygon with its center point in the bottom left of the chart):
If I normalized the data based on its biggest value (in this case 48), all of the other center points would sit too close to the center of the chart, and their informative value would be around 0.
The same data normalized based on its biggest value:
[5.42, 2.92, 10, 3.75, 0.21],
[0.63, 2.92, 1.67, 1.88, 1.04],
[2.08, 1.25, 0.83, 3.34, 0.63]
So now all of the other center points are clustered in the center of the chart and have lost all of their explanatory power... If there were more than 3 center points, they would most likely overlap each other.
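For reference, the max-based scaling described above boils down to something like the following (a minimal sketch in plain JavaScript; the two-decimal rounding is only there to match the numbers quoted):

// Scale every value so that the global maximum (48 here) maps to 10.
const rows = [
  [26, 14, 48, 18, 1],
  [ 3, 14,  8,  9, 5],
  [10,  6,  4, 16, 3],
];
const max = Math.max(...rows.flat());
const normalized = rows.map(r => r.map(v => +(v * 10 / max).toFixed(2)));
// e.g. 48 -> 10, 26 -> 5.42, 1 -> 0.21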
I was thinking about a relative way to display each polygon, without losing too much of the relationship between the polygons, if that's possible...
Any ideas on how to do this, or maybe another approach to normalization?
2 Answers
As suggested by @daroczig, log-transformation of the data is the way to go. I just wanted to add that there are many types of transformation you can perform.
Perhaps an example might help. I will use the Parallel Coordinates visualization to illustrate it, but the same concepts apply to a radar chart. All experiments were performed in MATLAB.
Consider the Fisher Iris dataset: it contains 150 instances, each with 4 dimensions. If we add an outlier point outside the range of normal values, we get:
As expected, the plot gets scaled to accommodate the new point, but as a result we lose the detailed view we had before.
The answer is to normalize the data by applying some kind of transformation. The following shows a comparison of four different transformations:
Min/Max normalization:
x_new = (x - min) / (max - min), so that x_new is in [0, 1]
z-standardization:
x_new = (x - mean) / std, where x_new ~ N(0, 1)
softmax normalization with the logistic sigmoid:
x_new = 1 / (1 + exp(-(x - mean) / std)), so that x_new is in [0, 1]
energy normalization:
x_new = x / ||x||, such that x_new is in [0, 1] (this makes each point a unit vector)
[Figure: comparison of the four normalizations (min/max, z-standardization, softmax, energy): https://i.sstatic.net/GYcu1.png]
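Since the question is about raphaelJS, here is a minimal sketch of these four transformations in plain JavaScript (the helper names and the final rescaling to the chart's 0-10 range are my own; whether you normalize per polygon or per axis is a design choice left to the asker):

// Each helper takes an array of numbers and returns the normalized array.
const minMax = xs => {
  const lo = Math.min(...xs), hi = Math.max(...xs);
  return xs.map(x => (x - lo) / (hi - lo));
};
const zScore = xs => {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const std = Math.sqrt(xs.reduce((a, x) => a + (x - mean) ** 2, 0) / xs.length);
  return xs.map(x => (x - mean) / std);
};
// logistic sigmoid of the z-score
const softmaxNorm = xs => zScore(xs).map(z => 1 / (1 + Math.exp(-z)));
const energyNorm = xs => {
  const len = Math.sqrt(xs.reduce((a, x) => a + x * x, 0));
  return xs.map(x => x / len);
};
// Example: squash one row of the question's data into the 0-10 axis range.
const scaled = softmaxNorm([26, 14, 48, 18, 1]).map(v => v * 10);

Because the sigmoid saturates, the softmax variant keeps the relative ordering while pulling an extreme value like 48 back toward the rest.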
Is transforming your data to a logarithmic scale not an option?
That way a few extreme values would not distort/crowd the other values. Just compute the common/natural logarithm of the values in your array (e.g. see the w3school page on it), and feed those to the chart API.
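A minimal sketch of this in JavaScript (the +1 offset and the rescaling to the 0-10 axis range are my own additions, so that small values such as 1 don't collapse to 0):

// Log-transform every value, then rescale so the largest logged value maps to 10.
const rows = [
  [26, 14, 48, 18, 1],
  [ 3, 14,  8,  9, 5],
  [10,  6,  4, 16, 3],
];
const logged = rows.map(r => r.map(v => Math.log(v + 1))); // natural log
const top = Math.max(...logged.flat());
const scaled = logged.map(r => r.map(v => +(v * 10 / top).toFixed(2)));
// 48 still maps to 10, but 1 now lands around 1.8 instead of 0.21, so small values stay visible.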