We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 7 months ago.
Parallel coordinates are a popular method for visualizing high-dimensional data.
Which visualization is best for your data depends on its characteristics: how correlated are the different dimensions?
Principal component analysis could be helpful if the dimensions are correlated.
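As a rough illustration of why PCA helps when dimensions are correlated, here is a minimal sketch using only NumPy. The function name `pca_project` and the synthetic data are my own assumptions for the example, not something from the question: ten observed dimensions are generated from two hidden factors, so two principal components should capture nearly all of the variance.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each dimension
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Synthetic, strongly correlated data: 10 dimensions driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

coords, ratio = pca_project(X)
print(coords.shape, ratio.sum())
```

If the explained-variance ratios of the first two or three components sum to something close to 1, a scatter plot of `coords` is a fair picture of the data; if they do not, the dimensions are not very correlated and PCA will hide a lot.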
The buzzword I would search for is multidimensional scaling (MDS). It is a technique for projecting from the high-dimensional space to a lower-dimensional one (2 or 3 dimensions) in such a way that points which are close in the full space remain close in the projection.
It is often used for visualising the output of clustering algorithms (i.e. if your clusters are compact in the MDS projection, there is a good chance they are also compact in the full space).
Edit: This wouldn't necessarily help with determining whether the data is dense or sparse, because you lose the scale in the projection, but it would show whether it is uniform or clumpy (perhaps that's what you mean).
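To make the idea concrete, here is a minimal sketch of classical (Torgerson) MDS in NumPy. MDS has several variants and I am only assuming the classical one for illustration; the function name `classical_mds` and the two synthetic clusters are made up for the example.

```python
import numpy as np


def classical_mds(X, n_components=2):
    """Embed rows of X in n_components dimensions via classical MDS (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distances between all pairs of points.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    n = D2.shape[0]
    # Double-centre the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the eigenvectors with the largest eigenvalues.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_components]
    top = np.clip(vals[order], 0.0, None)  # guard against tiny negative values
    return vecs[:, order] * np.sqrt(top)


# Two tight clusters in 10 dimensions, far apart from each other.
rng = np.random.default_rng(1)
cluster_a = rng.normal(loc=0.0, scale=0.1, size=(30, 10))
cluster_b = rng.normal(loc=5.0, scale=0.1, size=(30, 10))
embedding = classical_mds(np.vstack([cluster_a, cluster_b]))

gap = np.linalg.norm(embedding[:30].mean(axis=0) - embedding[30:].mean(axis=0))
print(embedding.shape, gap)
```

The two clusters stay well separated in the 2-D embedding, which is the "clumpy versus uniform" signal mentioned above. Note that this version works from Euclidean distances; other distance measures or non-metric MDS would need a different implementation.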
Not sure what kind of patterns you would like to see from the data. t-SNE and its faster variant Barnes-Hut-SNE do a very good job in visualizing groups of related concepts for high-dimensional data. It is available through R.
There is a short tutorial on using it with high-dimensional data of about 300 dimensions:
http://www.codeproject.com/Tips/788739/Visualizing-High-Dimensional-Vector-using-T-SNE-wi
I was looking for ways to visualize high dimensional data and found this t-SNE technique that has been used effectively. Might help others as well.
Take a look at http://www.ggobi.org: its tours, parallel coordinates, and scatterplot matrices can be used for real-valued variables. Also see http://cranvas.org for more recent work, and the tourr package in R.
Try using http://hypertools.readthedocs.io/en/latest/.
HyperTools is a library for visualizing and manipulating high-dimensional data in Python.
Star Schema.
http://en.wikipedia.org/wiki/Star_schema
Works well for high-dimensional data.
If the cardinality of your fact table is close to the product of your dimension sizes, you have dense data.
If the cardinality of your fact table is much smaller than the product of your dimension sizes, you have sparse data.
In the middle you have a judgement call.
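The density check above is just a ratio, and can be sketched as follows. The function name `fact_table_density` and the example dimension sizes are hypothetical, chosen only to show the arithmetic.

```python
from math import prod


def fact_table_density(fact_rows, dimension_sizes):
    """Fraction of possible dimension combinations that occur in the fact table."""
    cells = prod(dimension_sizes)  # size of the full cross product
    if cells == 0:
        raise ValueError("every dimension must have at least one member")
    return fact_rows / cells


# Hypothetical schema: 365 dates x 50 stores x 1000 products, 120,000 fact rows.
density = fact_table_density(120_000, [365, 50, 1000])
print(f"{density:.4%} of the possible cells are populated")
```

A value near 1 means dense data and a value near 0 means sparse data; where to draw the line in between is the judgement call.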
The curios.IT data exploration software is designed for the visualization of high dimensional data: data is shown as a collection of 3D objects (one for each data group) which can show up to 13 variables at the same time. The relationships between data variables and visual features are much easier to remember than with other techniques (like parallel coordinates).