How to visualize clusters of users?
I have an application in which users interact with each other. I want to visualize these interactions so that I can determine whether clusters of users exist (within which interactions are more frequent).
I've assigned a 2D point to each user (where each coordinate is between 0 and 1). My idea is that two users' points move closer together when they interact (an "attractive force"), and I repeatedly iterate over my interaction logs.
Of course, I need a "repulsive force" that will push users apart too, otherwise they will all just collapse into a single point.
First I tried tracking the lowest and highest values of each of the X and Y coordinates and normalizing the positions, but this didn't work: a few users with a small number of interactions stayed at the edges, and the rest all collapsed into the middle.
Does anyone know what equations I should use to move the points, both for the "attractive" force between users when they interact, and a "repulsive" force to stop them all collapsing into a single point?
Edit: In response to a question, I should point out that I'm dealing with about 1 million users, and about 10 million interactions between users. If anyone can recommend a tool that could do this for me, I'm all ears :-)
I can recommend some possibilities: first, try log-scaling the interactions or running them through a sigmoidal function to squash the range. This will give you a smoother visual distribution of spacing.
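A minimal Python sketch of the two squashing options. It assumes the interaction log is an iterable of `(user_a, user_b)` pairs with sortable user IDs; the function names and the `midpoint`/`steepness` defaults are illustrative choices, not something the answer specifies.

```python
import math
from collections import Counter


def log_weight(count):
    """log(1 + n): 0 stays 0, and for large n each 10x adds roughly the same amount."""
    return math.log1p(count)


def sigmoid_weight(count, midpoint=10.0, steepness=0.5):
    """Logistic squash into the range 0..1; counts near `midpoint` change the weight fastest."""
    return 1.0 / (1.0 + math.exp(-steepness * (count - midpoint)))


def edge_weights(log, transform=log_weight):
    """Count interactions per unordered pair, then compress each count with `transform`."""
    counts = Counter(tuple(sorted(pair)) for pair in log)
    return {pair: transform(n) for pair, n in counts.items()}


# Hypothetical log: a and b interact twice, a and c once.
weights = edge_weights([("a", "b"), ("b", "a"), ("a", "c")])
```

The compressed weights can then scale the attractive force, so a handful of very active pairs doesn't dominate the layout.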
Independent of this scaling issue: look at some of the rendering strategies in graphviz, particularly the programs "neato" and "fdp" (see their man pages).
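To try neato or fdp on this data, the interaction log first has to become a Graphviz DOT file. A minimal sketch, assuming `(user_a, user_b)` pairs as input; `to_dot` and the graph name are hypothetical, while `weight` is a standard Graphviz edge attribute.

```python
from collections import Counter


def to_dot(pairs, name="interactions"):
    """Build an undirected Graphviz graph; repeated interactions become the edge weight."""
    counts = Counter(tuple(sorted(p)) for p in pairs)

    def quote(node):
        # DOT string IDs are double-quoted; escape any embedded quotes.
        return '"' + str(node).replace('"', '\\"') + '"'

    lines = [f"graph {name} {{"]
    for (a, b), n in sorted(counts.items()):
        # For neato/fdp, a larger weight makes the layout try harder to keep
        # this edge close to its preferred length.
        lines.append(f"  {quote(a)} -- {quote(b)} [weight={n}];")
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("alice", "bob"), ("bob", "alice"), ("alice", "carol")])
```

Save `dot_text` to `interactions.dot` and run `neato -Tpng interactions.dot -o interactions.png` (or the same command with `fdp`). With the roughly 1 million users mentioned in the question, these programs may be very slow, so it is worth trying a sample of the log first.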
Finally, consider one of the scaling strategies, an attractive force, and some sort of drag coefficient instead of a repulsive force. Actually moving things closer together and then possibly farther apart later on may just give you cyclic behavior.
Consider a model in which everything will collapse eventually, but slowly. Then just run until some condition is met (a node crosses the center of the layout region or some such).
Drag or momentum can just be encoded as a basic resistance to motion and amount to throttling the movements; it can be applied differentially (things can move slower based on how far they've gone, where they are in space, how many other nodes are close, etc.).
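A minimal sketch of the attraction-plus-drag idea, with no repulsion at all. The stopping rule is an assumption standing in for "some condition": stop once the mean distance from the centroid has shrunk to `stop_fraction` of its starting value. `drag_layout` and all parameter defaults are hypothetical; `attraction` must stay small relative to how often a user appears in one pass, or the motion can become unstable.

```python
import math
import random


def drag_layout(users, interactions, attraction=0.01, drag=0.1,
                stop_fraction=0.5, max_passes=1000, seed=0):
    """Attraction plus drag: connected users slowly collapse together, and we stop
    once the overall spread has shrunk to stop_fraction of its starting value."""
    rng = random.Random(seed)
    pos = {u: [rng.random(), rng.random()] for u in users}
    vel = {u: [0.0, 0.0] for u in users}

    def spread():
        # Mean distance from the centroid (assumed stopping measure; users must be non-empty).
        cx = sum(p[0] for p in pos.values()) / len(pos)
        cy = sum(p[1] for p in pos.values()) / len(pos)
        return sum(math.hypot(p[0] - cx, p[1] - cy) for p in pos.values()) / len(pos)

    initial = spread()
    for _ in range(max_passes):
        # One pass over the log: each interaction nudges both velocities toward each other.
        for a, b in interactions:
            for i in range(2):
                d = pos[b][i] - pos[a][i]
                vel[a][i] += attraction * d
                vel[b][i] -= attraction * d
        # Drag removes a fixed fraction of each velocity per pass, throttling movement.
        for u in users:
            for i in range(2):
                vel[u][i] *= 1.0 - drag
                pos[u][i] += vel[u][i]
        if spread() < stop_fraction * initial:
            break
    return pos
```

To apply drag differentially, as the answer suggests, `drag` could be turned into a per-user value (for example, larger for users that have already moved far).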
Hope this helps.
The spring model is the traditional way to do this: create an attractive force between each pair of interacting nodes based on their interactions, and a repulsive force between all nodes based on the inverse square of their distance. Then solve, minimizing the energy. You may need some fairly high-powered programming to get an efficient solution if you have more than a few nodes. Make sure the start positions are random, and run the program several times: a case like this almost always has several local energy minima, and you want to make sure you've got a good one.
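One way to write down the energy being minimized, as a sketch: assume $I$ is the set of interacting pairs, $w_{ij}$ an interaction weight (for example a log-scaled count), $p_i$ the position of user $i$, and $k_a$, $k_r$ tuning constants. None of these symbols come from the answer itself.
$$E(p) = \sum_{(i,j) \in I} \frac{k_a w_{ij}}{2} \lVert p_i - p_j \rVert^2 + \sum_{i<j} \frac{k_r}{\lVert p_i - p_j \rVert}$$
Differentiating with respect to the distance $d_{ij} = \lVert p_i - p_j \rVert$ gives an attractive force of magnitude $k_a w_{ij} d_{ij}$ and a repulsive force of magnitude $k_r / d_{ij}^2$, which is the inverse-square repulsion described above. For a single isolated pair the two balance when
$$k_a w_{ij} d = \frac{k_r}{d^2} \quad\Longrightarrow\quad d^* = \left(\frac{k_r}{k_a w_{ij}}\right)^{1/3},$$
so pairs that interact more settle closer together.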
Also, unless you have only a few nodes, I would do this in 3D. An extra dimension of freedom allows for better solutions, and you should be able to visualize clusters in 3D as well if not better than 2D.
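A small, unoptimized Python sketch of this recipe: plain gradient descent on a spring energy of $\frac{k_a w}{2} d^2$ per interacting pair plus $k_r / d$ per pair of nodes (so the repulsive force falls off as $1/d^2$), started from several random positions, keeping the lowest-energy result. `spring_layout`, `energy`, `EPS`, and all defaults are hypothetical. The all-pairs repulsion loop is O(n²), so this only illustrates the idea and will not handle a million users.

```python
import math
import random

EPS = 1e-9  # softening so coincident points don't divide by zero


def energy(pos, edges, k_a, k_r):
    """Spring energy (k_a*w/2)*d^2 per edge plus k_r/d for every pair of nodes."""
    e = 0.0
    for (i, j), w in edges.items():
        e += 0.5 * k_a * w * sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
            e += k_r / math.sqrt(d2 + EPS)
    return e


def spring_layout(n, edges, dim=2, restarts=5, iters=300, step=0.01,
                  max_move=0.05, k_a=1.0, k_r=0.01, seed=0):
    """Gradient descent from several random starts; keep the lowest-energy result."""
    rng = random.Random(seed)
    best_pos, best_e = None, math.inf
    for _ in range(restarts):
        pos = [[rng.random() for _ in range(dim)] for _ in range(n)]
        for _ in range(iters):
            grad = [[0.0] * dim for _ in range(n)]
            for (i, j), w in edges.items():      # attraction gradient: k_a*w*(p_i - p_j)
                for c in range(dim):
                    g = k_a * w * (pos[i][c] - pos[j][c])
                    grad[i][c] += g
                    grad[j][c] -= g
            for i in range(n):                   # repulsion gradient: -k_r*(p_i - p_j)/d^3
                for j in range(i + 1, n):
                    diff = [pos[i][c] - pos[j][c] for c in range(dim)]
                    d3 = (sum(x * x for x in diff) + EPS) ** 1.5
                    for c in range(dim):
                        g = -k_r * diff[c] / d3
                        grad[i][c] += g
                        grad[j][c] -= g
            for i in range(n):                   # step downhill, capping each node's move
                move = [-step * g for g in grad[i]]
                norm = math.sqrt(sum(m * m for m in move))
                if norm > max_move:
                    move = [m * max_move / norm for m in move]
                for c in range(dim):
                    pos[i][c] += move[c]
        e = energy(pos, edges, k_a, k_r)
        if e < best_e:
            best_pos, best_e = pos, e
    return best_pos, best_e
```

Here `edges` maps `(i, j)` index pairs to weights, and passing `dim=3` gives the 3D variant; the restarts loop is the "run the program several times" advice.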
In the past, when I've tried this kind of thing, I've used a spring model to pull linked nodes together, something like:
$$dx = -k(x - l)$$
where $dx$ is the change in position, $x$ is the current separation between the two linked nodes, $l$ is the desired separation, and $k$ is the spring coefficient that you tweak until you get a nice balance between spring strength and stability; it'll be less than 0.1. Having $l > 0$ ensures that everything doesn't end up in the middle.
In addition to that, a general "repulsive" force between all nodes will spread them out, something like:
$$dx = \frac{k}{x^2}$$
This will be larger the closer two nodes are; tweak $k$ to get a reasonable effect.
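A minimal Python sketch of those two rules, assuming 2D positions stored as a list of `[x, y]` pairs and edges as index pairs. `layout_step` and `_unit` are hypothetical names; the separate constants `k_spring` and `k_repel` (the answer uses `k` for both) and their defaults are assumptions, and `min_dist` is an added guard against dividing by zero.

```python
import math


def _unit(pos, i, j, min_dist):
    """Distance from node i to node j and the direction from i toward j."""
    dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
    d = max(math.hypot(dx, dy), min_dist)  # floor avoids dividing by ~0
    return d, dx / d, dy / d


def layout_step(pos, edges, k_spring=0.05, rest_len=0.1, k_repel=1e-4, min_dist=1e-3):
    """One iteration: linked pairs change separation by -k_spring*(d - rest_len),
    every pair changes separation by +k_repel/d**2. Each endpoint moves half."""
    delta = [[0.0, 0.0] for _ in pos]

    def grow(i, j, ux, uy, change):
        # Grow the i-j separation by `change` (negative shrinks it).
        delta[i][0] -= 0.5 * change * ux
        delta[i][1] -= 0.5 * change * uy
        delta[j][0] += 0.5 * change * ux
        delta[j][1] += 0.5 * change * uy

    for i, j in edges:                    # spring: dx = -k*(x - l)
        d, ux, uy = _unit(pos, i, j, min_dist)
        grow(i, j, ux, uy, -k_spring * (d - rest_len))
    for i in range(len(pos)):             # repulsion: dx = k / x^2
        for j in range(i + 1, len(pos)):
            d, ux, uy = _unit(pos, i, j, min_dist)
            grow(i, j, ux, uy, k_repel / (d * d))
    return [[p[0] + m[0], p[1] + m[1]] for p, m in zip(pos, delta)]
```

Calling `pos = layout_step(pos, edges)` repeatedly from random starting positions, until the movements get small, gives the iterative scheme described above; a positive `rest_len` plays the role of $l > 0$.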