Plotting nested models
Running e.g. cv.glmnet on a dataset gives me (by default) 100 different models. Now, if my dataset had missing data, I could do multiple imputation (say 10 imputations) and run cv.glmnet on each of the imputations.
If I disregard the actual coefficient values for each of the models, and just look at the selected features (i.e. sets of column names), some models are submodels of others.
Code like this imitates the results somewhat:
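usevars <- paste0("var", 1:100)
# Each simulated "model" is a random subset of the variable names.
mdls <- replicate(1000, {
  numVars <- sample.int(length(usevars), 1)
  sample(usevars, numVars)
}, simplify = FALSE)  # always return a list, one character vector per model
names(mdls) <- paste0("mdl", 1:1000)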
Now, it's easy enough to get the parent-child relations between submodels in this respect. It is also possible to include only 'direct parenthood' (i.e. if model A is a child of B and B is a child of C, then the relation between A and C is not included).
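As a rough illustration of the previous paragraph, here is a minimal sketch of how the submodel relations and the 'direct parenthood' relations could be computed in base R. It assumes a smaller simulation than above (30 models over 20 variables) so that the matrices stay readable, and it drops models with identical variable sets; the names sub, two_step and direct are purely illustrative.

```r
# Smaller simulated model list (an assumption made for readability).
set.seed(1)
usevars <- paste0("var", 1:20)
mdls <- replicate(30, sample(usevars, sample.int(length(usevars), 1)),
                  simplify = FALSE)
names(mdls) <- paste0("mdl", seq_along(mdls))
mdls <- mdls[!duplicated(lapply(mdls, sort))]  # keep one copy of each variable set

n <- length(mdls)
# sub[i, j] is TRUE when model i is a proper submodel of model j.
sub <- matrix(FALSE, n, n, dimnames = list(names(mdls), names(mdls)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sub[i, j] <- i != j && all(mdls[[i]] %in% mdls[[j]])
  }
}

# two_step[i, j] is TRUE when some model k lies strictly between i and j.
two_step <- ((sub * 1) %*% (sub * 1)) > 0
# Direct parenthood: i is a submodel of j with no model in between.
direct <- sub & !two_step

# Child/parent index pairs of the direct relations.
which(direct, arr.ind = TRUE)
```

Because the submodel relation is transitive, removing every pair that can be reached in two steps leaves exactly the direct relations. The matrix direct can then be used as an adjacency matrix for plotting.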
Finally, I come to my problem: I've used igraph to plot these models and their (direct) relations. However, I did not find a layout that groups the nodes by another variable (in this case the model size). In this setting it seems like a good idea to arrange the graph in 'bands' of models with the same model size (number of variables in the model).
What I ended up doing was more or less calculating the position of each node myself through a kludge of code (which I'm too embarrassed to post here), but I kept wondering whether I had simply missed a better, out-of-the-box solution.
My own code resulted in graphs like this one (you can ignore the colours and the labels - just know that the horizontal axis holds the model size):
Suggestions for achieving this sort of graph more elegantly than, well, doing all the hard work myself, are greatly appreciated.
Comments (2)
The Fruchterman-Reingold layout algorithm in the development version of igraph (that is, 0.6, which is not released officially yet, but you can ask Gábor on the mailing list to send you a copy) has two hidden (i.e. yet undocumented) parameters: miny and maxy. They allow you to constrain the Y coordinates of nodes within a range, so you can use this to create layers.
Alternatively, I'm working on an implementation of the Sugiyama layered graph layout method for igraph right now and I will merge it to the development tree in a day or two (if things go well), and then you can try that.
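A minimal sketch of the miny/maxy idea follows. It assumes a released igraph version in which these parameters are available as arguments of layout_with_fr(), and it uses a hypothetical toy set of six nested models (m1 to m6) with made-up sizes. Each node's Y coordinate is restricted to a narrow band around its model size; the band half-width of 0.25 is an arbitrary choice.

```r
library(igraph)

# Hypothetical direct submodel relations (child -> parent).
edges <- matrix(c("m1", "m2",
                  "m1", "m3",
                  "m2", "m4",
                  "m3", "m4",
                  "m2", "m5",
                  "m4", "m6",
                  "m5", "m6"), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edges, directed = TRUE)

# Hypothetical model sizes, looked up by vertex name so the order is safe.
size <- c(m1 = 1, m2 = 2, m3 = 2, m4 = 3, m5 = 3, m6 = 4)
lvl <- unname(size[V(g)$name])

set.seed(42)
# Restrict each node's Y coordinate to a band around its model size.
L <- layout_with_fr(g, miny = lvl - 0.25, maxy = lvl + 0.25)

plot(g, layout = L)
```

To put the model size on the horizontal axis instead, the same approach should work with minx and maxx. The Sugiyama method mentioned above is, as far as I know, available in released igraph versions as layout_with_sugiyama(), which accepts a layers argument.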
You can use the option to constrain the Fruchterman-Reingold algorithm in qgraph to do this. To show this, I first create a small adjacency matrix of nested models:
Here adj is the adjacency matrix and mods is a vector containing the level of each model (how far it is nested).
In qgraph you can plot the graph of an adjacency matrix by calling the qgraph() function on the adjacency matrix. By setting the argument layout="spring" you call the Fruchterman-Reingold algorithm, and with layout.par you can supply a list of parameters for Fruchterman-Reingold.
With the parameter constraints we can set constraints on the layout. This must be a matrix with 2 columns and one row for each node. The first element of each row is the x-coordinate and the second the y-coordinate. An NA means that the coordinate is free to move, and a value means that the coordinate is fixed at that location.
You'd have to try different things on the scale of the y positions to see what works best. Here I just multiply the mods vector by the number of nodes to get a good looking graph:
Here we also saved the layout in an object L, which can be used as a layout in igraph as well.
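The code that accompanied this answer is not shown in the text above, so the following is only a sketch of the steps it describes, not the author's original code. It assumes a hypothetical example of six nested models; adj, mods and L follow the names used in the text, while con and Q are illustrative names.

```r
library(qgraph)

# Hypothetical adjacency matrix: adj[i, j] = 1 means model i is a
# direct submodel of model j.
adj <- matrix(0, 6, 6)
adj[1, 2:3] <- 1
adj[2, 4:5] <- 1
adj[3, 4] <- 1
adj[4, 6] <- 1
adj[5, 6] <- 1

# Level of each model (how far it is nested).
mods <- c(1, 2, 2, 3, 3, 4)

# Constraint matrix: one row per node, columns are x and y.
# NA leaves a coordinate free; a value fixes it.
con <- matrix(NA, nrow(adj), 2)
con[, 2] <- mods * nrow(adj)  # fix y by level, scaled by the number of nodes

# Fruchterman-Reingold ("spring") layout with the y coordinates fixed.
Q <- qgraph(adj, layout = "spring", layout.par = list(constraints = con))

# Save the resulting layout, e.g. for reuse in igraph.
L <- Q$layout
```

Only the second column of con is filled in, so the algorithm is free to spread the nodes horizontally while each level stays on its own line.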