How to correctly color edges or draw rectangles in an R dendrogram?
I generated this dendrogram using R's `hclust()`, `as.dendrogram()` and `plot.dendrogram()` functions.
I used the `dendrapply()` function and a local function to color leaves, which is working fine.
I have results from a statistical test that indicate if a set of nodes (e.g. the cluster of "_+v\_stat5a\_01_" and "_+v\_stat5b\_01_" in the lower-right corner of the tree) are significant or important.
I also have a local function that I can use with `dendrapply()` that finds the exact node in my dendrogram which contains significant leaves.
I would like to either (following the example):
- Color the edges that join "_+v\_stat5a\_01_" and "_+v\_stat5b\_01_"; or,
- Draw a `rect()` around "_+v\_stat5a\_01_" and "_+v\_stat5b\_01_"
I have the following local function (the details of the "nodes-in-leafList-match-nodes-in-clusterList" condition aren't important; what matters is that it identifies the significant nodes):
    markSignificantClusters <<- function (n) {
      if (!is.leaf(n)) {
        a <- attributes(n)
        leafList <- unlist(dendrapply(n, listLabels))
        for (clusterIndex in 1:length(significantClustersList[[1]])) {
          clusterList <- unlist(significantClustersList[[1]][clusterIndex])
          if (nodes-in-leafList-match-nodes-in-clusterList) {
            # I now have a node "n" that contains significant leaves, and
            # I'd like to use a dendrapply() call to another local function
            # which colors the edges that run down to the leaves; or, draw
            # a rect() around the leaves
          }
        }
      }
    }
From within this `if` block, I have tried calling `dendrapply(n, markEdges)`, but this did not work:
    markEdges <<- function (n) {
      a <- attributes(n)
      attr(n, "edgePar") <- c(a$edgePar, list(lty=3, col="red"))
    }
In my ideal example, the edges connecting "_+v\_stat5a\_01_" and "_+v\_stat5b\_01_" would be dashed and of a red color.
I have also tried using `rect.hclust()` within this `if` block:
    ma <- match(leafList, orderedLabels)
    rect.hclust(scoreClusterObj, h = a$height, x = c(min(ma), max(ma)), border = 2)
But the result does not work with horizontal dendrograms (i.e. dendrograms with horizontal labels). Here is an example (note the red stripe in the lower-right corner). The dimensions of what `rect.hclust()` draws are not correct, and I don't know how it works well enough to write my own version.
I appreciate any advice for getting `edgePar` or `rect.hclust()` to work properly, or for writing my own `rect.hclust()` equivalent.
UPDATE
Since asking this question, I used `getAnywhere(rect.hclust())` to get the functional code that calculates parameters and draws the `rect` object. I wrote a custom version of this function to handle horizontal and vertical leaves, and call it with `dendrapply()`.
However, there is some kind of clipping effect that removes part of the `rect`. For horizontal leaves (leaves that are drawn on the right side of the tree), the rightmost edge of the `rect` either disappears or is thinner than the border width of the other three sides of the `rect`. For vertical leaves (leaves that are drawn on the bottom of the tree), the bottommost edge of the `rect` suffers the same display problem.
What I had done as a means of marking significant clusters is to reduce the width of the `rect` such that I render a vertical red stripe between the tips of the cluster edges and the (horizontal) leaf labels.
This eliminates the clipping issue, but introduces another problem, in that the space between the cluster edge tips and the leaf labels is only six or so pixels wide, which I don't have much control over. This limits the width of the vertical stripe.
The worse problem is that the `x`-coordinate that marks where the vertical stripe can fit between the two elements will change based on the width of the larger tree (`par("usr")`), which in turn depends on how the tree hierarchy ends up being structured.
I wrote a "correction" or, better termed, a hack to adjust this `x` value and the `rect` width for horizontal trees. It doesn't always work consistently, but for the trees I am making, it seems to keep the stripe from getting too close to (or overlapping) edges and labels.
Ultimately, a better fix would be to find out how to draw the `rect` so that there is no clipping, or to find a consistent way to calculate the specific `x` position in between tree edges and labels for any given tree, so as to center and size the stripe properly.
I would also be very interested in a method for annotating edges with colors or line styles.
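On annotating edges, here is a sketch of one likely reason the `markEdges` attempt has no visible effect, offered as a guess rather than a confirmed diagnosis: `dendrapply()` builds a new tree from whatever the function returns, so the function has to return the modified node, and the returned dendrogram has to be the one that gets plotted. The toy data, `sigLabels`, and the helper `leafLabels` are assumptions made for illustration and are not part of the original code.

```r
# Toy data (illustrative only): "a"/"b" and "c"/"d" form tight pairs.
m <- matrix(c(0, 0.1, 5, 5.2, 10), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e"), NULL))
dend <- as.dendrogram(hclust(dist(m)))

# Hypothetical result of the significance test.
sigLabels <- c("a", "b")

# Collect the leaf labels below a node using only node attributes.
leafLabels <- function(n) {
  if (is.leaf(n)) attr(n, "label") else unlist(lapply(n, leafLabels))
}

# Mark every node whose leaves all belong to the significant set.
markEdges <- function(n) {
  if (all(leafLabels(n) %in% sigLabels)) {
    attr(n, "edgePar") <- c(attr(n, "edgePar"), list(lty = 3, col = "red"))
  }
  n  # return the node so dendrapply() keeps the change
}

dendMarked <- dendrapply(dend, markEdges)  # assign the result
plot(dendMarked, horiz = TRUE)             # plot the modified tree
```

With this rule the cluster node itself is marked as well as its leaves, so the branch entering the cluster may also be drawn dashed and red. Restricting the test to leaves would limit the styling to the final segments.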
Comments (1)
So you've actually asked about five questions (5 +/- 3). As far as writing your own `rect.hclust`-like function, the source is in `library/stats/R/identify.hclust.R` if you want to look at it.
I took a quick glance at it myself and am not sure it does what I thought it did from reading your description: it seems to be drawing multiple rectangles. Also, the `x` selector appears to be hard coded to segregate the tags horizontally (which isn't what you want, and there's no `y`).
I'll be back, but in the meantime you might (in addition to looking at the source) try doing multiple `rect.hclust` calls with different `border=` colors and different `h=` values to see if a failure pattern emerges.
Update
I haven't had much luck poking at this either.
One possible kludge for the clipping would be to pad the labels with trailing spaces and then bring the edge of your rectangle in slightly (the idea being that just bringing the rectangle in would get it out of the clipping zone but overwrite the ends of the labels).
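A different thing to try for the clipping, sketched here under assumptions rather than verified against your plots: if the far edge of the rectangle sits exactly on the plot-region boundary (for example at `par("usr")[3]`), part of the border line may fall outside the clip region and look thin or missing. Passing `xpd = NA` to `rect()` turns clipping off for that call. The helper name `leafRect`, its arguments, and the toy data are hypothetical, and the sketch assumes leaf `i` in `labels(dend)` is drawn at position `i` along the leaf axis.

```r
# Toy data (illustrative only).
m <- matrix(c(0, 0.1, 5, 5.2, 10), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e"), NULL))
hc <- hclust(dist(m))
dend <- as.dendrogram(hc)
sigLabels <- c("a", "b")  # hypothetical significant cluster

# Height at which the cluster's leaves are joined.
h <- max(as.matrix(cophenetic(hc))[sigLabels, sigLabels])

# Hypothetical helper: box the given leaves, from the cluster height out to
# the edge of the plot region nearest the labels. Call it after plot().
leafRect <- function(dend, leaves, height, horiz = TRUE, pad = 0.4, ...) {
  pos <- range(match(leaves, labels(dend)))  # assumed leaf positions
  usr <- par("usr")
  xy <- if (horiz) {
    c(xleft = height, ybottom = pos[1] - pad,
      xright = usr[2], ytop = pos[2] + pad)
  } else {
    c(xleft = pos[1] - pad, ybottom = usr[3],
      xright = pos[2] + pad, ytop = height)
  }
  # xpd = NA disables clipping for this call only.
  rect(xy[["xleft"]], xy[["ybottom"]], xy[["xright"]], xy[["ytop"]],
       xpd = NA, ...)
  invisible(xy)
}

plot(dend, horiz = TRUE)
boxXY <- leafRect(dend, sigLabels, h, horiz = TRUE, border = "red", lwd = 2)
```

Because the coordinates come from the leaf order and `par("usr")` at draw time, the box should follow the tree if its overall height changes, though the result still depends on the device and margins.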
Another idea would be to fill the rectangle with a translucent (low alpha) color, making a shaded area rather than a bounding box.
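A minimal sketch of the shaded-area idea, assuming a graphics device that supports semi-transparent colors (many do, but not all). It shades the rows of the significant leaves across the full plot width with no border, so there is no border line to be clipped. The toy data and `sigLabels` are illustrative assumptions.

```r
# Toy data (illustrative only).
m <- matrix(c(0, 0.1, 5, 5.2, 10), ncol = 1,
            dimnames = list(c("a", "b", "c", "d", "e"), NULL))
dend <- as.dendrogram(hclust(dist(m)))
sigLabels <- c("a", "b")  # hypothetical significant cluster

# Positions of the significant leaves along the leaf axis (assumed 1..n).
pos <- range(match(sigLabels, labels(dend)))

# Translucent red: low alpha keeps the edges underneath visible.
fillCol <- adjustcolor("red", alpha.f = 0.2)

plot(dend, horiz = TRUE)
usr <- par("usr")
# border = NA draws only the fill.
rect(usr[1], pos[1] - 0.5, usr[2], pos[2] + 0.5, col = fillCol, border = NA)
```

For a vertical tree the same call would swap the roles of the axes, using `pos` for the x limits and `usr[3]`, `usr[4]` for the y limits.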