Intelligent point label placement in R
Is there an R library/function that would implement INTELLIGENT label placement in an R plot? I tried some, but they are all problematic: many labels overlap either each other or other points (or other objects in the plot, but I see that this is much harder to handle).
If not, is there any way to COMFORTABLY help the algorithm with the label placement for particular problematic points? The most comfortable and efficient solution is wanted.
You can play and test other possibilities with my reproducible example and see if you are able to achieve better results than I have:
# data
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
"SaxRub", "TurMer", "TurPil", "TurPhi")
# basic plot
plot(x, y, asp=1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")
For labelling, I then tried these possibilities; none of them is really good:
this one is terrible:
text(x, y, labels = ShortSci, cex= 0.7, offset = 10)
this one is good if you don't want to place labels for all points, but just for the outliers, but still, the labels are often placed wrong:
identify(x, y, labels = ShortSci, cex = 0.7)
this one looked promising, but there is the problem of labels being too close to the points; I had to pad them with spaces but this doesn't help much:
require(maptools)
pointLabel(x, y, labels = paste(" ", ShortSci, " ", sep=""), cex=0.7)
require(plotrix)
thigmophobe.labels(x, y, labels = ShortSci, cex=0.7, offset=0.5)
require(calibrate)
textxy(x, y, labs=ShortSci, cx=0.7)
Thank you in advance!
EDIT: todo: try labcurve {Hmisc}.
Comments (7)
First, here are the results of my solution to this problem:
I did this by hand in Preview (very basic PDF/image viewer on OS X) in just a few minutes. (Edit: The workflow was exactly what you'd expect: I saved the plot as a PDF from R, opened it in Preview and created textboxes with the desired labels (9pt Helvetica) and then just dragged them around with my mouse until they looked good. Then I exported to a PNG for uploading to SO.)
Looking for algorithmic solutions is totally fine, and (IMHO) really interesting. But, to me, point labeling situations fall into roughly three categories:
[...] call to text) isn't that much effort.
:climbing onto soapbox:
Since folks like us love automation, I think we often fall into the trap of thinking that nearly every aspect of producing a good statistical graphic ought to be automated. I respectfully (humbly!) disagree.
There is no perfectly general statistical plotting environment that automagically creates the picture you have in your head. Things like R, ggplot2, lattice etc. do most of the work; but that extra little bit of tweaking, adding a line here, adjusting a margin there, is probably better suited to a different tool.
:climbing down from soapbox:
I would also note that I think we could all come up with scatterplots with <10-15 points that will be nearly impossible to cleanly label, even by hand, and these will likely break any automatic solution someone comes up with.
Finally, I want to reiterate that I know this isn't the answer you're looking for. And I'm not saying that algorithmic attempts are useless or dumb.
The reason I posted this answer is that I think this question ought to be the canonical "point labeling in R" question for future duplicates, and I think solutions involving hand-labeling deserve a seat at the table, that's all.
ggrepel looks promising when applied to ggplot2 scatterplots.
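For anyone who wants to try it on the data from the question, here is a minimal sketch. It assumes the ggplot2 and ggrepel packages are installed; the size, box.padding and seed values are arbitrary choices of mine, not recommendations from the answer above.

```r
library(ggplot2)
library(ggrepel)

# data from the question
df <- data.frame(
  x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
        0.9055, 1.3307),
  y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
        0.9717, 0.9357),
  ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
               "SaxRub", "TurMer", "TurPil", "TurPhi")
)

p <- ggplot(df, aes(x, y, label = ShortSci)) +
  geom_hline(yintercept = 1, colour = "green") +
  geom_vline(xintercept = 1, colour = "green") +
  geom_point() +
  # labels are pushed away from each other and from the points;
  # seed only makes the layout repeatable between runs
  geom_text_repel(size = 3, box.padding = 0.4, seed = 1) +
  coord_fixed()  # same role as asp = 1 in the base plot

print(p)
```

The layout is recomputed whenever the plot is resized, so it is worth checking the result at the final output size.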
Have you tried the directlabels package?
And, BTW, the pos and offset arguments can take vectors to allow you to get them in the right positions when there are a reasonable number of points in just a few runs of plot.
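A minimal sketch of the vector form of pos, using the data from the question. The pos values below are only my first guesses for this data set (they are not from the answer above) and would normally be tuned over a few redraws; offset is kept as a single value here.

```r
# data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# One pos code per point: 1 = below, 2 = left, 3 = above, 4 = right.
# Hand-picked guesses; adjust after looking at the plot.
label_pos <- c(3, 4, 4, 4, 1, 4, 4, 2, 2, 2)

plot(x, y, asp = 1)
abline(h = 1, v = 1, col = "green")
text(x, y, labels = ShortSci, cex = 0.7, pos = label_pos, offset = 0.4)
```

Changing a single entry of label_pos and rerunning the last three lines is usually enough to fix one troublesome label.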
I found a partial solution! Unfortunately it's not the ultimate, ideal one, but it's the one that works best for me now. It's half algorithmic, half manual, so it saves time compared to the purely manual solution sketched by joran.
I overlooked a very important part of the ?identify help!
So if you use the identify() solution as I wrote in my question, then you can affect the position of the label by not clicking directly on the point, but by clicking next to it, in the direction where you want the label to go! Works just great!
The downside is that there are only 4 positions (top, left, bottom, right), and I'd appreciate the other 4 (top-left, top-right, bottom-left, bottom-right) even more... So I use this to label the points where it doesn't bother me, and the rest of the points I label directly in my Powerpoint presentation, as joran proposed :-)
P.S.: I haven't tried the directlabels lattice/ggplot solution yet, I still prefer to use the basic plot library.
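In case it helps, the clicked positions can also be kept and redrawn later. This is only a sketch: identify() with pos = TRUE returns the indices of the identified points together with the side each label was placed on, and replay_labels() is a hypothetical helper name of mine. The fallback selection in the non-interactive branch is made up, standing in for real clicks.

```r
# data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# Redraw labels from a saved identify(..., pos = TRUE) result.
# sel$ind holds the point indices, sel$pos the side (1 = below, 2 = left,
# 3 = above, 4 = right).
replay_labels <- function(sel, x, y, labels, cex = 0.7) {
  text(x[sel$ind], y[sel$ind], labels = labels[sel$ind],
       pos = sel$pos, cex = cex)
  invisible(sel)
}

plot(x, y, asp = 1)
if (interactive()) {
  # Click next to a point, on the side where the label should go.
  # identify() draws the labels itself; sel can be stored for later reuse.
  sel <- identify(x, y, labels = ShortSci, cex = 0.7, pos = TRUE)
} else {
  # Made-up selection, used only so the sketch also runs in batch mode.
  sel <- list(ind = c(2L, 4L, 10L), pos = c(3, 4, 2))
  replay_labels(sel, x, y, ShortSci)
}
```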
I've written an R function called addTextLabels() within a package, basicPlotteR. The package can be directly installed into your R library using the following code:
For the example provided, I used the following code to generate the example figure linked below.
It works by automatically selecting an alternative location from a fine grid of points. The closest points on the grid are visited first and selected if they don't overlap with any plotted points or labels. Take a look at the source code, if you're interested.
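To make the grid idea concrete, here is a rough base R sketch of that kind of search. It is not the basicPlotteR source: place_labels() is a hypothetical name, and the grid size and padding factors are assumptions of mine. Distances are measured in user coordinates, which is only reasonable because the plot uses asp = 1.

```r
# data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# Simplified grid search; must be called after plot() so that par("usr")
# and strwidth()/strheight() refer to the current plot.
place_labels <- function(x, y, labels, cex = 0.7, n_grid = 60) {
  usr <- par("usr")
  grid <- expand.grid(x = seq(usr[1], usr[2], length.out = n_grid),
                      y = seq(usr[3], usr[4], length.out = n_grid))
  w <- strwidth(labels, cex = cex) * 1.1    # label box width, with padding
  h <- strheight(labels, cex = cex) * 1.4   # label box height, with padding
  pad <- strheight("M", cex = cex) / 2      # rough allowance for the symbol
  placed <- matrix(NA_real_, nrow = length(x), ncol = 2,
                   dimnames = list(labels, c("x", "y")))
  for (i in seq_along(x)) {
    # visit grid nodes from nearest to farthest from point i
    ord <- order((grid$x - x[i])^2 + (grid$y - y[i])^2)
    done <- which(!is.na(placed[, 1]))
    for (k in ord) {
      cx <- grid$x[k]
      cy <- grid$y[k]
      inside <- cx - w[i] / 2 >= usr[1] && cx + w[i] / 2 <= usr[2] &&
                cy - h[i] / 2 >= usr[3] && cy + h[i] / 2 <= usr[4]
      hits_point <- any(abs(x - cx) < w[i] / 2 + pad &
                        abs(y - cy) < h[i] / 2 + pad)
      hits_label <- any(abs(placed[done, 1] - cx) < (w[done] + w[i]) / 2 &
                        abs(placed[done, 2] - cy) < (h[done] + h[i]) / 2)
      if (inside && !hits_point && !hits_label) {
        placed[i, ] <- c(cx, cy)  # first acceptable node wins
        break
      }
    }
  }
  text(placed[, 1], placed[, 2], labels = labels, cex = cex)
  invisible(placed)  # label centres, one row per label
}

plot(x, y, asp = 1)
abline(h = 1, v = 1, col = "green")
label_xy <- place_labels(x, y, ShortSci)
```

Labels are placed greedily in input order, so a label handled early can take the spot a later one would have needed; reordering the points changes the result.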
I'd suggest you take a look at the wordcloud package. I know this package focuses not exactly on the points but on the labels themselves, and also the style seems to be rather fixed. But still, the results I got from using it were pretty stunning. Also note that the package version in question was released about the time you asked the question, so it's still very new.
http://blog.fellstat.com/?cat=11
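A minimal sketch on the data from the question, assuming the function meant here is textplot() from wordcloud (the answer does not name it) and that the package is installed. The cex value is my own choice.

```r
library(wordcloud)

# data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# textplot() opens a new plot and writes each word at or near its point,
# moving words apart where they would overlap
textplot(x, y, ShortSci, cex = 0.7)
abline(h = 1, v = 1, col = "green")
```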
Not an answer, but too long for a comment. A very simple approach that can work on simple cases, somewhere between joran's post-processing and the more sophisticated algorithms that have been presented, is to make simple in-place transformations to the data frame.
I illustrate this with ggplot2 because I'm more familiar with that syntax than with base R plots.
As you can see, in this instance the result is not ideal, but it may be good enough for some purposes. And it is quite effortless; typically something like this is enough:
within(df, y <- y+.01)
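A minimal sketch of this idea with the data from the question, assuming ggplot2 is installed. The data frame name df matches the snippet above; nudged and the text size are my own choices.

```r
library(ggplot2)

# data from the question
df <- data.frame(
  x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
        0.9055, 1.3307),
  y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
        0.9717, 0.9357),
  ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
               "SaxRub", "TurMer", "TurPil", "TurPhi")
)

# shifted copy of the data, used only for the label layer
nudged <- within(df, y <- y + .01)

p <- ggplot(df, aes(x, y)) +
  geom_point() +                                   # points at true positions
  geom_text(data = nudged, aes(label = ShortSci),  # labels slightly above
            size = 3)

print(p)
```

Every label gets the same shift, so this does nothing for labels that collide with each other; it only moves them off their own points.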