Intelligent point label placement in R

Posted on 2024-12-07 13:21:11


Is there an R library/function that implements INTELLIGENT label placement in an R plot? I tried some, but they are all problematic: many labels overlap each other or other points (or other objects in the plot, but I see that this is much harder to handle).
If not, is there any way to COMFORTABLY help the algorithm with the label placement for particular problematic points? Most comfortable and efficient solution wanted.

You can play and test other possibilities with my reproducible example and see if you are able to achieve better results than I have:

# data
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012, 
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542, 
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho", 
"SaxRub", "TurMer", "TurPil", "TurPhi")

# basic plot
plot(x, y, asp=1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")

For labelling, I then tried these possibilities; none of them is really good:

This one is terrible:
text(x, y, labels = ShortSci, cex= 0.7, offset = 10)

This one is good if you don't want to place labels for all points, but just for the outliers; still, the labels are often placed wrong:
identify(x, y, labels = ShortSci, cex = 0.7)

This one looked promising, but the labels end up too close to the points; I had to pad them with spaces, but this doesn't help much:
require(maptools)
pointLabel(x, y, labels = paste(" ", ShortSci, " ", sep=""), cex=0.7)

require(plotrix)
thigmophobe.labels(x, y, labels = ShortSci, cex=0.7, offset=0.5)

require(calibrate)
textxy(x, y, labs=ShortSci, cx=0.7)

Thank you in advance!

EDIT: todo: try labcurve {Hmisc}.


夏九 2024-12-14 13:21:11


First, here are the results of my solution to this problem:

[Image: the example plot with hand-placed labels]

I did this by hand in Preview (very basic PDF/image viewer on OS X) in just a few minutes. (Edit: The workflow was exactly what you'd expect: I saved the plot as a PDF from R, opened it in Preview and created textboxes with the desired labels (9pt Helvetica) and then just dragged them around with my mouse until they looked good. Then I exported to a PNG for uploading to SO.)

Looking for algorithmic solutions is totally fine, and (IMHO) really interesting. But, to me, point labeling situations fall into roughly three categories:

You have a small number of points, none of which are terribly close together. In this case, one of the solutions you listed in the question is likely to work with fairly minimal tweaking.
You have a small number of points, some of which are too closely packed for the typical algorithmic solutions to give good results. In this case, since you only have a small number of points, labeling them by hand (either with an image editor or fine-tuning your call to text) isn't that much effort.
You have a fairly large number of points. In this case, you really shouldn't be labeling them anyway, since it's hard to process large numbers of labels visually.

:climbing onto soapbox:

Since folks like us love automation, I think we often fall into the trap of thinking that nearly every aspect of producing a good statistical graphic ought to be automated. I respectfully (humbly!) disagree.

There is no perfectly general statistical plotting environment that automagically creates the picture you have in your head. Things like R, ggplot2, lattice etc. do most of the work; but that extra little bit of tweaking, adding a line here, adjusting a margin there, is probably better suited to a different tool.

:climbing down from soapbox:

I would also note that I think we could all come up with scatterplots with <10-15 points that will be nearly impossible to cleanly label, even by hand, and these will likely break any automatic solution someone comes up with.

Finally, I want to reiterate that I know this isn't the answer you're looking for. And I'm not saying that algorithmic attempts are useless or dumb.

The reason I posted this answer is that I think this question ought to be the canonical "point labeling in R" question for future duplicates, and I think solutions involving hand-labeling deserve a seat at the table, that's all.


长伴 2024-12-14 13:21:11


ggrepel looks promising when applied to ggplot2 scatterplots.

# data
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012, 
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542, 
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho", 
"SaxRub", "TurMer", "TurPil", "TurPhi")


df <- data.frame(x = x, y = y, z = ShortSci)
library(ggplot2)
library(ggrepel)

ggplot(data = df, aes(x = x, y = y)) + theme_bw() +
    geom_text_repel(aes(label = z),
                    box.padding = unit(0.45, "lines")) +
    geom_point(colour = "green", size = 3)


萌吟 2024-12-14 13:21:11

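Have you tried the directlabels package?

And, by the way, the pos and offset arguments can take vectors, so with a reasonable number of points you can get the labels into the right positions in just a few runs of plot.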
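
A minimal sketch of the per-point pos idea, using the data from the question. The pos values below are illustrative guesses rather than tuned choices, and label_pos is a name introduced here. Only pos is passed as a vector; offset is kept as a single value to stay on the safe side.

```r
# Data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# One pos value per point: 1 = below, 2 = left, 3 = above, 4 = right.
# Assumption: these values are a first guess; adjust and re-run as needed.
label_pos <- c(2, 4, 1, 3, 1, 4, 4, 2, 2, 3)

plot(x, y, asp = 1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")

# pos is matched to the points one by one; offset is in character widths
text(x, y, labels = ShortSci, cex = 0.7, pos = label_pos, offset = 0.4)
```

After looking at the result, change individual entries of label_pos for the labels that still collide and run the plot again.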


毁虫ゝ 2024-12-14 13:21:11

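I found a solution of sorts! Unfortunately it is not ultimate or ideal, but it is the one that works best for me right now. It is half algorithmic, half manual, so it saves time compared with the purely manual solution sketched by joran.

I overlooked a very important part of the ?identify help!

The algorithm used for placing labels is the same as used by text if pos is specified there, the difference being that the position of the pointer relative the identified point determines pos in identify.

So if you use the identify() solution as I wrote in my question, you can influence where the label goes by not clicking directly on the point, but by clicking next to it, in the direction where you want the label!!! Works just great!

The downside is that there are only 4 positions (top, left, bottom, right), and I would prefer the other 4 (top-left, top-right, bottom-left, bottom-right)... So I used this to label the points where it doesn't bother me, and the rest of the points I labelled directly in my Powerpoint presentation, as joran proposed :-)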
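
For the diagonal positions that pos does not offer, one possible workaround is the adj argument of text(). The sketch below is only an illustration: corner_adj() and the dirs vector are hypothetical names introduced here, the direction chosen for each point is a guess, and it relies on adj values outside [0, 1], which the text() help says are allowed on most devices.

```r
# Data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# Hypothetical helper (not part of base R): map a diagonal direction to an
# adj pair for text(). adj = c(0, 0) anchors the lower-left corner of the
# label at the point, so the label sits to the upper right; gap pushes it
# further away, as a fraction of the label's width (x) and height (y).
corner_adj <- function(dir, gap = 0.3) {
  switch(dir,
         NE = c(-gap, -gap),
         NW = c(1 + gap, -gap),
         SE = c(-gap, 1 + gap),
         SW = c(1 + gap, 1 + gap),
         stop("unknown direction: ", dir))
}

# Assumption: one hand-picked direction per point
dirs <- c("NW", "NE", "SE", "NE", "SW", "SE", "NE", "NW", "NW", "NE")

plot(x, y, asp = 1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")

# adj applies to a whole text() call, so the labels are drawn one at a time
for (i in seq_along(x)) {
  text(x[i], y[i], labels = ShortSci[i], cex = 0.7, adj = corner_adj(dirs[i]))
}
```

Because the horizontal gap scales with the label width, longer labels move further from their point; a fixed shift in user coordinates would be an alternative.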

P.S.: I haven't tried the directlabels lattice/ggplot solution yet; I still prefer to use the basic plot library.


長街聽風 2024-12-14 13:21:11


I've written an R function called addTextLabels() within a package basicPlotteR. The package can be directly installed into your R library using the following code:

install.packages("devtools")
library("devtools")
install_github("JosephCrispell/basicPlotteR")

For the example provided, I used the following code to generate the example figure linked below.

# Load the basicPlotteR library
library(basicPlotteR)

# Create vectors storing the X and Y coordinates
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012, 
      0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542, 
      0.9717, 0.9357)

# Store the labels to be plotted in a vector
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho", 
             "SaxRub", "TurMer", "TurPil", "TurPhi")

# Plot the X and Y coordinates without labels
plot(x, y, asp=1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")

# Add non-overlapping text labels
addTextLabels(x, y, ShortSci, cex=0.9, col.background=rgb(0,0,0, 0.75), 
              col.label="white")

It works by automatically selecting an alternative location from a fine grid of points. The closest points on the grid are visited first and selected if they don't overlap with any plotted points or labels. Take a look at the source code, if you're interested.

Example Figure
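
The grid idea described above can be illustrated with a much simplified sketch in base R. This is not the basicPlotteR source code: greedy_grid_labels() and its parameters (n_grid, pad) are names made up for this illustration, labels are treated as axis-aligned boxes centred on a grid node, and points are treated as having no size.

```r
# Data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

# Hypothetical helper: for each point, take the nearest grid node where a
# label box of width w[i] and height h[i] covers no data point and does not
# overlap a label placed earlier. Returns NA where no node qualifies.
greedy_grid_labels <- function(x, y, w, h, n_grid = 40, pad = 0.1) {
  rx <- range(x)
  ry <- range(y)
  gx <- seq(rx[1] - pad * diff(rx), rx[2] + pad * diff(rx), length.out = n_grid)
  gy <- seq(ry[1] - pad * diff(ry), ry[2] + pad * diff(ry), length.out = n_grid)
  grid <- expand.grid(x = gx, y = gy)

  n <- length(x)
  lab_x <- rep(NA_real_, n)
  lab_y <- rep(NA_real_, n)

  for (i in seq_len(n)) {
    # Visit candidate nodes from nearest to farthest
    ord <- order((grid$x - x[i])^2 + (grid$y - y[i])^2)
    for (k in ord) {
      cx <- grid$x[k]
      cy <- grid$y[k]
      # Would a label centred here cover any data point?
      hits_point <- any(abs(x - cx) <= w[i] / 2 & abs(y - cy) <= h[i] / 2)
      # Would it overlap a label that has already been placed?
      j <- which(!is.na(lab_x))
      hits_label <- any(abs(lab_x[j] - cx) < (w[i] + w[j]) / 2 &
                        abs(lab_y[j] - cy) < (h[i] + h[j]) / 2)
      if (!hits_point && !hits_label) {
        lab_x[i] <- cx
        lab_y[i] <- cy
        break
      }
    }
  }
  data.frame(x = lab_x, y = lab_y)
}

plot(x, y, asp = 1)
abline(h = 1, col = "green")
abline(v = 1, col = "green")

# Label sizes in user coordinates; these need an open plot
w <- strwidth(ShortSci, cex = 0.7)
h <- strheight(ShortSci, cex = 0.7)

lab <- greedy_grid_labels(x, y, w, h)
segments(x, y, lab$x, lab$y, col = "grey")  # connect each point to its label
text(lab$x, lab$y, labels = ShortSci, cex = 0.7)
```

The result depends on the order in which the points are processed and on the grid resolution, so a finer grid (larger n_grid) generally gives labels closer to their points at the cost of more checks.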


不美如何 2024-12-14 13:21:11

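I'd suggest you take a look at the wordcloud package. I know this package focuses not so much on the points as on the labels themselves, and the style also seems to be rather fixed. Still, the results I got from using it were pretty stunning. Also note that the package version in question was released around the time you asked the question, so it is still very new.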

http://blog.fellstat.com/?cat=11


胡渣熟男 2024-12-14 13:21:11


Not an answer, but too long for a comment. A very simple approach that can work on simple cases, somewhere between joran's post-processing and the more sophisticated algorithms that have been presented is to make in-place simple transformations to the dataframe.

I illustrate this with ggplot2 because I'm more familiar with that syntax than base R plots.

df <- data.frame(x = x, y = y, z = ShortSci)
library("ggplot2")
ggplot(data = df, aes(x = x, y = y, label = z)) + theme_bw() + 
    geom_point(shape = 1, colour = "green", size = 5) + 
    geom_text(data = within(df, c(y <- y+.01, x <- x-.01)), hjust = 0, vjust = 0)

As you can see, in this instance the result is not ideal, but it may be good enough for some purposes. And it is quite effortless; typically something like within(df, y <- y+.01) is enough.

[Image: the resulting ggplot2 scatterplot with shifted labels]
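
For readers who stay with base graphics, the same shift-the-label-coordinates idea might look like the sketch below. It assumes the data from the question; nudged is a name introduced here, and the 0.01 shifts are simply the values used in the ggplot2 example.

```r
# Data from the question
x <- c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
       0.9055, 1.3307)
y <- c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
       0.9717, 0.9357)
ShortSci <- c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
              "SaxRub", "TurMer", "TurPil", "TurPhi")

df <- data.frame(x = x, y = y, z = ShortSci)

# Shifted copy of the coordinates, used only for the labels
nudged <- transform(df, x = x - 0.01, y = y + 0.01)

# Points are drawn at their true positions
plot(df$x, df$y, asp = 1, xlab = "x", ylab = "y")

# adj = c(0, 0) anchors the lower-left corner of each label at the shifted
# position, mirroring hjust = 0, vjust = 0 in the ggplot2 version
text(nudged$x, nudged$y, labels = nudged$z, cex = 0.7, adj = c(0, 0))
```

transform() here does the same job as the within() call above; the shift is in data units, so it needs adjusting whenever the axis ranges change.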
