Maximum plot points in R?

Posted on 2024-08-16 06:17:33

I have come across a number of situations where I want to plot more points than I really ought to; the main holdup is that when I share my plots with people or embed them in papers, they occupy too much space. It's very straightforward to randomly sample rows in a data frame.

If I want a truly random sample for a point plot, it's easy to say:

qplot(x, y, data=myDf[sample(nrow(myDf), 1000), ])

However, I was wondering if there are more effective (ideally canned) ways to specify the number of plot points such that the actual data is still accurately reflected in the plot. Here is an example.
Suppose I am plotting something like the CCDF of a heavy-tailed distribution, e.g.

library(ggplot2)

ccdf <- function(myList, density=FALSE)
{
  # generates the CCDF of a list or vector as counts of values >= x
  # (the density argument is currently unused)
  freqs <- table(unlist(myList))
  X <- rev(as.numeric(names(freqs)))
  Y <- cumsum(rev(as.vector(freqs)))
  data.frame(x=X, count=Y)
}
qplot(x, count, data=ccdf(rlnorm(10000,3,2.4)), log='xy')

This produces a plot in which the points become increasingly dense along both the x and y axes. Here it would be ideal to have fewer samples plotted for large x or y values.

Does anybody have any tips or suggestions for dealing with similar issues?

Thanks,
-e

压抑⊿情绪 2024-08-23 06:17:33

I tend to use png files rather than vector-based graphics such as pdf or eps for this situation. The files are much smaller, although you lose resolution.

If it's a more conventional scatterplot, then using semi-transparent colours also helps, as well as solving the over-plotting problem. For example,

library(ggplot2)  # provides qplot() and alpha()
x <- rnorm(10000); y <- rnorm(10000)
qplot(x, y, colour=I(alpha("blue",1/25)))
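
The raster-versus-vector trade-off described above can be checked with base R graphics devices alone. This is only a rough sketch: the temporary file names, the 600x600 pixel size, and the 6x6 inch PDF page are arbitrary choices for illustration, and the resulting sizes will vary with the data and the device settings.

```r
set.seed(1)
x <- rnorm(10000); y <- rnorm(10000)
faint_blue <- rgb(0, 0, 1, alpha = 1/25)  # semi-transparent blue

pdf_file <- tempfile(fileext = ".pdf")
png_file <- tempfile(fileext = ".png")

# Vector output: every point is stored as its own drawing instruction.
pdf(pdf_file, width = 6, height = 6)
plot(x, y, pch = 20, col = faint_blue)
dev.off()

# Raster output: the file stores pixels, however many points were drawn.
png(png_file, width = 600, height = 600)
plot(x, y, pch = 20, col = faint_blue)
dev.off()

# File sizes in bytes, for comparison.
c(pdf = file.size(pdf_file), png = file.size(png_file))
```

The PDF grows with the number of points, whereas the PNG size depends mainly on the pixel dimensions, so the gap tends to widen as the data set gets larger.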

成熟的代价 2024-08-23 06:17:33

In addition to Rob's suggestions, one plotting function I like is hexbin, because it does the "thinning" for you; see, for example, the R Graph Gallery.
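
A minimal sketch of hexagonal binning, assuming the hexbin package from CRAN is installed. The simulated data and the choice of xbins = 30 are illustrative only and are not taken from the gallery example.

```r
library(hexbin)  # CRAN package, assumed to be installed

set.seed(1)
x <- rnorm(10000); y <- rnorm(10000)

# Group the points into hexagonal cells; xbins is the number of cells
# across the x range (30 is an arbitrary choice here).
hb <- hexbin(x, y, xbins = 30)

# Draw one shaded hexagon per non-empty cell instead of 10000 points.
plot(hb)

# Number of hexagons actually drawn, versus the number of raw points.
c(cells = length(hb@count), points = length(x))
```

The plot then contains at most one shape per cell, so its size on disk no longer grows with the number of observations.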

幻梦 2024-08-23 06:17:33

Here is one possible solution for downsampling a plot with respect to the x-axis, if it is log-transformed. It takes the log of the x values, rounds that quantity to form bins, and picks the row with the median x value in each bin:

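downsampled_qplot <- function(x, y, data, rounding=0, ...) {
  # assumes we are doing log='xy' or log='x'; x and y are never evaluated,
  # the plot always uses the columns data$x and data$count
  group <- factor(round(log(data$x), rounding))
  # keep the row holding the median x value within each bin
  # (nrow, not length: length() of a data frame counts its columns)
  d <- do.call(rbind, by(data, group,
    function(X) X[order(X$x)[ceiling(nrow(X)/2)], ]))
  qplot(x, count, data=d, ...)
}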

Using the definition of ccdf() from above, we can then compare the original plot of the CCDF of the distribution with the downsampled version:

myccdf=ccdf(rlnorm(10000,3,2.4))

qplot(x,count,data=myccdf,log='xy',main='original')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=1,main='rounding = 1')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=0,main='rounding = 0')

In PDF format, the original plot takes up 640K, and the downsampled versions occupy 20K and 8K, respectively.
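
The same binning idea can be sketched in base R as a function that returns the thinned data frame instead of plotting it, so the result can be inspected or passed to any plotting function. The name thin_log_bins is hypothetical and not part of the answer above; it assumes a data frame with a strictly positive x column.

```r
# Hypothetical helper: keep one row (the median x) per log-spaced bin.
thin_log_bins <- function(data, rounding = 1) {
  stopifnot(all(data$x > 0))            # log() needs positive x
  bin <- round(log(data$x), rounding)   # bin label for every row
  ord <- order(bin, data$x)             # sort rows by bin, then by x
  data <- data[ord, , drop = FALSE]
  runs <- rle(bin[ord])$lengths         # number of rows in each bin
  ends <- cumsum(runs)                  # index of the last row of each bin
  mid <- ends - runs + ceiling(runs / 2)  # index of the middle row of each bin
  data[mid, , drop = FALSE]
}

set.seed(42)
full <- data.frame(x = sort(rlnorm(10000, 3, 2.4)))
full$count <- rev(seq_len(nrow(full)))  # number of observations >= x

thin <- thin_log_bins(full, rounding = 1)
c(full = nrow(full), thinned = nrow(thin))

plot(thin$x, thin$count, log = "xy", xlab = "x", ylab = "count")
```

Larger values of rounding give narrower bins and therefore keep more points; rounding = 0 keeps roughly one point per unit of log(x).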

感受沵的脚步 2024-08-23 06:17:33

I'd either make image files (png or jpeg devices), as Rob already mentioned, or make a 2D histogram. An alternative to the 2D histogram is a smoothed scatterplot: it produces a similar graphic but with a smoother transition from dense to sparse regions.

If you've never seen addictedtor before, it's worth a look. It has some very nice graphics generated in R with images and sample code.

Here's the sample code from the addictedtor site:

2-d histogram:

require(gplots) 

# example data, bivariate normal, no correlation
x <- rnorm(2000, sd=4) 
y <- rnorm(2000, sd=1) 

# separate scales for each axis, this looks circular
hist2d(x,y, nbins=50, col = c("white",heat.colors(16))) 
rug(x,side=1) 
rug(y,side=2) 
box()

smoothscatter:

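## smoothScatter() and densCols() now ship with base R (graphics and
## grDevices), so the geneplotter package from Bioconductor is not required
library("RColorBrewer") ## from CRAN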

x1  <- matrix(rnorm(1e4), ncol=2)
x2  <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
x   <- rbind(x1,x2)

layout(matrix(1:4, ncol=2, byrow=TRUE))
op <- par(mar=rep(2,4))
smoothScatter(x, nrpoints=0)
smoothScatter(x)
smoothScatter(x, nrpoints=Inf,
              colramp=colorRampPalette(brewer.pal(9,"YlOrRd")),
              bandwidth=40)
colors  <- densCols(x)
plot(x, col=colors, pch=20)

par(op)
