Problems with ggplot and pgfSweave

Posted 2024-10-03 01:11:02

I started using Sweave some time ago. However, like most people, I soon ran into a major problem: speed. Sweaving a large document takes ages, which makes it hard to work efficiently. Data processing can be sped up considerably with cacheSweave. However, plots (especially ggplot ;)) still take too long to render. That's why I want to use pgfSweave.

After many, many hours, I finally succeeded in setting up a working system with Eclipse/StatET/Texlipse. I then wanted to convert an existing report for use with pgfSweave and had a bad surprise: most of my ggplots don't seem to work anymore. The following plot, for example, works perfectly in the console and in Sweave:

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

Running it with pgfSweave, however, I get this error:

Error in if (width > 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (width > 0) { :
  the condition has length > 1 and only the first element will be used
Error in driver$runcode(drobj, chunk, chunkopts) : 
  Error in if (width > 0) { : missing value where TRUE/FALSE needed

When I remove aes(...) from geom_point, the plot works perfectly with pgfSweave.

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)

Edit:
I investigated further and could narrow the problem down to the tikz device.

This works just fine:

quartz()
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

This gives the above error:

tikz( 'myPlot.tex',standAlone = T )
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)
dev.off()

This works just fine as well:

tikz( 'myPlot.tex',standAlone = T )
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)
dev.off()

I could reproduce this with 5 different ggplots. As long as the mapping does not use colour (or size, alpha, ...), the plot works with tikz.

Q1: Does anybody have an explanation for this behavior?

Additionally, caching of non-plot code chunks doesn’t work very well. The following code chunk takes no time at all with Sweave. With pgfSweave, it takes approximately 10 sec.

<<plot.opts,echo=FALSE,results=hide,cache=TRUE>>=
#colour and plot options are globally set
pal1 <- brewer.pal(8,"Set1")
pal_seq <- brewer.pal(8,"YlOrRd")
pal_seq <- c("steelblue1","tomato2")
opt1 <- opts(panel.grid.major = theme_line(colour = "white"),panel.grid.minor = theme_line(colour = "white"))
sca_fill_cont_opt <- scale_fill_continuous(low="steelblue1", high="tomato2")
ory <- geom_hline(yintercept=0,alpha=0.4,linetype=2) 
orx <- geom_vline(xintercept=0,alpha=0.4,linetype=2)
ts1 <- 2.3
ts2 <- 2.5
ts3 <- 2.8
ps1 <- 6
offset_x <- function(x,y) 0.15*x/pmax(abs(x),abs(y))
offset_y <- function(x,y) 0.05*y/pmax(abs(x),abs(y))
plot_size <- 50*50
@

This seems pretty strange as well, since the chunk only sets a few variables for later use.

Q2: Does anybody have an explanation for that?

Q3: More generally, is anybody at all using pgfSweave successfully? By successfully I mean that everything that works in Sweave also works in pgfSweave, with the additional benefit of nice fonts and improved speed. ;)

Thanks very much for your responses!

芯好空 2024-10-10 01:11:02

Q1: Does anybody have any explanations for this behavior?

There are three reasons why tikzDevice gives an error when trying to construct your plot:

1. When you add an aesthetic mapping that creates a legend, such as aes(colour=que_id), ggplot2 uses the variable name as the title of the legend, in this case que_id.
2. The tikzDevice passes all strings, such as legend titles, to LaTeX for typesetting.
3. In LaTeX, the underscore character, _, denotes a subscript. An underscore used outside of math mode causes an error.

When the tikzDevice tries to calculate the height and width of the legend title, "que_id", it passes the string to LaTeX for typesetting and expects LaTeX to return the width and height of the string. LaTeX throws an error because the string contains an unescaped underscore outside of math mode. The tikzDevice then receives NULL for the string width instead of a number, which causes the if (width > 0) check to fail.
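As a rough illustration of how such a failure could be reported more helpfully, here is a minimal base R sketch of a guard around a measured string width. The function name `check_string_width` and its message are hypothetical and are not taken from the tikzDevice source; the sketch only assumes that a failed measurement arrives as NULL, NA, or something other than a single number.

```r
# Hypothetical guard; NOT the actual tikzDevice implementation.
# It rejects anything that is not a single, non-missing number and
# names the offending string in the error message.
check_string_width <- function(width, text) {
  ok <- is.numeric(width) && length(width) == 1L && !is.na(width)
  if (!ok) {
    stop(sprintf(
      "Could not measure the width of '%s'. Check it for unescaped LaTeX characters such as '_'.",
      text
    ), call. = FALSE)
  }
  width
}

# A valid width passes through unchanged.
check_string_width(12.5, "Que ID")

# A failed measurement produces a readable message instead of a bare
# failure inside if (width > 0).
result <- tryCatch(
  check_string_width(NULL, "que_id"),
  error = function(e) conditionMessage(e)
)
print(result)
```

The point of the design is that the message names the string that could not be measured, which is the piece of information missing from the original error.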

Ways to avoid the problem

Specify a legend title to use by adding a color scale:

p1 <- ggplot(plot_info, aes(elevation, area))
p1 <- p1 + geom_point(aes(colour=que_id))


# Add a name that is easier for humans to read than the variable name
p1 <- p1 + scale_colour_brewer(name="Que ID")


# Or, replace the underscore with the appropriate LaTeX escape sequence
p1 <- p1 + scale_colour_brewer(name="que\\textunderscore id")
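If many variable names contain underscores, the escaping can be done programmatically rather than by hand. The following is a small base R sketch; `escape_underscores` is a hypothetical helper, not a function from ggplot2 or tikzDevice, and it uses the `\_` escape, which LaTeX also accepts in text mode, instead of `\textunderscore`.

```r
# Hypothetical helper (not part of ggplot2 or tikzDevice): escape underscores
# so that LaTeX typesets them literally instead of reading them as subscripts.
escape_underscores <- function(label) {
  # fixed = TRUE makes both the pattern and the replacement literal text,
  # so "\\_" inserts one backslash followed by an underscore.
  gsub("_", "\\_", label, fixed = TRUE)
}

legend_title <- escape_underscores("que_id")
cat(legend_title, "\n")
```

The result could then be passed as the legend title, for example `scale_colour_brewer(name = escape_underscores("que_id"))`.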

Use the string sanitization feature introduced in tikzDevice 0.5.0 (it was broken until 0.5.2). By default, string sanitization currently escapes only the following characters: %, $, {, }, and ^. However, you can specify additional substitution pairs via the tikzSanitizeCharacters and tikzReplacementCharacters options:
```
# Add underscores to the sanitization list
options(tikzSanitizeCharacters = c('%','$','}','{','^', '_'))
options(tikzReplacementCharacters = c('\\%','\\$','\\}','\\{', '\\^{}', '\\textunderscore'))

# Turn on string sanitization when starting the plotting device
tikz('myPlot.tex', standAlone = TRUE, sanitize = TRUE)
print(p1)
dev.off()
```

We will be publishing version 0.5.3 of the tikzDevice in the next couple of weeks in order to address some annoying warning messages that now show up due to changes in the way R handles system(). I will add the following changes to this next version:

A better warning message when width is NULL, indicating that there is probably something wrong with the plot text.
Underscores and a few other characters added to the default set of characters that the string sanitizer looks for.

Hope this helps!

無處可尋 2024-10-10 01:11:02

Q2: I am the maintainer of pgfSweave.

Here are the results of a test I ran:

time R CMD Sweave time-test.Rnw 

real    0m1.133s
user    0m1.068s
sys     0m0.054s

time R CMD pgfsweave time-test.Rnw 

real    0m2.941s
user    0m2.413s
sys     0m0.364s

time R CMD pgfsweave time-test.Rnw 

real    0m2.457s
user    0m2.112s
sys     0m0.283s

I believe there are two reasons for the time difference, but it would take more work to verify them exactly:

pgfSweave does a ton of checking and double-checking to make sure that it is not redoing expensive computations. The goal is to make it feasible to do more expensive calculations and plotting within a document. The scale of "expensive" in this case is much larger than the additional second or two it takes to do the checks.

As an example, consider the following test file, which shows the real benefit of caching:

\documentclass{article}

\begin{document}

<<plot.opts,cache=TRUE>>=
x <- Sys.sleep(10)
@

\end{document}

And the results:

time R CMD Sweave time-test2.Rnw 

real    0m10.334s
user    0m0.283s
sys     0m0.047s

time R CMD pgfsweave time-test2.Rnw 

real    0m12.032s
user    0m1.356s
sys     0m0.349s

time R CMD pgfsweave time-test2.Rnw 

real    0m1.423s
user    0m1.121s
sys     0m0.266s
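The pattern in these timings (the first run pays the full cost plus some overhead, later runs skip the computation) can be sketched in a few lines of base R. This is only an illustration of the general idea of chunk caching. It is not how pgfSweave or cacheSweave are implemented; `chunk_cache` and `cached_eval` are hypothetical names, and the cache lives in memory rather than on disk.

```r
# Hypothetical in-memory cache keyed by the chunk's source text.
chunk_cache <- new.env()

cached_eval <- function(code, envir = globalenv()) {
  key <- paste(code, collapse = "\n")
  if (!exists(key, envir = chunk_cache, inherits = FALSE)) {
    # First run: evaluate the chunk and remember its value.
    value <- eval(parse(text = code), envir = envir)
    assign(key, value, envir = chunk_cache)
  }
  # Later runs: return the stored value without re-evaluating the code.
  get(key, envir = chunk_cache, inherits = FALSE)
}

# The chunk body runs once; the second call is served from the cache.
work_env <- new.env()
work_env$runs <- 0
first  <- cached_eval("runs <- runs + 1; runs", envir = work_env)
second <- cached_eval("runs <- runs + 1; runs", envir = work_env)
```

Wrapping a slow chunk such as `cached_eval("Sys.sleep(10)")` in `system.time()` twice would show the same shape as the results above: a slow first call and a near-instant second call. The lookup itself is the kind of bookkeeping that costs a little on every run, which only pays off when the cached computation is expensive.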

Sweave has undergone some changes in R 2.12. These changes may have sped up code chunk evaluation and left pgfSweave behind for these smaller calculations. This is worth looking into.

Q3: I use pgfSweave myself all the time for my own work. Some changes to Sweave in R 2.12 have been causing minor problems with pgfSweave, but a forthcoming version fixes them. The development version on GitHub (https://github.com/cameronbracken/pgfSweave) already has the changes. If you are having additional problems, I would be happy to help.
