Difference between Rscript and littler

Posted on 2024-09-08 20:14:01

...besides the fact that Rscript is invoked with #!/usr/bin/env Rscript and littler with #!/usr/local/bin/r (on my system) in the first line of the script file. I've found certain differences in execution speed (it seems littler is a bit slower).

I created two dummy scripts, ran each 1000 times, and compared the average execution times.

Here's the Rscript file:

#!/usr/bin/env Rscript

btime <- proc.time()
x <- rnorm(100)
print(x)
print(plot(x))
etime <- proc.time()
tm <- etime - btime
sink(file = "rscript.r.out", append = TRUE)
cat(paste(tm[1:3], collapse = ";"), "\n")
sink()
print(tm)

And here's the littler file:

#!/usr/local/bin/r

btime <- proc.time()
x <- rnorm(100)
print(x)
print(plot(x))
etime <- proc.time()
tm <- etime - btime
sink(file = "little.r.out", append = TRUE)
cat(paste(tm[1:3], collapse = ";"), "\n")
sink()
print(tm)

As you can see, they are almost identical (only the first line and the sink file argument differ). The output is written to a text file via sink() and then imported into R with read.table. I created a bash script to execute each script 1000 times, then calculated the averages.

Here's the bash script:

# Run the script named by the first argument 1000 times
for i in $(seq 1000)
do
./"$1"
echo "####################"
echo "Iteration #$i"
echo "####################"
done
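The averaging step can also be done without a round trip through read.table. The minimal shell sketch below computes the column means directly from one of the .out files. It assumes each line has the user;system;elapsed layout that the scripts above append. R's cat() leaves a trailing space on each line, and awk's numeric conversion ignores it. col_means is a hypothetical helper name, not part of R or littler.

```shell
#!/usr/bin/env bash
# col_means FILE
# Hypothetical helper: prints the mean of the user, system and elapsed columns
# of a file whose lines look like "0.219;0.008;0.274 " (as appended by the
# benchmark scripts via cat(paste(tm[1:3], collapse = ";"), "\n")).
col_means() {
  awk -F';' '
    { user += $1; sys += $2; elapsed += $3 }   # "0.274 " converts to 0.274
    END {
      if (NR > 0)
        printf "user %.6f system %.6f elapsed %.6f\n", user / NR, sys / NR, elapsed / NR
    }
  ' "$1"
}
```

For example, col_means rscript.r.out and col_means little.r.out would print the two sets of means side by side.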

And the results are:

# littler script
> mean(lit)
    user   system  elapsed 
0.489327 0.035458 0.588647 
> sapply(lit, median)
   L1    L2    L3 
0.490 0.036 0.609 
# Rscript
> mean(rsc)
    user   system  elapsed 
0.219334 0.008042 0.274017 
> sapply(rsc, median)
   R1    R2    R3 
0.220 0.007 0.258

Long story short: besides the (obvious) difference in execution time, are there any other differences? The more important question is: why should or shouldn't you prefer littler over Rscript (or vice versa)?

探春 2024-09-15 20:14:01

A couple of quick comments:

The path /usr/local/bin/r is arbitrary; you can also use /usr/bin/env r, as we do in some examples. As I recall, that limits the other arguments you can give to r, since only one argument can be passed when it is invoked via env.
I don't understand your benchmark, or why you'd do it that way. We do have timing comparisons in the sources; see tests/timing.sh and tests/timing2.sh. Maybe you want to split the test into startup and graph creation, or whatever it is you are after.
Whenever we ran those tests, littler won (it still won when I re-ran them just now). That made sense to us: if you look at the sources of Rscript.exe, it works differently, setting up the environment and a command string before eventually calling execv(cmd, av). littler, which embeds R directly, can start a little quicker.
The main price is portability. The way littler is built, it won't make it to Windows, or at least not easily. OTOH, we have ported RInside, so if someone really wanted to...
littler came first, in September 2006, whereas Rscript arrived with R 2.5.0 in April 2007.
Rscript is now available everywhere R is. That is a big advantage.
In my view, littler's command-line options are a little more sensible.
Both work with the CRAN packages getopt and optparse for option parsing.
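proc.time() differences taken inside a script only cover work done after R has started, so interpreter startup has to be timed from outside. The shell sketch below shows one rough way to do that. time_runs is a hypothetical helper, not something shipped with littler or R; the real comparisons are in tests/timing.sh and tests/timing2.sh. It assumes Linux with GNU date, whose %N format gives nanoseconds.

```shell
#!/usr/bin/env bash
# time_runs N CMD [ARGS...]
# Hypothetical helper: runs CMD N times and prints the mean wall-clock
# seconds per run. Relies on GNU date's %N (nanoseconds), so Linux only.
time_runs() {
  local n i start end
  n=$1
  shift
  i=0
  start=$(date +%s%N)
  while [ "$i" -lt "$n" ]; do
    "$@" > /dev/null 2>&1   # discard output so terminal I/O is not timed
    i=$((i + 1))
  done
  end=$(date +%s%N)
  # Integer nanoseconds from shell arithmetic; floating-point division in awk.
  awk -v ns="$((end - start))" -v n="$n" 'BEGIN { printf "%.4f\n", ns / n / 1e9 }'
}
```

Both front ends accept an expression via -e, so time_runs 100 Rscript -e 'invisible(NULL)' and time_runs 100 r -e 'invisible(NULL)' time little more than startup. Timing the full scripts the same way and subtracting gives a rough estimate of the plotting cost.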

So it's a personal preference. I co-wrote littler, learned a lot doing that (e.g. for RInside), and still find it useful, so I use it dozens of times each day. It drives CRANberries. It drives cran2deb. Your mileage may, as they say, vary.

Disclaimer: littler is one of my projects.

Postscriptum: I would have written the test as

  N <- 1000  # e.g. the 1000 repetitions used in the question
  fun <- function() { x <- rnorm(100); print(x); print(plot(x)) }
  replicate(N, system.time(fun())["elapsed"])

or even

  mean(replicate(N, system.time(fun())["elapsed"]), trim = 0.05)

to get rid of the outliers. Moreover, you essentially only measure I/O (a print and a plot), both of which come from the same R library, so I would expect little difference.
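The same 5% trimming can be applied to timings collected outside R, for instance by an external loop like the one in the question. The sketch below is a hypothetical trimmed_mean shell function that mirrors R's mean(x, trim = 0.05). It sorts the sample and drops floor(n * trim) observations from each end before averaging. It assumes trim is below 0.5; at 0.5 or above, R returns the median instead.

```shell
#!/usr/bin/env bash
# trimmed_mean TRIM  (reads one number per line on stdin)
# Hypothetical helper: drops floor(n * TRIM) values from each end of the
# sorted sample and prints the mean of the rest (assumes 0 <= TRIM < 0.5).
trimmed_mean() {
  LC_ALL=C sort -n | awk -v trim="$1" '
    { x[NR] = $1 }
    END {
      k = int(NR * trim)                     # values dropped at each end
      for (i = k + 1; i <= NR - k; i++) s += x[i]
      if (NR - 2 * k > 0) printf "%.6f\n", s / (NR - 2 * k)
    }'
}
```

For example, cut -d';' -f3 rscript.r.out | trimmed_mean 0.05 would give a trimmed mean of the elapsed column written by the question's scripts.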
