Can I avoid using data frames in ggplot2?

Posted on 2024-08-18 00:44:33

I'm running a Monte Carlo simulation and the output is in the form:

> d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4))
> d
iter  k1   k2
1     0.2  0.3
2     0.6  0.4

The plots I want to generate are:

plot(d$iter, d$k1)
plot(density(d$k1))

I know how to produce the equivalent plots with ggplot2 by first converting to a long-format data frame:

new_d = data.frame(iter=rep(d$iter, 2), 
                   k = c(d$k1, d$k2), 
                   label = rep(c('k1', 'k2'), each=2))

Then plotting is easy. However, the number of iterations can be very large, and the number of k's can also be large. This means messing about with a very large data frame.

Is there any way I can avoid creating this new data frame?

Thanks

Comments (5)

夜灵血窟げ 2024-08-25 00:44:33

Short answer is "no," you can't avoid creating a data frame. ggplot requires the data to be in a data frame. If you use qplot, you can give it separate vectors for x and y, but internally, it's still creating a data frame out of the parameters you pass in.

I agree with juba's suggestion: learn to use the reshape function, or better yet the reshape package with its melt/cast functions. Once you get fast at putting your data in long format, you are one step closer to creating amazing ggplot graphs!
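
As a rough illustration of the long-format step this answer recommends, the sketch below reshapes the question's toy data frame. It uses `tidyr::pivot_longer()` as a stand-in for `melt` (an assumption: the answer refers to the reshape package, and tidyr is assumed to be installed). The column names `label` and `k` simply mirror the question's `new_d`.

```r
library(tidyr)  # assumption: tidyr is installed; the answer itself names reshape's melt

# Toy simulation output from the question: one row per iteration, one column per k.
d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))

# Gather every column except `iter` into two columns:
# `label` holds the old column name, `k` holds the value.
long_d <- pivot_longer(d, cols = -iter, names_to = "label", values_to = "k")

print(long_d)
```

The result has one row per (iter, label) pair, so a call such as `ggplot(long_d, aes(x = iter, y = k, colour = label)) + geom_point()` can then draw every series at once, however many k columns there are.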

陪我终i 2024-08-25 00:44:33

Yes, it is possible for you to avoid creating a data frame: just give an empty argument list to the base layer, ggplot(). Here is a complete example based on your code:

library(ggplot2)

d = data.frame(iter=seq(1, 2), k1 = c(0.2, 0.6), k2=c(0.3, 0.4))
# desired plots:
# plot(d$iter, d$k1)
# plot(density(d$k1))

ggplot() + geom_point(aes(x = d$iter, y = d$k1))
# there is not enough data for a good density plot,
# but this is how you would do it:
ggplot() + geom_density(aes(d$k1))

Note that although this lets you avoid creating a data frame yourself, one might still be created internally. See, e.g., the following extract from ?geom_point:

All objects will be fortified to produce a data frame.
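
If the concern is many k columns rather than a single one, one possible sketch (not part of the original answer) is to keep the wide data frame as it is and add one layer per column. It assumes ggplot2 3.0.0 or later for the `.data` pronoun; `k_cols` and `point_layers` are names made up for this example.

```r
library(ggplot2)

# Toy simulation output from the question.
d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))

# Assumption: every column except `iter` is one simulated series.
k_cols <- setdiff(names(d), "iter")

# One point layer per series; .data[[nm]] looks the column up in the wide
# data frame, so no long copy is built by hand. lapply() gives each layer
# its own `nm`, which avoids the lazy-evaluation trap of a plain for loop.
point_layers <- lapply(k_cols, function(nm) {
  geom_point(aes(x = iter, y = .data[[nm]], colour = nm))
})

p <- ggplot(d) + point_layers + labs(y = "k", colour = "series")
print(p)
```

This only avoids building the long data frame by hand; as the help extract above says, ggplot2 may still work with data frames internally, so it is not guaranteed to save memory.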

红ご颜醉 2024-08-25 00:44:33

You can use the reshape function to transform your data frame to "long" format. Maybe it is a bit faster than your code?

R> reshape(d, direction="long",varying=list(c("k1","k2")),v.names="k",times=c("k1","k2"))
     iter time   k id
1.k1    1   k1 0.2  1
2.k1    2   k1 0.6  2
1.k2    1   k2 0.3  1
2.k2    2   k2 0.4  2
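
The same call can be written without listing the columns by hand, which matters when there are many k's. The sketch below is one way to do that, under the assumption that every column other than `iter` is a k series. Passing `idvar` and `timevar` is an addition to the answer's call: it reuses `iter` as the id and names the series column `label`.

```r
# Toy simulation output from the question.
d <- data.frame(iter = seq(1, 2), k1 = c(0.2, 0.6), k2 = c(0.3, 0.4))

# Assumption: every column other than `iter` holds one k series.
k_cols <- setdiff(names(d), "iter")

long_d <- reshape(
  d,
  direction = "long",
  varying   = list(k_cols),  # columns to stack into one
  v.names   = "k",           # name of the stacked value column
  timevar   = "label",       # column recording which k each row came from
  times     = k_cols,
  idvar     = "iter"         # reuse iter instead of adding a new id column
)

print(long_d)
```

Row names such as `1.k1` are kept by `reshape`; `rownames(long_d) <- NULL` drops them if they get in the way.
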
混吃等死 2024-08-25 00:44:33

Just to add to the previous answers: with qplot you could do

p <- qplot(y=d$k2, x=d$k1)

and then build on it from there, e.g. with

p + theme_bw()

But I agree: melt/cast is generally the way forward.

奈何桥上唱咆哮 2024-08-25 00:44:33

Just pass NULL as the data frame, and define the necessary aesthetics using the data vectors. Quick example:

library(MASS)
library(tidyverse)
library(ranger)

rf <- ranger(medv ~ ., data = Boston, importance = "impurity")

rf$variable.importance

ggplot(NULL, aes(x = fct_reorder(names(rf$variable.importance), rf$variable.importance),
                 y = rf$variable.importance)) +
    geom_col(fill = "navy blue", alpha = 0.7) +
    coord_flip() +
    labs(x = "Predictor", y = "Importance", title = "Random Forest") +
    theme_bw()
