Subsetting a data.frame for a ggplot2 bar chart

Posted on 2024-11-29 07:34:33

I have the following data:

    Splice.Pair  proportion
1         AA-AG 0.010909091
2         AA-GC 0.003636364
3         AA-TG 0.003636364
4         AA-TT 0.007272727
5         AC-AC 0.003636364
6         AC-AG 0.003636364
7         AC-GA 0.003636364
8         AC-GG 0.003636364
9         AC-TC 0.003636364
10        AC-TG 0.003636364
11        AC-TT 0.003636364
12        AG-AA 0.010909091
13        AG-AC 0.007272727
14        AG-AG 0.003636364
15        AG-AT 0.003636364
16        AG-CC 0.003636364
17        AG-CT 0.007272727
...       ...   ...

I want to get a barchart visualising the proportion of each splice pair but only for splice pairs that have a proportion over, say, 0.004. I tried the following:

nc.subset <- subset(nc.dat, proportion > 0.004)
qplot(Splice.Pair, proportion, data=nc.subset, geom="bar", xlab="Splice Pair", ylab="Proportion of total non-canonical splice sites") + coord_flip()

But this just gives me a bar chart with all splice pairs on the Y-axis, except that the splice pairs that were filtered out are missing bars.

I have no idea why all the categories are still present :s

Comments (2)

这样的小城市 2024-12-06 07:34:33

What's happening is that Splice.Pair is a factor. When you subset your data frame, the factor retains its levels attribute, which still has all of the original levels. You can avoid this kind of problem by simply wrapping your subsetting in droplevels:

nc.subset <- droplevels(subset(nc.dat, proportion > 0.004))
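To see the effect directly, here is a minimal sketch using a toy data frame built from the first four rows of the question's table (the real nc.dat is not shown in full, so this stand-in is an assumption). It compares the levels before and after droplevels:

```r
# Toy stand-in for nc.dat: first four rows of the table in the question
nc.dat <- data.frame(
  Splice.Pair = factor(c("AA-AG", "AA-GC", "AA-TG", "AA-TT")),
  proportion  = c(0.010909091, 0.003636364, 0.003636364, 0.007272727)
)

# Subsetting removes rows, but the factor keeps every original level
nc.subset <- subset(nc.dat, proportion > 0.004)
nrow(nc.subset)                  # 2 rows remain
levels(nc.subset$Splice.Pair)    # still all 4 levels

# droplevels() discards the levels that no longer occur
nc.clean <- droplevels(nc.subset)
levels(nc.clean$Splice.Pair)     # "AA-AG" "AA-TT"
```

The unused levels are what the plot's discrete axis picks up, which is why the empty categories still appear.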

More generally, if you dislike this kind of automatic retention of levels with factors, you can set R to store strings as character vectors rather than factors by default by setting:

options(stringsAsFactors = FALSE)

at the beginning of your R session (stringsAsFactors can also be passed as an argument to data.frame). From R 4.0.0 onward, FALSE is already the default.
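As a small illustration of the per-call form, the sketch below passes stringsAsFactors = FALSE to data.frame. The data frame name nc.chr and its three rows are made up for the example. Because the column is a plain character vector, there is no levels attribute to carry over after subsetting:

```r
# Hypothetical example data; Splice.Pair is kept as character, not factor
nc.chr <- data.frame(
  Splice.Pair = c("AA-AG", "AA-GC", "AA-TT"),
  proportion  = c(0.010909091, 0.003636364, 0.007272727),
  stringsAsFactors = FALSE
)

is.character(nc.chr$Splice.Pair)   # TRUE

# After subsetting, only the values that remain are present
kept <- subset(nc.chr, proportion > 0.004)
unique(kept$Splice.Pair)           # "AA-AG" "AA-TT"
```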

EDIT

Regarding the issue of running older versions of R that may lack droplevels, @rcs points out in a comment that the method for a single factor is very simple to implement on your own. The method for data frames is only slightly more complicated:

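# Data-frame method: re-create every factor column so unused levels are dropped
droplevels.data.frame <- function (x, except = NULL, ...) 
{
    # TRUE for each column that is a factor
    ix <- vapply(x, is.factor, NA)
    # Columns named in 'except' are left untouched
    if (!is.null(except)) 
        ix[except] <- FALSE
    # Calling factor() on a factor keeps only the levels that occur
    x[ix] <- lapply(x[ix], factor)
    x
}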

But of course, the best solution is still to upgrade to the latest version of R.
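For the single-factor case mentioned above, a minimal sketch could look like the following. The helper name drop_unused is hypothetical and is not taken from @rcs's comment; it relies only on the fact that calling factor() on an existing factor re-encodes it with the levels that actually occur:

```r
# Hypothetical helper: drop unused levels from one factor
drop_unused <- function(f) {
  stopifnot(is.factor(f))
  factor(f)   # keeps only the levels present, in their original order
}

f <- factor(c("AA-AG", "AA-GC", "AA-TT"))
g <- f[f != "AA-GC"]      # subsetting keeps the "AA-GC" level
levels(g)                 # "AA-AG" "AA-GC" "AA-TT"
levels(drop_unused(g))    # "AA-AG" "AA-TT"
```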

乱了心跳 2024-12-06 07:34:33

Check whether Splice.Pair is a factor. If it is, use droplevels() to remove the levels that are no longer used; that should resolve your problem.

nc.subset <- subset(nc.dat, proportion > 0.004)
nc.subset$Splice.Pair <- droplevels(nc.subset$Splice.Pair)
qplot(Splice.Pair, proportion, data=nc.subset, geom="bar", xlab="Splice Pair", ylab="Proportion of total non-canonical splice sites") + coord_flip()

You may be able to incorporate droplevels into the qplot call, but that's for you to find out :-)
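One way this could look is sketched below, with the filter and droplevels combined in a single expression that is handed straight to the plot. It uses ggplot() with geom_col() instead of qplot, which is a substitution on my part and assumes a ggplot2 release that provides geom_col(). The toy nc.dat is again just the first four rows from the question:

```r
# Toy stand-in for nc.dat: first four rows of the table in the question
nc.dat <- data.frame(
  Splice.Pair = factor(c("AA-AG", "AA-GC", "AA-TG", "AA-TT")),
  proportion  = c(0.010909091, 0.003636364, 0.003636364, 0.007272727)
)

# Filter and drop the unused levels in one step
plot.dat <- droplevels(subset(nc.dat, proportion > 0.004))

# Draw the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot.dat, aes(x = Splice.Pair, y = proportion)) +
    geom_col() +   # bar heights taken directly from 'proportion'
    labs(x = "Splice Pair",
         y = "Proportion of total non-canonical splice sites") +
    coord_flip()
  print(p)
}
```

With the levels dropped before plotting, only the splice pairs that pass the threshold should appear on the axis.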
