How do I create a combined boxplot from a data frame?

Posted on 2024-10-17 18:25:14

I want to do something incredibly simple: I want to create one boxplot for a complete data frame. Yet searching for "combined boxplot" and related terms didn't turn up any suggestions. If I overlooked an obvious way, let me know.

I have the following data:

> theData
     X20.7    X21.7    X22.7    X23.7    X24.7    X25.7    X26.7    X27.7    X28.7    X29.7    X30.7    X31.7    X32.7    X33.7    X34.7    X35.7
1 99.64920 99.49319 99.49319 99.49319 99.49319 99.49319 99.80837 99.29348 99.29348 99.29348 99.29348 99.29348 99.29348 99.46376 99.46376 99.51554
2 98.76469 98.60867 98.60867 98.60867 98.60867 98.60867 99.41553 98.40896 98.40896 98.40896 98.40896 98.40896 98.40896 98.74975 98.74975 98.54527
3 98.37824 98.22222 98.22222 98.22222 98.22222 98.22222 98.70900 98.13767 98.13767 98.13767 98.13767 98.13767 98.13767 98.47846 98.47846 98.01791
4 98.11356 97.95754 97.95754 97.95754 97.95754 97.95754 97.82447 97.93003 97.93003 97.93003 97.93003 97.93003 97.93003 98.27083 98.27083 97.81027
5 97.80027 97.64424 97.64424 97.64424 97.64424 97.48632 97.43801 97.40158 97.40158 97.40158 97.40158 97.40158 97.40158 97.74239 97.74239 97.28181
6 97.47825 97.32222 97.32222 97.32222 97.43795 97.12131 97.17333 97.03658 97.10158 97.10158 97.10158 97.10158 97.10158 97.44239 97.44239 96.98180
> dput(theData)
structure(list(X20.7 = c(99.6492, 98.7646913866934, 98.3782376564915, 
98.1135635544627, 97.8002672890352, 97.4782549804011), X21.7 = c(99.4931928571429, 
98.6086741582754, 98.2222160140822, 97.9575388921788, 97.6442390541023, 
97.3222230681959), X22.7 = c(99.4931928571429, 98.6086741582754, 
98.2222160140822, 97.9575388921788, 97.6442390541023, 97.3222230681959
), X23.7 = c(99.4931928571429, 98.6086741582754, 98.2222160140822, 
97.9575388921788, 97.6442390541023, 97.3222230681959), X24.7 = c(99.4931928571429, 
98.6086741582754, 98.2222160140822, 97.9575388921788, 97.6442390541023, 
97.437947563131), X25.7 = c(99.4931928571429, 98.6086741582754, 
98.2222160140822, 97.9575388921788, 97.4863155584865, 97.121313307238
), X26.7 = c(99.8083714285714, 99.415530164398, 98.7090041774867, 
97.8244717838903, 97.4380076185552, 97.173326388931), X27.7 = c(99.2934828571429, 
98.4089615689001, 98.1376722694449, 97.9300324124538, 97.401583100132, 
97.03657716757), X28.7 = c(99.2934828571429, 98.4089615689001, 
98.1376722694449, 97.9300324124538, 97.401583100132, 97.1015782240536
), X29.7 = c(99.2934828571429, 98.4089615689001, 98.1376722694449, 
97.9300324124538, 97.401583100132, 97.1015782240536), X30.7 = c(99.2934828571429, 
98.4089615689001, 98.1376722694449, 97.9300324124538, 97.401583100132, 
97.1015782240536), X31.7 = c(99.2934828571429, 98.4089615689001, 
98.1376722694449, 97.9300324124538, 97.401583100132, 97.1015782240536
), X32.7 = c(99.2934828571429, 98.4089615689001, 98.1376722694449, 
97.9300324124538, 97.401583100132, 97.1015782240536), X33.7 = c(99.4637585714286, 
98.7497473555799, 98.478463763926, 98.2708282766442, 97.7423900760775, 
97.4423915096353), X34.7 = c(99.4637585714286, 98.7497473555799, 
98.478463763926, 98.2708282766442, 97.7423900760775, 97.4423915096353
), X35.7 = c(99.5155421428571, 98.5452656069643, 98.0179127183643, 
97.81026932055, 97.2818110000344, 96.9818010094329)), .Names = c("X20.7", 
"X21.7", "X22.7", "X23.7", "X24.7", "X25.7", "X26.7", "X27.7", 
"X28.7", "X29.7", "X30.7", "X31.7", "X32.7", "X33.7", "X34.7", 
"X35.7"), row.names = c(NA, 6L), class = "data.frame")

I want all this data summarized in one boxplot. However, when I try to plot a boxplot (i.e. boxplot(theData)), R automatically makes groups based on the column names.

I also tried to put the complete data frame into a vector; however, because my (complete) data set also contains NA values, I didn't succeed. So far, I have the following loop to try to build a vector from the data frame so that it can be plotted in a boxplot:

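tmpVector <- c()  # start from an empty vector
for (i in seq_len(ncol(allTheData))) {
    tmpData <- allTheData[, i]
    for (j in seq_along(tmpData)) {
        # test and append the value, not the index j
        if (!is.na(tmpData[j])) {
            tmpVector <- c(tmpVector, tmpData[j])
        }
    }
}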

However, I think I'm overcomplicating this problem, and I doubt that such a loop construction will do R's performance any good.

So, how can I make a single boxplot for a complete data frame? That is, instead of a plot with one box each for X20.7 through X35.7, I want one "overall" boxplot.

Comments (2)

末骤雨初歇 2024-10-24 18:25:14

Try something like this:

boxplot(unlist(theData))
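
A minimal sketch of why this should also work when the data contains NA values: unlist() concatenates all columns into one numeric vector, and boxplot() leaves NA values out when it computes the box. The small data frame below is made up for illustration and only stands in for the asker's allTheData.

```r
# Hypothetical stand-in for the full data set, with one missing value
allTheData <- data.frame(
  X20.7 = c(99.6, 98.8, NA),
  X21.7 = c(99.5, 98.6, 98.2)
)

# unlist() flattens the data frame column by column into one numeric vector
values <- unlist(allTheData, use.names = FALSE)

# boxplot() draws a single box; NA values are left out of the statistics.
# The summary it computes is returned invisibly.
bp <- boxplot(values, ylab = "Value")

# Number of non-NA observations that went into the box
bp$n
```

Since the NA values are handled by boxplot() itself, no explicit loop for filtering them out should be needed.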
眼角的笑意。 2024-10-24 18:25:14

Jura,

How about using the melt function from the reshape package to convert your data to "long" format and then calling boxplot on that? Assuming your data is in an object named df:

> library(reshape)
> df.m <- melt(df)
Using  as id variables
> head(df.m)
  variable    value
1    X20.7 99.64920
2    X20.7 98.76469
3    X20.7 98.37824
4    X20.7 98.11356
5    X20.7 97.80027
6    X20.7 97.47825
> 
> boxplot(df.m$value)
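
If installing reshape is not an option, a similar long format can probably be obtained with stack() from base R. This is a sketch of an alternative, not the melt approach itself; the data frame below is made up for illustration, and note that stack() names its columns values and ind rather than value and variable.

```r
# Hypothetical stand-in for the data frame df used above
df <- data.frame(
  X20.7 = c(99.6, 98.8, 98.4),
  X21.7 = c(99.5, 98.6, 98.2)
)

# stack() reshapes to long format: 'values' holds the numbers,
# 'ind' is a factor recording the column each number came from
df.long <- stack(df)
head(df.long)

# One overall box across all columns
boxplot(df.long$values)
```

Keeping the ind column around also makes it easy to go back to the per-column plot later, e.g. boxplot(values ~ ind, data = df.long).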