How to create a combined boxplot from a data frame?
I want to do something incredibly simple: I want to create one boxplot for a complete data frame. Yet searching for ‘combined boxplot’ and related terms didn’t turn up any suggestions. If I overlooked an obvious way, let me know.
I have the following data:
> theData
X20.7 X21.7 X22.7 X23.7 X24.7 X25.7 X26.7 X27.7 X28.7 X29.7 X30.7 X31.7 X32.7 X33.7 X34.7 X35.7
1 99.64920 99.49319 99.49319 99.49319 99.49319 99.49319 99.80837 99.29348 99.29348 99.29348 99.29348 99.29348 99.29348 99.46376 99.46376 99.51554
2 98.76469 98.60867 98.60867 98.60867 98.60867 98.60867 99.41553 98.40896 98.40896 98.40896 98.40896 98.40896 98.40896 98.74975 98.74975 98.54527
3 98.37824 98.22222 98.22222 98.22222 98.22222 98.22222 98.70900 98.13767 98.13767 98.13767 98.13767 98.13767 98.13767 98.47846 98.47846 98.01791
4 98.11356 97.95754 97.95754 97.95754 97.95754 97.95754 97.82447 97.93003 97.93003 97.93003 97.93003 97.93003 97.93003 98.27083 98.27083 97.81027
5 97.80027 97.64424 97.64424 97.64424 97.64424 97.48632 97.43801 97.40158 97.40158 97.40158 97.40158 97.40158 97.40158 97.74239 97.74239 97.28181
6 97.47825 97.32222 97.32222 97.32222 97.43795 97.12131 97.17333 97.03658 97.10158 97.10158 97.10158 97.10158 97.10158 97.44239 97.44239 96.98180
> dput(theData)
structure(list(X20.7 = c(99.6492, 98.7646913866934, 98.3782376564915,
98.1135635544627, 97.8002672890352, 97.4782549804011), X21.7 = c(99.4931928571429,
98.6086741582754, 98.2222160140822, 97.9575388921788, 97.6442390541023,
97.3222230681959), X22.7 = c(99.4931928571429, 98.6086741582754,
98.2222160140822, 97.9575388921788, 97.6442390541023, 97.3222230681959
), X23.7 = c(99.4931928571429, 98.6086741582754, 98.2222160140822,
97.9575388921788, 97.6442390541023, 97.3222230681959), X24.7 = c(99.4931928571429,
98.6086741582754, 98.2222160140822, 97.9575388921788, 97.6442390541023,
97.437947563131), X25.7 = c(99.4931928571429, 98.6086741582754,
98.2222160140822, 97.9575388921788, 97.4863155584865, 97.121313307238
), X26.7 = c(99.8083714285714, 99.415530164398, 98.7090041774867,
97.8244717838903, 97.4380076185552, 97.173326388931), X27.7 = c(99.2934828571429,
98.4089615689001, 98.1376722694449, 97.9300324124538, 97.401583100132,
97.03657716757), X28.7 = c(99.2934828571429, 98.4089615689001,
98.1376722694449, 97.9300324124538, 97.401583100132, 97.1015782240536
), X29.7 = c(99.2934828571429, 98.4089615689001, 98.1376722694449,
97.9300324124538, 97.401583100132, 97.1015782240536), X30.7 = c(99.2934828571429,
98.4089615689001, 98.1376722694449, 97.9300324124538, 97.401583100132,
97.1015782240536), X31.7 = c(99.2934828571429, 98.4089615689001,
98.1376722694449, 97.9300324124538, 97.401583100132, 97.1015782240536
), X32.7 = c(99.2934828571429, 98.4089615689001, 98.1376722694449,
97.9300324124538, 97.401583100132, 97.1015782240536), X33.7 = c(99.4637585714286,
98.7497473555799, 98.478463763926, 98.2708282766442, 97.7423900760775,
97.4423915096353), X34.7 = c(99.4637585714286, 98.7497473555799,
98.478463763926, 98.2708282766442, 97.7423900760775, 97.4423915096353
), X35.7 = c(99.5155421428571, 98.5452656069643, 98.0179127183643,
97.81026932055, 97.2818110000344, 96.9818010094329)), .Names = c("X20.7",
"X21.7", "X22.7", "X23.7", "X24.7", "X25.7", "X26.7", "X27.7",
"X28.7", "X29.7", "X30.7", "X31.7", "X32.7", "X33.7", "X34.7",
"X35.7"), row.names = c(NA, 6L), class = "data.frame")
I want all this data summarized in one boxplot, yet when I try to plot a boxplot (i.e. boxplot(theData)), R automatically makes groups based on the column names.
I also tried to put the complete data frame into a vector. However, because my (complete) data set also contains NA values, I didn’t succeed in this. So far, I have the following loop to try to make a vector out of the data frame so that it can be plotted in a boxplot:
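tmpVector <- c()  # start empty so values can be appended below
for (i in seq_len(ncol(allTheData))) {
  tmpData <- allTheData[, i]
  for (j in seq_along(tmpData)) {
    # test the value itself, not the loop index j
    if (!is.na(tmpData[j])) {
      tmpVector <- c(tmpVector, tmpData[j])
    }
  }
}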
However, I think I’m overcomplicating this problem, and I doubt that such a loop construction will do R's performance any good.
So, how can I make a single boxplot for a complete data frame? That is, not a plot with one box per column from X20.7 through X35.7, but one "Overall" boxplot?
Comments (2)
Try something like this
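A minimal sketch of one way to do this in base R, not necessarily the code this answer originally showed. It assumes the data frame is called theData as in the question; the small data frame below is a made-up stand-in, with an NA added for illustration.

```r
# Small stand-in for the question's data; the NA is added for illustration.
theData <- data.frame(
  X20.7 = c(99.65, 98.76, 98.38),
  X21.7 = c(99.49, NA, 98.22),
  X22.7 = c(99.49, 98.61, 98.22)
)

# unlist() flattens every column into one numeric vector.
allValues <- unlist(theData, use.names = FALSE)

# boxplot() ignores NA values, so a single box is drawn for all the data.
stats <- boxplot(allValues, main = "Overall", ylab = "Value")
```

Because the whole data frame is flattened in one call, no explicit loop or manual NA filtering should be needed.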
Jura,
How about using the melt function in reshape to convert your data to "long" format and then use boxplot on that? Assuming your data is in an object named df:
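As a rough sketch of the wide-to-long idea, the example below swaps in base R's stack() so that it runs without extra packages; with reshape installed, melt(df) would play the same role. The contents of df here are made up for illustration.

```r
# Hypothetical example data; df stands in for the real data frame.
df <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6))

# stack() converts wide data to long format:
# one row per value, with columns `values` and `ind` (the source column).
long <- stack(df)

# Plot only the values column to get one box for the whole data frame.
res <- boxplot(long$values, main = "All columns combined")
```

Plotting long$values on its own gives the single overall box, while boxplot(values ~ ind, data = long) would bring back one box per original column.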