Boxplot of conditional data in a data frame

Posted 2024-08-26 02:21:05 · 1128 characters · 4 views · 0 comments

I am new to R. Can anyone help me make a boxplot for a dataset like this:

file1

     col1 col2     col3     col4  col5
050350005  101   56.625   48.318 RED    
051010002  106   50.625   46.990 GREEN    
051190007   25   65.875   74.545 BLUE    
051191002  246   52.875   57.070 RED    
220050004   55   70       80.274 BLUE    
220150008   75   67.750   62.749 RED    
220170001   77   65.750   54.307 GREEN

file2

     col1 col2     col3     col4  col5
050350005  101   56.625   57     RED
051010002  106   50.625   77     GREEN    
051190007   25   65.875   51.6   BLUE    
051191002  246   52.875   55.070 RED    
220050004   55   70       32     BLUE    
220150008   75   67.750   32.49  RED
220170001   77   65.750   84.07  GREEN

For each color (RED, GREEN and BLUE), I need to compare file1 and file2 with a boxplot of the MB and RMSE of (col4 - col3), with col2 divided into groups:

col2 < 20, 20 <= col2 < 50, 50 <= col2 < 70, col2 >= 70.

That is, for the boxplot, x is the col2 group (<20, 20-50, 50-70, >=70), while y is the MB (and RMSE) of the difference between col4 and col3.

I hope I didn't confuse anybody. Thank you so much.

三生一梦 2024-09-02 02:21:06

I think there might be a bit of confusion about what a boxplot does/is. While it is possible to create groups on the x axis, as far as I know, the y axis shows the distribution of a certain measure (I assume either col3 or col4, in your case), not the RMSE or MBE of those measurements, which would be a single value for each group.
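
To make the "single value for each group" point concrete, here is a minimal sketch using the file1 values from the question. It assumes MB means mean bias, i.e. the mean of col4 - col3, and RMSE is the square root of the mean squared difference; the names `file1`, `diff`, `mb`, `rmse` and `stats` are only illustrative.

```r
# Data copied from file1 in the question.
file1 <- data.frame(
  col3 = c(56.625, 50.625, 65.875, 52.875, 70, 67.750, 65.750),
  col4 = c(48.318, 46.990, 74.545, 57.070, 80.274, 62.749, 54.307),
  col5 = c("RED", "GREEN", "BLUE", "RED", "BLUE", "RED", "GREEN")
)

# The difference the question asks about.
file1$diff <- file1$col4 - file1$col3

# Assumed definition of MB: the mean of the differences within each colour.
mb <- tapply(file1$diff, file1$col5, mean)

# RMSE: square root of the mean squared difference within each colour.
rmse <- tapply(file1$diff, file1$col5, function(d) sqrt(mean(d^2)))

# One row per colour: a single MB and a single RMSE, so there is no
# distribution left for a boxplot to summarise.
stats <- data.frame(col5 = names(mb), MB = as.vector(mb), RMSE = as.vector(rmse))
print(stats)
```

With one number per group, a bar chart or a dot plot of `stats` would probably suit these summaries better than a boxplot.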

I am not sure if your grouping variable (for the x axis) is col5, the files or the criteria you list for col2, or all of them? Regardless, you would need more data for the plots to be meaningful.
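
If the col2 criteria are meant to be the x-axis groups, one possible way to build them is base R's `cut()`. This is only a sketch: the labels and the name `col2_group` are my own choice, and the col2 values are copied from the question.

```r
col2 <- c(101, 106, 25, 246, 55, 75, 77)  # col2 values from the question

# right = FALSE makes each interval closed on the left, matching
# col2 < 20, 20 <= col2 < 50, 50 <= col2 < 70 and col2 >= 70.
col2_group <- cut(
  col2,
  breaks = c(-Inf, 20, 50, 70, Inf),
  labels = c("<20", "20-50", "50-70", ">=70"),
  right = FALSE
)

# Count how many rows fall into each group.
table(col2_group)
```

With the sample rows, five of the seven values land in the ">=70" group and none in "<20", which is another reason more data is needed.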

This is a basic example of a boxplot of col3 grouped by col5 and file:

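# col3 and col5 from file1 followed by the same columns from file2
col3 <- c(56.625, 50.625, 65.875, 52.875, 70, 67.750, 65.750, 56.625, 50.625, 65.875, 52.875, 70, 67.750, 65.750)
col5 <- c("RED", "GREEN", "BLUE", "RED", "BLUE", "RED", "GREEN", "RED", "GREEN", "BLUE", "RED", "BLUE", "RED", "GREEN")
# file indicator: the first 7 rows come from file1, the last 7 from file2
myfile <- rep(c(1, 2), each = 7)
mydata <- data.frame(col3, col5, myfile)
# one box per combination of colour and file
boxplot(col3 ~ col5 + myfile, data = mydata)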

Note that because the number of cases is limited, you do not see the whiskers on some categories, nor any outliers. You would need more data for this plot to be useful; right now all it shows is a comparison of medians.
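
If what you are after is the distribution of the differences themselves rather than their MB or RMSE, a boxplot of col4 - col3 grouped by file and colour might be closer. The sketch below stacks the two files from the question into one data frame; the names `both`, `diff` and `bp` are illustrative, and this is a guess at the intent, not a definitive answer.

```r
# Columns copied from the question (col3 and col5 are identical in both files).
col3 <- c(56.625, 50.625, 65.875, 52.875, 70, 67.750, 65.750)
col5 <- c("RED", "GREEN", "BLUE", "RED", "BLUE", "RED", "GREEN")
col4_file1 <- c(48.318, 46.990, 74.545, 57.070, 80.274, 62.749, 54.307)
col4_file2 <- c(57, 77, 51.6, 55.070, 32, 32.49, 84.07)

# Stack both files and compute col4 - col3 for every row.
both <- data.frame(
  file = rep(c("file1", "file2"), each = 7),
  col5 = rep(col5, times = 2),
  diff = c(col4_file1, col4_file2) - rep(col3, times = 2)
)

# One box per file/colour combination; boxplot() also returns the
# statistics it drew, including the number of rows behind each box.
bp <- boxplot(diff ~ file + col5, data = both,
              xlab = "file and colour", ylab = "col4 - col3")
abline(h = 0, lty = 2)  # reference line at zero difference
```

Each box here rests on only two or three rows (see `bp$n`), so the same caveat about needing more data applies.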

Can you clarify what you were hoping the plot would show?
