Boxplot of conditional data in a data frame
I am new to R. Can anyone help me make a boxplot for a dataset like this:
file1
col1 col2 col3 col4 col5
050350005 101 56.625 48.318 RED
051010002 106 50.625 46.990 GREEN
051190007 25 65.875 74.545 BLUE
051191002 246 52.875 57.070 RED
220050004 55 70 80.274 BLUE
220150008 75 67.750 62.749 RED
220170001 77 65.750 54.307 GREEN
file2
col1 col2 col3 col4 col5
050350005 101 56.625 57 RED
051010002 106 50.625 77 GREEN
051190007 25 65.875 51.6 BLUE
051191002 246 52.875 55.070 RED
220050004 55 70 32 BLUE
220150008 75 67.750 32.49 RED
220170001 77 65.750 84.07 GREEN
For each color (RED, GREEN and BLUE), I need to compare file1 and file2 by making a boxplot of the MB and RMSE of (col4 - col3), with col2 divided into groups:
col2 < 20, 20 <= col2 < 50, 50 <= col2 < 70, col2 >= 70.
That is, for the boxplot, x is the col2 group (<20, 20-50, 50-70, >=70), while y is the MB (and RMSE) of the difference between col4 and col3.
I hope I didn't confuse anybody. Thank you so much.
Comments (1)
I think there might be a bit of confusion about what a boxplot does/is. While it is possible to create groups on the x axis, as far as I know, the y axis shows the distribution of a certain measure (I assume either col3 or col4, in your case), not the RMSE or MBE of those measurements, which would be a single value for each group.
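To illustrate the "single value for each group" point, here is a minimal base R sketch using the file1 rows from the question. It assumes MB means mean bias, i.e. the mean of col4 - col3, and the names bin and err are made up for this example. Each (col2 group, color) combination ends up with exactly one MB and one RMSE, so there is no distribution left for a box to summarize.

```r
# file1 from the question (col1 kept as character to preserve leading zeros).
file1 <- data.frame(
  col1 = c("050350005", "051010002", "051190007", "051191002",
           "220050004", "220150008", "220170001"),
  col2 = c(101, 106, 25, 246, 55, 75, 77),
  col3 = c(56.625, 50.625, 65.875, 52.875, 70, 67.750, 65.750),
  col4 = c(48.318, 46.990, 74.545, 57.070, 80.274, 62.749, 54.307),
  col5 = c("RED", "GREEN", "BLUE", "RED", "BLUE", "RED", "GREEN")
)

# Bin col2 into the four ranges from the question:
# <20, [20, 50), [50, 70), >=70. right = FALSE makes intervals left-closed.
file1$bin <- cut(file1$col2, breaks = c(-Inf, 20, 50, 70, Inf), right = FALSE,
                 labels = c("<20", "20-50", "50-70", ">=70"))

# Per-row difference between col4 and col3 (hypothetical column name).
file1$err <- file1$col4 - file1$col3

# Assumed definitions: MB = mean(err), RMSE = sqrt(mean(err^2)).
# aggregate() returns one row, and therefore one number, per (bin, color) group.
mb   <- aggregate(err ~ bin + col5, data = file1, FUN = mean)
rmse <- aggregate(err ~ bin + col5, data = file1,
                  FUN = function(e) sqrt(mean(e^2)))

print(mb)
print(rmse)
```

Since these are single numbers per group, a bar chart or dot plot of mb and rmse (one panel per file) would probably suit them better than a boxplot.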
I am not sure if your grouping variable (for the x axis) is col5, the files or the criteria you list for col2, or all of them? Regardless, you would need more data for the plots to be meaningful.
This is a basic example of a boxplot of col3 grouped by col5 and file:
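A minimal sketch of such a plot in base R, assuming the two files have already been read into data frames called file1 and file2 (only col3 and col5 are reproduced here, and the file column and the both data frame are names invented for this example). Note that in the sample data col3 is identical in both files, so the file1 and file2 boxes for each color will coincide.

```r
# col3 and col5 as posted in the question; they are the same in both files.
col3 <- c(56.625, 50.625, 65.875, 52.875, 70, 67.750, 65.750)
col5 <- c("RED", "GREEN", "BLUE", "RED", "BLUE", "RED", "GREEN")
file1 <- data.frame(col3 = col3, col5 = col5)
file2 <- data.frame(col3 = col3, col5 = col5)

# Stack the two data frames and record which file each row came from.
both <- rbind(cbind(file1, file = "file1"),
              cbind(file2, file = "file2"))

# One box per (color, file) combination.
# boxplot() also returns the plotted summary statistics invisibly.
bp <- boxplot(col3 ~ col5 + file, data = both,
              xlab = "", ylab = "col3", las = 2)
```

To group by the col2 ranges instead, the same call should work with a binned col2 column (for example one created with cut()) in place of col5.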
Note that because the number of cases is limited, you do not see whiskers on some categories, nor any outliers. You would need more data for this plot to be useful; right now all it shows is a comparison of medians.
Can you clarify what you were hoping the plot would show?