How do I draw guide lines on a gnuplot-generated CDF?
At work I have a set of floating point values that I sort, compute a CDF for, and plot within gnuplot. I'd like to draw a line showing where the 80% and 90% thresholds of the CDF are, i.e. a line coming in from the left at the 0.8 y tic mark, touching the graph and then dropping down to the corresponding x value. This is to help guide the viewer's eye.
The data is generated automatically and I make multiple plots so I don't want to have to hand craft these lines each time.
It's trivial to draw a horizontal arrow going completely across the plot at the 0.8 and 0.9 y-value points, but I don't understand how to determine where the vertical line should be drawn.
Here is a Q&A about drawing arrows, "Gnuplot: Vertical lines at specific positions", but there the positions are known a priori.
Here is some sample data (my work machine is not internet accessible so sharing is hard)
X | Y
5.0 | 0.143
8.0 | 0.288
16.0 | 0.429
25.0 | 0.714
39.0 | 0.857
47.0 | 1.000
Any ideas?
Comments (2)
Here is my take (using percentile ranks), which only assumes that a univariate series of measurements is available (your column headed `X`). You may want to tweak it a little to work with your pre-computed cumulative frequencies, but that's not really difficult.

This yields the following output:

You can add as many percentile values as you want, of course; you just have to define a new variable, e.g. `perc90`, ask for two more `arrow` commands, and replace every occurrence of `0.8` (ah... the joy of magic numbers!) with the desired value (in this case, 0.9).

Some explanations about the above code:

[...] the header generated by `table` (first four lines); (we could ask awk to start at the 5th line, but let's go with that.)
[...] a function like `trunc(rank(x))/length(x)` to get the percentile ranks.)

If you want to give R a shot, you can safely replace that long series of sed/awk commands with a call to R like

assuming `rnd.dat` is in your home directory.

Sidenote: And if you can live without gnuplot, here are some R commands to do that kind of graphics (even without using the `quantile` function):
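Separately from the R commands mentioned above, the tweak for pre-computed cumulative frequencies that this answer alludes to might look roughly like the following Python sketch. It is not the answer's own code. It assumes the CDF is plotted with straight line segments between points, so the crossing is found by linear interpolation, and it uses the sample data from the question. The helper names `threshold_x` and `arrow_commands` are made up for illustration.

```python
# Pre-computed CDF from the question: (x, cumulative fraction) pairs, sorted by x.
cdf = [(5.0, 0.143), (8.0, 0.288), (16.0, 0.429),
       (25.0, 0.714), (39.0, 0.857), (47.0, 1.000)]


def threshold_x(points, level):
    """Return the x at which the piecewise-linear curve through `points`
    first reaches `level` (linear interpolation between neighbours).
    Assumption: the curve is drawn with straight line segments."""
    prev_x, prev_y = points[0]
    if level <= prev_y:
        return prev_x
    for x, y in points[1:]:
        if y >= level:
            # prev_y < level <= y here, so the denominator is positive.
            return prev_x + (level - prev_y) * (x - prev_x) / (y - prev_y)
        prev_x, prev_y = x, y
    raise ValueError(f"CDF never reaches {level}")


def arrow_commands(points, levels):
    """Build two gnuplot `set arrow` commands per level: a horizontal
    segment from the left border to the curve, and a vertical segment
    from the curve down to the bottom border."""
    commands = []
    for level in levels:
        x = threshold_x(points, level)
        commands.append(
            f"set arrow from graph 0, first {level} "
            f"to first {x:.3f}, first {level} nohead")
        commands.append(
            f"set arrow from first {x:.3f}, first {level} "
            f"to first {x:.3f}, graph 0 nohead")
    return commands


for command in arrow_commands(cdf, [0.8, 0.9]):
    print(command)
```

The printed lines could be saved to a file and pulled into a gnuplot script with `load` before the `plot` command. If the CDF is drawn as a step function instead, returning the first `x` whose cumulative value reaches the level, without interpolating, would be the closer match.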
You can use `awk` to calculate the line at a given value.

Example

If you have a data file `Data.csv` like so:

you can plot it with

Now, if you want to draw a line at 90% of the maximal value of the second column (in this case 90), run an awk script. Its purpose is to identify the minimum and maximum x-values and the 90% value of the maximal y-value. It could look something like this:

Basically, what it does is the following:

- Check whether `x_min` exists and, if it does not, set `x_min`, `x_max` and `y_max` to the first or second column of `Data.csv`.
- Check whether the current first column is smaller than the current `x_min`; if so, set `x_min` to the value of the current first column.
- Do the equivalent for `x_max` and `y_max` (note: we only need the maximum of the second column, not the minimum).
- After looping through the data file, print the result like so:

    x_min y_max * 0.9
    x_max y_max * 0.9

In order to make this work in gnuplot, we append our script from above like so:

Note the `\"` in the gnuplot script. The `"` characters need to be escaped so that gnuplot does not stumble over them.

In the end, you should end up with a plot like this:

The green line marks the 90% value of the maximal y-value.
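The steps listed in this answer are not specific to awk. As a rough sketch of the same logic in Python (not the answer's script), the following finds the x-range and 90% of the largest y-value and prints the two end points of the guide line. The rows are invented for illustration because the original `Data.csv` is not shown, and `guide_line` is a hypothetical helper name.

```python
# Hypothetical two-column data (x, y); the original Data.csv is not shown.
rows = [(1.0, 20.0), (2.0, 55.0), (3.0, 100.0), (4.0, 70.0), (5.0, 35.0)]


def guide_line(rows, fraction=0.9):
    """Return the two end points of a horizontal line at
    `fraction` * max(y), spanning from the smallest to the largest x."""
    x_min = x_max = y_max = None
    for x, y in rows:
        if x_min is None:
            # First row: initialise all three trackers.
            x_min, x_max, y_max = x, x, y
        if x < x_min:
            x_min = x
        if x > x_max:
            x_max = x
        if y > y_max:
            y_max = y
    if x_min is None:
        raise ValueError("no data rows")
    level = fraction * y_max
    return [(x_min, level), (x_max, level)]


# Print the end points as "x y" pairs, one per line.
for x, y in guide_line(rows):
    print(x, y)
```

Output in this two-line "x y" form is the same shape as the awk result described above, so it could be written to a file and plotted as an extra line on top of the data.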