如何在 Groovy 中选择 X 位置的 Y 值?
这是一个数学问题......
在此之前我有一个关于标准化每月数据的问题: 如何生成拉伸图的 X 值?
我得到了一个很好的答案并且效果很好,唯一的问题是现在我需要检查一个月 31 天的 X 值与一个月 28 天的 X 值。
所以我的问题是:如果我有两组参数像这样:
x | y x2 | y2
1 | 10 1.0 | 10
2 | 9 1.81 | 9.2
3 | 8 2.63 | 8.6
4 | 7 3.45 | 7.8
5 | 6 4.27 | 7
6 | 5 5.09 | 6.2
7 | 4 5.91 | 5.4
8 | 3 6.73 | 4.2
9 | 2 7.55 | 3.4
10 | 1 8.36 | 2.6
9.18 | 1.8
10.0 | 1.0
像你一样可以看到,这两个数据集的总体趋势是相同的。 但是,如果我通过互相关函数(总体目标)运行这些值,我将得到一些无法反映这一点的结果,因为数据集有两种不同的大小。
现实世界的例子是,如果您要跟踪每天跑多少英里:
二月份(有 28 天)的第一周,您每天跑一英里。第二周,你每天跑两英里,以此类推。
三月份(有31天),你做同样的事情,但八天跑一英里,八天跑两英里,八天跑三英里,然后七天四英里。
根据以下函数,相关系数应该几乎恰好为 1:
class CrossCorrelator {
def variance = { x->
def v = 0
x.each{ v += it**2}
v/(x.size()) - (mean(x)**2)
}
def covariance = {x, y->
def z = 0
[x, y].transpose().each{ z += it[0] * it[1] }
(z / (x.size())) - (mean(x) * mean(y))
}
def coefficient = {x, y->
covariance(x,y) / (Math.sqrt(variance(x) * variance(y)))
}
}
def i = new CrossCorrelator()
i.coefficient(y values, y2 values)
仅查看数据集,如果我要获取 1、2、3、4、5、6 处的值,图表似乎会完全相同、7、8、9 和 10,该函数会产生更准确的结果。
然而,由于长度不一样,它是倾斜的。
有没有某种方法可以找到十二值数据集中整数的值是什么?我还没有找到一种简单的方法来做到这一点,但这将非常有帮助。
提前致谢,
5
编辑:根据请求,这里是生成图形 X 值的代码:
def x = (1..12)
def y = 10
change = {l, size ->
v = [1]
l.each{
v << ((((size-1)/(x.size() - 1)) * it) + 1)
}
v -= v.last()
return v
}
change(x, y)
编辑:根据另一个请求,代码不起作用:
def normalize( xylist, days ) {
xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
def change = {l, size ->
def v = [1]
l.each{
v << ((((size-1)/(l.size() - 1)) * it) + 1)
}
v -= v.last()
return v
}
def resample( list, min, max ) {
// We want a graph with integer points from min to max on the x axis
(min..max).collect { i ->
// find the values above and below this point
bounds = list.inject( [ a:null, b:null ] ) { r, p ->
// if the value is less than i, set it in r.a
if( p[ 0 ] < i )
r.a = p
// if it's bigger (and we don't already have a bigger point)
// then set it into r.b
if( !r.b && p[ 0 ] >= i )
r.b = p
r
}
// so now, bounds.a is the point below our required point, and bounds.b
// Deal with the first case (where a is null, because we are at the start)
if( !bounds.a )
[ i, list[ 0 ][ 1 ] ]
else {
// so work out the distance from bounds.a to bounds.b
dist = ( bounds.b[0] - bounds.a[0] )
// And how far the point i is along this line
r = ( i - bounds.a[0] ) / dist
// and recalculate the y figure for this point
y = ( ( bounds.b[1] - bounds.a[1] ) * r ) + bounds.a[1]
[ i, y ]
}
}
}
def feb = [9, 3, 7, 23, 15, 16, 17, 18, 19, 13, 14, 8, 13, 12, 15, 6, 7, 13, 19, 12, 7, 3, 4, 15, 6, 17, 8, 19]
def march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2, 17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
//X and Y Values for February
z = [(1..28), change(feb, 28)].transpose()
//X and Y Values for March stretched to 28 entries
o = [(1..31), change(march, 28)].transpose()
o1 = normalize(o, 28)
resample(o1, 1, 28)
如果我将 o 变量声明中的“march”切换为 (1.. 31),脚本运行成功。当我尝试使用“进行曲”时,我得到“ java.lang.NullPointerException: Cannot invoke method getAt() on null object"
另外:我尝试不直接复制代码,因为这是不好的做法,所以我更改的函数之一基本上做了同样的事情,这只是我的版本。我最终也会重构其余部分,但这就是为什么它略有不同。
this is sort of a mathy question...
I had a question prior to this about normalizing monthly data here :
How to produce X values of a stretched graph?
I got a good answer and it works well, the only issue is that now I need to check X values of one month with 31 days against X values of a month with 28.
So my question would be: If I have two sets of parameters like so:
x | y x2 | y2
1 | 10 1.0 | 10
2 | 9 1.81 | 9.2
3 | 8 2.63 | 8.6
4 | 7 3.45 | 7.8
5 | 6 4.27 | 7
6 | 5 5.09 | 6.2
7 | 4 5.91 | 5.4
8 | 3 6.73 | 4.2
9 | 2 7.55 | 3.4
10 | 1 8.36 | 2.6
9.18 | 1.8
10.0 | 1.0
As you can see, the general trend is the same for these two data sets.
However, if I run these values through a cross-correlation function (the general goal), I will get something back that does not reflect this, since the data sets are of two different sizes.
The real world example of this would be, say, if you are tracking how many miles you run per day:
In February (with 28 days), during the first week, you run one mile each day. During the second week, you run two miles each day, etc.
In March (with 31 days), you do the same thing, but run for one mile for eight days, two miles for eight days, three miles for eight days, and four miles for seven days.
The correlation coefficient according to the following function should be almost exactly 1:
class CrossCorrelator {
def variance = { x->
def v = 0
x.each{ v += it**2}
v/(x.size()) - (mean(x)**2)
}
def covariance = {x, y->
def z = 0
[x, y].transpose().each{ z += it[0] * it[1] }
(z / (x.size())) - (mean(x) * mean(y))
}
def coefficient = {x, y->
covariance(x,y) / (Math.sqrt(variance(x) * variance(y)))
}
}
def i = new CrossCorrelator()
i.coefficient(y values, y2 values)
Just looking at the data sets, it seems like the graphs would be exactly the same if I were to grab the values at 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10, and the function would produce a more accurate result.
However, it's skewed since the lengths are not the same.
Is there some way to locate what the values at the integers in the twelve-value data set would be? I haven't found a simple way to do it, but this would be incredibly helpful.
Thanks in advance,
5
Edit: As per request, here is the code that generates the X values of the graphs:
def x = (1..12)
def y = 10
change = {l, size ->
v = [1]
l.each{
v << ((((size-1)/(x.size() - 1)) * it) + 1)
}
v -= v.last()
return v
}
change(x, y)
Edit: Not working code as per another request:
def normalize( xylist, days ) {
xylist.collect { x, y -> [ x * ( days / xylist.size() ), y ] }
}
def change = {l, size ->
def v = [1]
l.each{
v << ((((size-1)/(l.size() - 1)) * it) + 1)
}
v -= v.last()
return v
}
def resample( list, min, max ) {
// We want a graph with integer points from min to max on the x axis
(min..max).collect { i ->
// find the values above and below this point
bounds = list.inject( [ a:null, b:null ] ) { r, p ->
// if the value is less than i, set it in r.a
if( p[ 0 ] < i )
r.a = p
// if it's bigger (and we don't already have a bigger point)
// then set it into r.b
if( !r.b && p[ 0 ] >= i )
r.b = p
r
}
// so now, bounds.a is the point below our required point, and bounds.b
// Deal with the first case (where a is null, because we are at the start)
if( !bounds.a )
[ i, list[ 0 ][ 1 ] ]
else {
// so work out the distance from bounds.a to bounds.b
dist = ( bounds.b[0] - bounds.a[0] )
// And how far the point i is along this line
r = ( i - bounds.a[0] ) / dist
// and recalculate the y figure for this point
y = ( ( bounds.b[1] - bounds.a[1] ) * r ) + bounds.a[1]
[ i, y ]
}
}
}
def feb = [9, 3, 7, 23, 15, 16, 17, 18, 19, 13, 14, 8, 13, 12, 15, 6, 7, 13, 19, 12, 7, 3, 4, 15, 6, 17, 8, 19]
def march = [8, 12, 4, 17, 11, 15, 12, 8, 9, 13, 12, 7, 3, 4, 8, 2, 17, 19, 21, 12, 12, 13, 14, 15, 16, 7, 8, 19, 21, 14, 16]
//X and Y Values for February
z = [(1..28), change(feb, 28)].transpose()
//X and Y Values for March stretched to 28 entries
o = [(1..31), change(march, 28)].transpose()
o1 = normalize(o, 28)
resample(o1, 1, 28)
If I switch "march" in the o variable declaration to (1..31), the script runs successfully. When I try to use "march," I get "
java.lang.NullPointerException: Cannot invoke method getAt() on null object"
Also: I try not to directly copy code just because it's bad practice, so one of the functions I changed basically does the same thing, it's just my version. I'll get around to refactoring the rest of it eventually, too. But that's why it's slightly different.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
好吧...我们开始...这可能不是最干净的代码...
让我们首先生成两个分布,都从 1 到 10(在 y 轴)
是
所以现在,e1 和 e2分别 (至 2dp)。现在,使用上一个问题中的代码,我们可以将它们标准化为相同的 x 范围:
这意味着 n1 和 n2 是:
但是,正如您正确指出的那样,它们具有不同数量的样本点,因此无法轻松进行比较。
但是我们可以编写一个方法来逐步遍历图中我们想要的每个点,找到两个最近的点,并从这两个点的值中插入 y 值,如下所示:
现在,值
final1
和final2
是:(显然,这里有一些舍入,所以 2d.p. 隐藏了它们不完全相同的事实)
唷……一定是在家时间之后;-)
编辑
正如问题编辑中所指出的,我的
resample
方法中存在一个错误,导致它在某些条件下失败......我相信这个问题现已修复上面的代码,以及给定的示例:
如果绘制原始 31 个点(在
o
中)和 28 个点的新图(在v
中),您将得到:<图片src="https://i.sstatic.net/fH1zd.png" alt="在此处输入图像描述">
看起来还不错。
我不知道
change
方法应该做什么,所以我从这段代码中省略了它Ok...here we go...this may not be the cleanest bit of code ever...
Let's first generate two distributions, both from 1 to 10 (in the y axis)
So now, e1 and e2 are:
respectively (to 2dp). Now, using the code from the previous question, we can normalize these to the same x range:
This means n1 and n2 are:
But, as you correctly state they have different numbers of sample points, so cannot be compared easily.
But we can write a method to step through each point we want in our graph, fond the two closest points, and interpolate a y value from the values of these two points like so:
now, the values
final1
andfinal2
are:(obviously, there is some rounding here, so 2d.p. is hiding the fact that they are not exactly the same)
Phew... Must be home-time after that ;-)
EDIT
As pointed out in the edit to the question, there was a bug in my
resample
method that caused it to fail in certain conditions...I believe this has now been fixed in the code above, and from the given example:
If you plot the original 31 points (in
o
) and the new graph of 28 points (inv
), you get:Which doesn't look too bad.
I have no idea what the
change
method was supposed to do, so I have omitted it from this code