Using timeit() and a hypothesis test for means
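I've had some problems using timeit() as an accurate benchmark when comparing longer (more than one line) code snippets for SL4A on Android. I get a pretty high variation when comparing times. (Possibly something to do with the way the Android/Dalvik VM allocates CPU time?)
Anyway, I wrote a script that uses a hypothesis test for means to analyse large (~1000) samples of times. Is there anything wrong with this approach?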
from math import sqrt
import timeit

# statistics helpers
def mean(dataset):
    return sum(dataset) / float(len(dataset))

def stdev(mean, dataset):
    # Despite the name, this returns the standard error of the mean:
    # the sample standard deviation divided by sqrt(n).
    variance = ((x - mean) ** 2 for x in dataset)
    deviation = sqrt(sum(variance) / float(len(dataset) - 1))
    return deviation / sqrt(len(dataset))

def interval(mean, sampleDeviation, defaultZ=1.57):
    # z = 1.57 is roughly an 88% two-sided interval; 1.96 gives 95%.
    margin = sampleDeviation * defaultZ
    return (mean - margin, mean + margin)

def testnull(dataset1, dataset2, defaultZ=1.57):
    # True if the two confidence intervals overlap, i.e. the null
    # hypothesis (equal means) is not rejected.
    mean1, mean2 = mean(dataset1), mean(dataset2)
    sd1, sd2 = stdev(mean1, dataset1), stdev(mean2, dataset2)
    low1, high1 = interval(mean1, sd1, defaultZ)
    low2, high2 = interval(mean2, sd2, defaultZ)
    # Symmetric check: also catches one interval lying entirely inside the other.
    return low1 <= high2 and low2 <= high1

# timer setup
t1 = timeit.Timer('sum(x)', 'x = (i for i in range(1000))')
t2 = timeit.Timer('sum(x)', 'x = list(range(1000))')
genData, listData = [], []
for i in range(10000):
    # number=1 reruns the setup for every sample, so the generator is fresh
    # each time (a generator is exhausted after one sum), and it avoids the
    # default of 1,000,000 executions per timeit() call.
    genData.append(t1.timeit(number=1))
    listData.append(t2.timeit(number=1))

# testing the interval
print('The null hypothesis is {0}'.format(testnull(genData, listData)))
Comments (1)
I think that's sensible. What you want is to compare the confidence intervals of both versions of the code for overlap. Georges et al. (2007) give a complete description of the technique you are trying to use.
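A closely related check, sketched here as an assumption rather than as the exact procedure from the cited paper, is to build one confidence interval for the difference of the two means and see whether it contains zero. The helper names (`sample_variance`, `difference_interval`, `significantly_different`) and the choice of z = 1.96 (about 95% under a normal approximation, reasonable for samples of this size) are illustrative, not taken from the question. Samples are collected with `number=1` so the setup reruns and the generator is fresh for every measurement.

```python
from math import sqrt
import timeit


def mean(xs):
    return sum(xs) / float(len(xs))


def sample_variance(xs):
    # Unbiased sample variance (divides by n - 1).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / float(len(xs) - 1)


def difference_interval(a, b, z=1.96):
    """Confidence interval for mean(a) - mean(b), normal approximation."""
    diff = mean(a) - mean(b)
    # Standard error of the difference of two independent sample means.
    se = sqrt(sample_variance(a) / len(a) + sample_variance(b) / len(b))
    return (diff - z * se, diff + z * se)


def significantly_different(a, b, z=1.96):
    # The means differ at the chosen level if zero lies outside the interval.
    low, high = difference_interval(a, b, z)
    return not (low <= 0.0 <= high)


if __name__ == '__main__':
    # Hypothetical sample size of 1000 timings per variant.
    gen = timeit.repeat('sum(x)', 'x = (i for i in range(1000))',
                        repeat=1000, number=1)
    lst = timeit.repeat('sum(x)', 'x = list(range(1000))',
                        repeat=1000, number=1)
    print(difference_interval(gen, lst))
    print(significantly_different(gen, lst))
```

Compared with checking two separate intervals for overlap, a single interval on the difference is less conservative: two intervals can overlap slightly even when the difference between the means is significant at the same level.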