A.1 Chapter 3: Performance test of the in operator

Published 2024-02-05 21:59:46

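The timings in Table 3-6 were produced with the code in Example A-1, which uses the timeit module. Most of the script deals with setting up the haystack and needles samples and formatting the output.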

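While writing Example A-1, I noticed something that puts dict performance in perspective. When the script runs in "verbose mode" (with the -v command-line option), the timings are almost twice those in Table 3-5. Note, however, that for this script verbose mode adds only four print calls while setting up each test, plus one more print that reports how many needles were found when each test finishes. The loop that searches for the needles in the haystack produces no output, yet those five print calls take about as much time as searching for 1,000 needles.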
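
The comparison between a handful of print calls and 1,000 lookups can be tried in isolation. The following is a minimal sketch, not part of Example A-1: the helper names time_lookups and time_prints are hypothetical, the haystack uses integers instead of the floats read from selected.arr, and the printed text is captured in an in-memory buffer, which probably understates the cost of writing to a real terminal.

```python
import contextlib
import io
import timeit


def time_lookups(haystack, needles):
    """Return the best time (seconds) to test every needle against the haystack."""
    def search():
        found = 0
        for n in needles:
            if n in haystack:
                found += 1
        return found
    return min(timeit.repeat(search, repeat=5, number=1))


def time_prints(count=5):
    """Return the best time (seconds) for `count` print calls.

    Assumption: output goes to an in-memory buffer, not a terminal.
    """
    def emit():
        for i in range(count):
            print('haystack: %10d' % i, end='  ')
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink):
        return min(timeit.repeat(emit, repeat=5, number=1))


if __name__ == '__main__':
    haystack = dict.fromkeys(range(1000), 1)   # 1,000 keys
    needles = list(range(500, 1500))           # 1,000 needles, half of them present
    print('1000 lookups: %f s' % time_lookups(haystack, needles))
    print('5 prints:     %f s' % time_prints())
```

The absolute numbers depend on the machine and on where standard output is directed, so the ratio reported in the text should not be expected to reproduce exactly.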

Example A-1. container_perftest.py: run it with the name of a built-in collection type as the command-line argument (e.g., container_perftest.py dict)

"""
Container ``in`` operator performance test
"""
import sys
import timeit

SETUP = '''
import array

selected = array.array('d')
with open('selected.arr', 'rb') as fp:
  selected.fromfile(fp, {size})
if {container_type} is dict:
  haystack = dict.fromkeys(selected, 1)
else:
  haystack = {container_type}(selected)
if {verbose}:
  print(type(haystack), end='  ')
  print('haystack: %10d' % len(haystack), end='  ')
needles = array.array('d')
with open('not_selected.arr', 'rb') as fp:
  needles.fromfile(fp, 500)
needles.extend(selected[::{size}//500])
if {verbose}:
  print(' needles: %10d' % len(needles), end='  ')
'''

TEST = '''
found = 0
for n in needles:
  if n in haystack:
    found += 1
if {verbose}:
  print('  found: %10d' % found)
'''

def test(container_type, verbose):
  MAX_EXPONENT = 7
  for n in range(3, MAX_EXPONENT + 1):
    size = 10**n
    setup = SETUP.format(container_type=container_type,
               size=size, verbose=verbose)
    test = TEST.format(verbose=verbose)
    tt = timeit.repeat(stmt=test, setup=setup, repeat=5, number=1)
    print('|{:{}d}|{:f}'.format(size, MAX_EXPONENT + 1, min(tt)))

if __name__=='__main__':
  if '-v' in sys.argv:
    sys.argv.remove('-v')
    verbose = True
  else:
    verbose = False
  if len(sys.argv) != 2:
    print('Usage: %s <container_type>' % sys.argv[0])
  else:
    test(sys.argv[1], verbose)
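
The way Example A-1 builds its needles (500 values that are not in the haystack, plus 500 values taken from it at a regular stride) and then keeps the minimum of five timings can be sketched without the data files. This is an illustration under stated assumptions: make_samples, count_hits, and best_time are hypothetical names, the data is synthetic instead of being read from selected.arr and not_selected.arr, and timeit is given a callable where Example A-1 passes formatted source strings.

```python
import timeit


def make_samples(size, misses=500):
    """Return (selected, needles); half of the needles are present in selected."""
    selected = [float(i) for i in range(size)]              # values for the haystack
    not_selected = [float(-i - 1) for i in range(misses)]   # values never in the haystack
    # Same stride trick as Example A-1: pick `misses` values evenly from selected.
    needles = not_selected + selected[::size // misses]
    return selected, needles


def count_hits(haystack, needles):
    """Count how many needles are found in the haystack using ``in``."""
    found = 0
    for n in needles:
        if n in haystack:
            found += 1
    return found


def best_time(container_type, size):
    """Return the minimum of five timings for searching all needles once."""
    selected, needles = make_samples(size)
    if container_type is dict:
        haystack = dict.fromkeys(selected, 1)
    else:
        haystack = container_type(selected)
    timings = timeit.repeat(lambda: count_hits(haystack, needles),
                            repeat=5, number=1)
    return min(timings)


if __name__ == '__main__':
    for size in (10**3, 10**4, 10**5):
        # Nested format spec: the field width (8) is itself a format argument.
        print('|{:{}d}|{:f}'.format(size, 8, best_time(dict, size)))
```

Taking the minimum of the repeats, as Example A-1 does, is the usual way to read timeit results, because the slower runs mostly reflect interference from other processes.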

The container_perftest_datagen.py script (see Example A-2) generates the fixture data for the script in Example A-1.

Example A-2. container_perftest_datagen.py: generates arrays of distinct floating-point numbers and writes them to files for use by Example A-1

"""
Generate data for container performance test
"""

import random
import array

MAX_EXPONENT = 7
HAYSTACK_LEN = 10 ** MAX_EXPONENT
NEEDLES_LEN = 10 ** (MAX_EXPONENT - 1)
SAMPLE_LEN = HAYSTACK_LEN + NEEDLES_LEN // 2

needles = array.array('d')

sample = {1/random.random() for i in range(SAMPLE_LEN)}
print('initial sample: %d elements' % len(sample))

# complete sample, in case duplicate random numbers were discarded
while len(sample) < SAMPLE_LEN:
  sample.add(1/random.random())

print('complete sample: %d elements' % len(sample))

sample = array.array('d', sample)
random.shuffle(sample)

not_selected = sample[:NEEDLES_LEN // 2]
print('not selected: %d samples' % len(not_selected))
print('  writing not_selected.arr')
with open('not_selected.arr', 'wb') as fp:
  not_selected.tofile(fp)

selected = sample[NEEDLES_LEN // 2:]
print('selected: %d samples' % len(selected))
print('  writing selected.arr')
with open('selected.arr', 'wb') as fp:
  selected.tofile(fp)
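
The two properties that Example A-2 relies on, namely that the selected and not_selected values never overlap and that an array of doubles survives a trip through a binary file unchanged, can be checked at a small scale. This is a sketch under assumptions: make_disjoint_samples and round_trip are hypothetical helpers, the sizes are tiny, a temporary directory stands in for the working directory, and the values come from 1 / (1 - random.random()) so that a rare 0.0 from random.random() cannot cause a division by zero.

```python
import array
import os
import random
import tempfile


def make_disjoint_samples(haystack_len, needles_len):
    """Return (selected, not_selected) arrays of distinct floats with no overlap."""
    sample_len = haystack_len + needles_len
    sample = set()
    # A set drops duplicates silently, so keep adding until it is full.
    while len(sample) < sample_len:
        sample.add(1 / (1 - random.random()))
    sample = array.array('d', sample)
    random.shuffle(sample)
    # Disjoint slices of one duplicate-free sample cannot share a value.
    return sample[needles_len:], sample[:needles_len]


def round_trip(values, path):
    """Write an array of doubles to `path` and read it back."""
    with open(path, 'wb') as fp:
        values.tofile(fp)
    loaded = array.array('d')
    with open(path, 'rb') as fp:
        loaded.fromfile(fp, len(values))
    return loaded


if __name__ == '__main__':
    selected, not_selected = make_disjoint_samples(1000, 50)
    with tempfile.TemporaryDirectory() as tmp:
        loaded = round_trip(selected, os.path.join(tmp, 'selected.arr'))
    print('round trip intact:', loaded == selected)
    print('disjoint:', set(selected).isdisjoint(not_selected))
```

Because both slices come from the same duplicate-free sample, every needle read from not_selected.arr is a guaranteed miss, which is what lets Example A-1 arrange for exactly half of its needles to be found.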
