包含常数集的测试
问题陈述:
给定一组预先已知的整数,生成代码来测试该集合中是否存在单个整数。 测试函数的定义域是某个连续范围内的整数。
现在对要测试的范围或集合一无所知。 该范围可以很小或很大(但解决方案可以拒绝太大的问题,但限制越高越好)。 允许范围内的值可能很少在集合中,也可能大部分在集合中,或者介于两者之间。 该集合可以是均匀分布的或聚集的。 可能有很大一部分仅包含/不包含值,或者在大多数条带中可能至少有一些每种类型的值。 (有点像分析排序算法时对要排序的项目所做的假设)
目标是生成用于运行测试的有效代码的过程。
想到的部分解决方案包括
- 完美的哈希函数(对于大型集合来说成本高昂)
- 范围测试:foreach(b in range) if(bl <= v && v <= bh) return true;
- 树/索引(在某些情况下比其他树/索引更昂贵)
- 表查找(对于大型集合来说成本很高)
- 其中任何一个的逆(kodos 到 Jason S)
似乎理想的解决方案能够选择最好的选项,或者如果没有一个效果很好,则使用树将整个范围分解为多个部分,然后切换到其他选项更适合他们的小节。
可能有用的主题包括:
注意:这不是家庭作业。 如果它是作为低于博士水平的家庭作业发布的,教授应该用 Nerf 枪射击(如果你没有得到这一点,那么重新阅读这个问题,这是非常重要的)
注意:这是一个发生过的问题我连续几天都在思考这个问题。 我对此没有直接用途,但认为这将是一个很酷的问题。 我想要生成代码的原因是因为生成的代码不会比一般代码慢(如果需要,它可以是相同的东西),并且在某些/许多情况下可能会更快。
我发布这个问题是为了澄清我的想法。 如果我能提出任何合理或很酷的解决方案,我计划将它们实现为模板元程序(生成代码的另一个原因),
有些人已经注意到这个问题非常普遍。 这就是我想说的。 我希望生成一个可以在非常通用的领域工作的系统:某个范围内的整数集。
The problem statement:
Given a set of integers that is known in advance, generate code to test if a single integer is in the set. The domain of the testing function is the integers in some consecutive range.
Nothing in particular is known now about the range or the set to be tested. The range could be small or huge (but a solution can reject problems that are to big but higher limits are better). It could be that very few of the values in the allowed range are in the set or most of them are or anything in between. The set may be uniformly distributed or clustered. There may be large sections of only contained/not-contained values or there may be at least a few of each type of value in most swaths. (sort of like the assumption made about items to be sorted when analyzing sorting algorithms)
The objective is a procedure for generating effective code for running the test.
Partial solutions that come to mind include
- perfect hash function (costly for large sets)
- range tests:
foreach(b in ranges) if(b.l <= v && v <= b.h) return true;
- trees/indexes (more costly than others in some cases)
- table lookup (costly for large sets)
- the inverse of any of these these (kodos to Jason S)
It seems that an ideal solution would be able to pick what option is best or if none work well, use a tree to break down the full range into sections and then switch to other options for subsection that are better suited to them.
Topic(s) that might be useful include:
Note: this is not homework. if it was issued as homework below the doctoral level the prof should be shot with a Nerf gun (if you don't get that then re-read the problem, it is very much non trivial)
Note: This is a problem that occurred to me a few days a go and I've been puzzling over it off and on. I have no direct use for this but thought it would be a cool problem to attack. The reason that I wan to generate the code is because generated code will be no slower than general code (it can be the same thing if needed) and might be faster in some/many cases.
I'm posting this question as much to clarify my thoughts as anything. If I can come up with any reasonable or cool solutions I plan on implementing them as a template meta program (the other reason for generated code)
some people have noted that the problem is very general. That is the point. I'm hoping to generate a system that would work an a very general domain: sets of integers in some range.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
上一个关于字典/拼写检查的问题有许多回复提到布隆过滤器; 也许这会有所帮助。
我认为无论如何,大型设备的测试都会很昂贵。
a previous question on dictionary/spellchecking had a number of responses that mentioned Bloom filters; maybe that would help.
I would think that testing for large sets is going to be expensive no matter what.
让我们假装这是一个真正的问题:
这使得“问题”在任何实际意义上都不现实、不明确且无法解决(
如果有人愿意的话)为了提出解决方案,这里有一些单元测试用例:
单元测试 1:
单元测试 2:
let's pretend, for a moment, that this is a real question:
this makes the "problem" unrealistic, underspecified, and un-solvable in any practical sense
if someone wants to posit a solution, here's some unit test cases:
unit test 1:
unit test 2:
还有boost::dynamic_bitset,不知道如何它根据时间或空间相对于原始数字的分布进行缩放。 (例如,如果位存储在 8/16/32/64 的块中,则稀疏位集效率低下)
或者可能 此(压缩位集) 或此(位向量) 网页(我用谷歌搜索“大型稀疏位集”和“压缩位集”)
there's also boost::dynamic_bitset, not sure how it scales for time, or in space with respect to distribution of original numbers. (e.g. if the bits are stored in chunks of 8/16/32/64, then sparse bitsets are inefficient)
or perhaps this (compressed bit set) or this (bit vector) webpage (I googled for "large sparse bit sets" and "compressed bit sets")