包含常数集的测试

发布于 2024-07-10 17:35:59 字数 1215 浏览 9 评论 0原文

问题陈述:

给定一组预先已知的整数,生成代码来测试该集合中是否存在单个整数。 测试函数的定义域是某个连续范围内的整数。


现在对要测试的范围或集合一无所知。 该范围可以很小或很大(但解决方案可以拒绝太大的问题,但限制越高越好)。 允许范围内的值可能很少在集合中,也可能大部分在集合中,或者介于两者之间。 该集合可以是均匀分布的或聚集的。 可能有很大一部分仅包含/不包含值,或者在大多数条带中可能至少有一些每种类型的值。 (有点像分析排序算法时对要排序的项目所做的假设)

目标是生成用于运行测试的有效代码的过程。

想到的部分解决方案包括

  • 完美的哈希函数(对于大型集合来说成本高昂)
  • 范围测试:foreach(b in range) if(bl <= v && v <= bh) return true;
  • 树/索引(在某些情况下比其他树/索引更昂贵)
  • 表查找(对于大型集合来说成本很高)
  • 其中任何一个的逆(kodos 到 Jason S

似乎理想的解决方案能够选择最好的选项,或者如果没有一个效果很好,则使用树将整个范围分解为多个部分,然后切换到其他选项更适合他们的小节。

可能有用的主题包括:


注意:这不是家庭作业。 如果它是作为低于博士水平的家庭作业发布的,教授应该用 Nerf 枪射击(如果你没有得到这一点,那么重新阅读这个问题,这是非常重要的)

注意:这是一个发生过的问题我连续几天都在思考这个问题。 我对此没有直接用途,但认为这将是一个很酷的问题。 我想要生成代码的原因是因为生成的代码不会比一般代码慢(如果需要,它可以是相同的东西),并且在某些/许多情况下可能会更快。

我发布这个问题是为了澄清我的想法。 如果我能提出任何合理或很酷的解决方案,我计划将它们实现为模板元程序(生成代码的另一个原因),

有些人已经注意到这个问题非常普遍。 这就是我想说的。 我希望生成一个可以在非常通用的领域工作的系统:某个范围内的整数集。

The problem statement:

Given a set of integers that is known in advance, generate code to test if a single integer is in the set. The domain of the testing function is the integers in some consecutive range.


Nothing in particular is known now about the range or the set to be tested. The range could be small or huge (but a solution can reject problems that are to big but higher limits are better). It could be that very few of the values in the allowed range are in the set or most of them are or anything in between. The set may be uniformly distributed or clustered. There may be large sections of only contained/not-contained values or there may be at least a few of each type of value in most swaths. (sort of like the assumption made about items to be sorted when analyzing sorting algorithms)

The objective is a procedure for generating effective code for running the test.

Partial solutions that come to mind include

  • perfect hash function (costly for large sets)
  • range tests: foreach(b in ranges) if(b.l <= v && v <= b.h) return true;
  • trees/indexes (more costly than others in some cases)
  • table lookup (costly for large sets)
  • the inverse of any of these these (kodos to Jason S)

It seems that an ideal solution would be able to pick what option is best or if none work well, use a tree to break down the full range into sections and then switch to other options for subsection that are better suited to them.

Topic(s) that might be useful include:


Note: this is not homework. if it was issued as homework below the doctoral level the prof should be shot with a Nerf gun (if you don't get that then re-read the problem, it is very much non trivial)

Note: This is a problem that occurred to me a few days a go and I've been puzzling over it off and on. I have no direct use for this but thought it would be a cool problem to attack. The reason that I wan to generate the code is because generated code will be no slower than general code (it can be the same thing if needed) and might be faster in some/many cases.

I'm posting this question as much to clarify my thoughts as anything. If I can come up with any reasonable or cool solutions I plan on implementing them as a template meta program (the other reason for generated code)

some people have noted that the problem is very general. That is the point. I'm hoping to generate a system that would work an a very general domain: sets of integers in some range.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

我不在是我 2024-07-17 17:35:59

上一个关于字典/拼写检查的问题有许多回复提到布隆过滤器; 也许这会有所帮助。

我认为无论如何,大型设备的测试都会很昂贵。

a previous question on dictionary/spellchecking had a number of responses that mentioned Bloom filters; maybe that would help.

I would think that testing for large sets is going to be expensive no matter what.

黄昏下泛黄的笔记 2024-07-17 17:35:59

让我们假装这是一个真正的问题:

  • 基集或输入集的大小没有限制,

这使得“问题”在任何实际意义上都不现实、不明确且无法解决(

如果有人愿意的话)为了提出解决方案,这里有一些单元测试用例:

  • 单元测试 1:

    • 基本集是 -1,000,000,000,000 到 +1,000,000,000,000 之间的所有整数,除了 100,000,000,000 个随机删除的值
    • 输入集是同一范围内随机生成的 100,000,000,000 个整数
  • 单元测试 2:

    • 基础集是斐波那契数列
    • 输入集是 1T 随机生成的整数,范围为 0..infinity

let's pretend, for a moment, that this is a real question:

  • there are no limits on the size of the base set or the input set

this makes the "problem" unrealistic, underspecified, and un-solvable in any practical sense

if someone wants to posit a solution, here's some unit test cases:

  • unit test 1:

    • the base set is all integers between -1,000,000,000,000 and +1,000,000,000,000 except for 100,000,000,000 randomly-removed values
    • the input set is 100,000,000,000 randomly-generated integers in the same range
  • unit test 2:

    • the base set is the Fibonacci series
    • the input set is 1T randomly-generated integers in the range 0..infinity
燃情 2024-07-17 17:35:59

还有boost::dynamic_bitset,不知道如何它根据时间或空间相对于原始数字的分布进行缩放。 (例如,如果位存储在 8/16/32/64 的块中,则稀疏位集效率低下)

或者可能 此(压缩位集)此(位向量) 网页(我用谷歌搜索“大型稀疏位集”和“压缩位集”)

there's also boost::dynamic_bitset, not sure how it scales for time, or in space with respect to distribution of original numbers. (e.g. if the bits are stored in chunks of 8/16/32/64, then sparse bitsets are inefficient)

or perhaps this (compressed bit set) or this (bit vector) webpage (I googled for "large sparse bit sets" and "compressed bit sets")

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文