Puzzle: need an example of a "complicated" equivalence relation/partitioning that disallows sorting and/or hashing
From the question "Is partitioning easier than sorting?":
Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.
One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.
(Keep in mind the distinction between equality and equivalence.)
Clearly the equivalence relation must be considered when designing the ordering algorithm. For example, if the equivalence relation is "people born in the same year are equivalent", then sorting based on the person's name is not appropriate.
Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?
How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that maps equivalent items to the same hash value?
(Note: it is OK if nonequivalent items map to the same hash value (collide); I'm not asking to solve the collision problem. On the other hand, hashFunc(item) { return 1; } is cheating.)
My suspicion is that for any datatype/equivalence pair where it is possible to define an ordering, it will also be possible to define a suitable hash function, and they will have similar algorithmic complexity. A counterexample to that conjecture would be enlightening!
Comments (8)
The answer to questions 1 and 2 is no, in the following sense: given a computable equivalence relation ≡ on strings {0, 1}*, there exists a computable function f such that x ≡ y if and only if f(x) = f(y), and f yields both an ordering and a hash function. One definition of f(x) is simple, and very slow to compute: enumerate {0, 1}* in length-then-lexicographic (shortlex) order (ε, 0, 1, 00, 01, 10, 11, 000, …) and return the first string equivalent to x. Since x ≡ x, the search stops at x at the latest, so this algorithm always halts.
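A minimal sketch of that construction, assuming the equivalence relation is supplied as a Python predicate. The names `shortlex_strings`, `canonical`, and `same_ones` are made up for illustration, and `same_ones` (equal number of 1 bits) is just a stand-in relation:

```python
from itertools import count, product


def shortlex_strings():
    """Yield every bit string: shorter first, then lexicographically."""
    for length in count(0):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def canonical(x, equiv):
    """Return the first string in the enumeration that is equivalent to x.

    The loop always terminates because x itself appears in the enumeration
    and equiv(x, x) holds for any equivalence relation.
    """
    for candidate in shortlex_strings():
        if equiv(candidate, x):
            return candidate


def same_ones(a, b):
    """Example relation (an assumption): same number of 1 bits."""
    return a.count("1") == b.count("1")
```

With this, `hash(canonical(x, same_ones))` is a hash that respects the relation, and comparing canonical strings gives an ordering. The cost is exponential in the length of x in the worst case, which is the "very slow" part.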
Creating a hash function and an ordering may be expensive but will usually be possible. One trick is to represent an equivalence class by a pre-arranged member of that class, for instance, the member whose serialised representation is smallest, when considered as a bit string. When somebody hands you a member of an equivalence class, map it to this canonicalised member of that class, and then hash or compare the bit string representation of that member. See e.g. http://en.wikipedia.org/wiki/Canonical#Mathematics
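As a small illustration of the canonical-member trick, here is a sketch that assumes the equivalence relation is "is an anagram of" on strings. That relation and the helper names are my own choice for the example, not part of the question. The canonical member is the lexicographically smallest rearrangement, i.e. the sorted letters:

```python
from collections import defaultdict


def canonical(word):
    """Smallest member of the word's anagram class: its letters, sorted."""
    return "".join(sorted(word))


def partition(words):
    """Group words by canonical member; dict hashing does the partitioning."""
    classes = defaultdict(list)
    for w in words:
        classes[canonical(w)].append(w)
    return list(classes.values())
```

The same `canonical` function could equally be used as a sort key, so both hashing and ordering come from one canonicalisation step.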
Examples where this is not possible or convenient include the case where somebody gives you a pointer to an object that implements equals() but nothing else useful, and you do not get to break the type system to look inside the object. Another is when you get the results of a survey that only asks people to judge equality between objects. Also, Kruskal's algorithm uses union-find internally to process equivalence relations, so presumably for this particular application nothing more cost-effective has been found.
One example that seems to fit your request is an IEEE floating point type. In particular, a NaN doesn't compare as equivalent to anything else (nor even to itself) unless you take special steps to detect that it's a NaN, and always call that equivalent.
Likewise for hashing. In IEEE 754, +0.0 and -0.0 have different bit patterns (they differ only in the sign bit), yet they compare equal. The idea is that two distinct bit patterns can represent values that compare equal. Unless your hash function takes this into account, for example by hashing the raw bits, it will produce different hash values for numbers that really compare precisely equal.
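Both effects are easy to observe. The sketch below assumes 64-bit IEEE 754 doubles (which is what Python's `float` is on mainstream platforms). `raw_bits` and `canonical_bits` are illustrative helpers, and treating all NaNs as one class is a choice made for the example:

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8000000000000  # one fixed NaN pattern used as the class key


def raw_bits(x):
    """Return the 64-bit pattern of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def canonical_bits(x):
    """Bit pattern after collapsing the cases where == and bits disagree."""
    if math.isnan(x):
        return QUIET_NAN_BITS   # assumption: every NaN is equivalent to every NaN
    if x == 0.0:
        return raw_bits(0.0)    # fold -0.0 onto +0.0
    return raw_bits(x)


nan = float("nan")
print(nan == nan)                       # NaN is not equal to itself
print(0.0 == -0.0)                      # the two zeros compare equal
print(raw_bits(0.0) == raw_bits(-0.0))  # but their bit patterns differ
```

Hashing or sorting on `canonical_bits(x)` is then consistent with the chosen equivalence, whereas hashing `raw_bits(x)` directly is not.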
As you probably know, comparison-based sorting takes at least O(n log n) time (more formally you would say it is Omega(n log n)). If you know that there are fewer than log2(n) equivalence classes, then partitioning is faster, since you only need to check equivalence with a single member of each equivalence class to determine which part in the partition you should assign a given element to.
I.e. your algorithm could be like this:
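A minimal sketch of such a loop, assuming the equivalence test is passed in as a predicate (`equiv` and `partition` are illustrative names). Each item is compared against one representative of every class found so far:

```python
def partition(items, equiv):
    """Partition items into equivalence classes using only equiv(a, b)."""
    classes = []                      # each class is a list; cls[0] is its representative
    for x in items:
        for cls in classes:           # inner loop: at most one test per existing class
            if equiv(cls[0], x):
                cls.append(x)
                break
        else:                         # no class matched: x starts a new class
            classes.append([x])
    return classes
```

For example, `partition(range(10), lambda a, b: a % 3 == b % 3)` groups the numbers by remainder modulo 3 without ever ordering them.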
If there are m equivalence classes, the inner loop runs at most m times, taking O(nm) time overall. As ShreetvatsaR observes in a comment, there can be at most n equivalence classes, so this is O(n^2). Note this works even if there is not a total ordering on X.
Theoretically, it is always possible (for questions 1 and 2), because of the Well Ordering Theorem, even when you have an uncountable number of partitions.
Even if you restrict to computable functions, throwawayaccount's answer answers that.
You need to more precisely define your question :-)
In any case, practically speaking, consider the following:
Your data type is the set of unsigned integer arrays. The ordering is lexicographic comparison.
You could consider hash(x) = x, but I suppose that is cheating too :-)
I would say (but haven't thought more about getting a hash function, so might well be wrong) that partitioning by ordering is much more practical than partitioning by hashing, as hashing itself could become impractical. (A hashing function exists, no doubt).
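For the integer-array example, partitioning by ordering can be sketched as follows, assuming the arrays are represented as Python tuples (which already compare lexicographically) and that the equivalence is plain equality. `partition_by_order` is an illustrative name:

```python
from itertools import groupby


def partition_by_order(arrays):
    """Sort lexicographically, then collect runs of equal neighbours."""
    ordered = sorted(arrays)          # tuples compare element by element
    return [list(group) for _, group in groupby(ordered)]
```

After sorting, equivalent arrays are adjacent, so a single linear pass is enough to form the classes.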
I believe that, for question 1, it's possible only for infinite (possibly only for uncountable) sets, and that the same holds for question 2.
EDIT: This answer is wrong
I am not going to delete it just because some of the comments below are enlightening
Not every equivalence relationship implies an order
As your equivalence relationship should not induce an order, let's take an unordered distance function as the relation.
If we get the set of functions f(x):R -> R as our datatype, and define an equivalence relation as:
Then you can't sort on that order (no injective function exists with the Real numbers). You just can't find a function which maps your datatype to numbers due to the cardinality of the function's space.
Suppose that F(X) is a function which maps an element of some data type T to another of the same type, such that for any Y of type T, there is exactly one X of type T such that F(X)=Y. Suppose further that the function is chosen so that there is generally no practical way of finding the X in the above equation for a given Y.
Define F{0}(X)=X, F{1}(X)=F(X), F{2}(X)=F(F(X)), etc., so F{n}(X) = F(F{n-1}(X)).
Now define a data type Q containing a positive integer K and an object X of type T. Define an equivalence relation thus:
Q(a,X) vs Q(b,Y):
If a > b, the items are equal iff F{a-b}(Y)==X
If a < b, the items are equal iff F{b-a}(X)==Y
If a=b, the items are equal iff X==Y
For any given object Q(a,X) there exists exactly one Z such that F{a}(Z)==X. Two objects are equivalent iff they have the same Z. One could define an ordering or hash function based upon Z. On the other hand, if F is chosen such that its inverse cannot be practically computed, the only practical way to compare elements may be to use the equivalence function above. I know of no way to define an ordering or hash function without either knowing the largest possible "a" value an item could have, or having a means to invert function F.
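The equivalence test can be sketched as below. This is a toy under loud assumptions: `F` here is an affine map modulo 101, which is a bijection but trivially invertible, so it only stands in for the hard-to-invert function the answer has in mind. The names `M`, `iterate`, `Q`, and `equivalent` are illustrative:

```python
from dataclasses import dataclass

M = 101  # toy domain: integers 0..100


def F(x):
    """Toy bijection on 0..M-1 (gcd(3, 101) == 1). NOT hard to invert."""
    return (3 * x + 5) % M


def iterate(x, n):
    """Compute F{n}(x), i.e. apply F to x n times."""
    for _ in range(n):
        x = F(x)
    return x


@dataclass(frozen=True)
class Q:
    k: int  # how many times F has been applied
    x: int  # current value, equal to F{k}(Z) for some hidden Z


def equivalent(p, q):
    """True iff p and q descend from the same hidden Z."""
    if p.k > q.k:
        return iterate(q.x, p.k - q.k) == p.x
    if p.k < q.k:
        return iterate(p.x, q.k - p.k) == q.x
    return p.x == q.x
```

Note that `equivalent` only ever moves forward along F, which is why it stays cheap even when recovering Z is not. A hash based on Z would need the inverse, and a hash based on advancing every item to a common step count would need an upper bound on `k`.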