想象两个正整数 A 和 B。我想将这两个组合成一个整数 C。
不能有其他整数 D 和 E 组合成 C。
因此将它们与加法运算符结合起来是行不通的。 例如 30 + 10 = 40 = 40 + 0 = 39 + 1
连接也不起作用。 例如“31”+“2”= 312 =“3”+“12”
此组合运算也应该是确定性的(使用相同的输入始终产生相同的结果)并且应该始终产生一个整数整数的正数或负数。
Imagine two positive integers A and B. I want to combine these two into a single integer C.
There can be no other integers D and E which combine to C.
So combining them with the addition operator doesn't work. Eg 30 + 10 = 40 = 40 + 0 = 39 + 1
Neither does concatination work. Eg "31" + "2" = 312 = "3" + "12"
This combination operation should also be deterministic (always yield the same result with the same inputs) and should always yield an integer on either the positive or the negative side of integers.
发布评论
评论(21)
Cantor 配对函数 考虑到其简单、快速且节省空间,它确实是最好的函数之一,但是,Wolfram Matthew Szudzik 在此处发表了一些更好的内容。 康托配对函数(相对)的局限性在于,如果输入是两个
N
位,则编码结果的范围并不总是保持在2N
位整数的限制内整数。 也就是说,如果我的输入是两个16
位整数,范围从0 到 2^16 -1
,则有2^16 * (2^16 - 1)
输入可能的组合,因此根据明显的鸽洞原理,我们需要一个输出大小至少为2^16 * (2^16 -1)
,等于2^32 - 2^16
,或者换句话说,一个地图理想情况下,32 位数字应该是可行的。 这在编程世界中可能具有不小的实际重要性。康托尔配对函数:
输入Szudzik 函数:
现在考虑到我们通常在语言/框架中处理不同大小的数字的有符号实现,让我们考虑范围从
-(2^15) 到 2^15 的
(稍后我们将看到如何扩展输出以跨越有符号范围)。 由于有符号 16
位整数-1a
和b
必须为正数,因此它们的范围为0 到 2^15 - 1
。康托尔配对函数:
现在Szudzik 的函数:
让我们考虑负整数。 这超出了我知道的最初问题,但只是为了帮助未来的访客而详细阐述。
康托尔配对函数:
Szudzik 的函数:
现在,输出始终为正。 在有符号的世界中,如果我们可以将一半的输出转移到负轴,将会更加节省空间。 对于 Szudzik 来说,您可以这样做:
我所做的:在对输入应用
2
权重并遍历该函数后,我将输出除以二,并将其中一些取负数轴乘以-1
。查看结果,对于有符号
16
位数字范围内的任何输入,输出都在有符号32
位整数的限制内,这很酷。 我不确定如何以相同的方式实现康托配对功能,但没有尝试太多,因为它效率不高。 此外,Cantor配对函数涉及的计算量较多,也意味着其速度较慢。这是一个 C# 实现。
由于中间计算可能超出
2N
有符号整数的限制,因此我使用了4N
整数类型(最后除以2
将结果返回到2N
)。我在替代解决方案上提供的链接很好地描述了利用空间中每个点的函数图。 令人惊讶的是,您可以将一对坐标唯一地可逆地编码为单个数字! 神奇的数字世界!!
Cantor pairing function is really one of the better ones out there considering its simple, fast and space efficient, but there is something even better published at Wolfram by Matthew Szudzik, here. The limitation of Cantor pairing function (relatively) is that the range of encoded results doesn't always stay within the limits of a
2N
bit integer if the inputs are twoN
bit integers. That is, if my inputs are two16
bit integers ranging from0 to 2^16 -1
, then there are2^16 * (2^16 -1)
combinations of inputs possible, so by the obvious Pigeonhole Principle, we need an output of size at least2^16 * (2^16 -1)
, which is equal to2^32 - 2^16
, or in other words, a map of32
bit numbers should be feasible ideally. This may not be of little practical importance in programming world.Cantor pairing function:
Enter Szudzik's function:
Now considering the fact that we typically deal with the signed implementations of numbers of various sizes in languages/frameworks, let's consider
signed 16
bit integers ranging from-(2^15) to 2^15 -1
(later we'll see how to extend even the ouput to span over signed range). Sincea
andb
have to be positive they range from0 to 2^15 - 1
.Cantor pairing function:
Now Szudzik's function:
Let's account for negative integers. That's beyond the original question I know, but just elaborating to help future visitors.
Cantor pairing function:
Szudzik's function:
Now all this while the output has always been positive. In signed world, it will be even more space saving if we could transfer half the output to negative axis. You could do it like this for Szudzik's:
What I do: After applying a weight of
2
to the the inputs and going through the function, I then divide the ouput by two and take some of them to negative axis by multiplying by-1
.See the results, for any input in the range of a signed
16
bit number, the output lies within the limits of a signed32
bit integer which is cool. I'm not sure how to go about the same way for Cantor pairing function but didn't try as much as its not as efficient. Furthermore, more calculations involved in Cantor pairing function means its slower too.Here is a C# implementation.
Since the intermediate calculations can exceed limits of
2N
signed integer, I have used4N
integer type (the last division by2
brings back the result to2N
).The link I have provided on alternate solution nicely depicts a graph of the function utilizing every single point in space. Its amazing to see that you could uniquely encode a pair of coordinates to a single number reversibly! Magic world of numbers!!
您正在寻找双射
NxN -> N 映射。 这些用于例如吻合。 请查看此 PDF,了解所谓的配对功能。 维基百科介绍了一个特定的配对函数,即 Cantor 配对函数:
三点注释:
ZxZ -> N 映射。 康托尔函数仅适用于非负数。 但这不是问题,因为定义双射很容易
f : Z -> N
,如下所示:You're looking for a bijective
NxN -> N
mapping. These are used for e.g. dovetailing. Have a look at this PDF for an introduction to so-called pairing functions. Wikipedia introduces a specific pairing function, namely the Cantor pairing function:Three remarks:
ZxZ -> N
mapping. Cantor's function only works on non-negative numbers. This is not a problem however, because it's easy to define a bijectionf : Z -> N
, like so:如果A和B可以用2个字节表示,则可以将它们组合在4个字节上。 将 A 放在最重要的一半,将 B 放在最不重要的一半。
在 C 语言中,这给出(假设 sizeof(short)=2 和 sizeof(int)=4):
使输入
unsigned Short
或uint16_t
确保它们在之前进行零扩展您将|
或+
它们放在一起。 否则,负数B
会通过 OR 将高位设置为全 1,或者如果进行 ADD,则从上半部分减一。在默认将窄类型提升为有符号
int
后,强制转换(unsigned)A
可以避免左移中的有符号溢出 UB。 对于更广泛的类型,避免移出要保留的位也很重要,例如((uint64_t)A << 32 | B
,因为默认提升在int
处停止(unsigned)B 转换不是必需的;重要的是它是
unsigned Short B
的左侧。 >| 是unsigned
意味着这也将转换为unsigned
您可以将其与有符号类型一起使用,至少是 getA 和 getB,并且您可以返回有符号。来自
combine
的int
,但输入需要零扩展,因此在 C 中,您需要在加宽之前将它们设置为unsigned Short
。 ((unsigned)(unsigned short)A << 16) | (unsigned short)B您可能需要使用
uint16_t
和uint32_t
来定义类型宽度与您正在使用的班次计数相匹配。If A and B can be expressed with 2 bytes, you can combine them on 4 bytes. Put A on the most significant half and B on the least significant half.
In C language this gives (assuming sizeof(short)=2 and sizeof(int)=4):
Making the inputs
unsigned short
oruint16_t
makes sure they zero-extend before you|
or+
them together. Otherwise negativeB
would set the upper bits to all-ones with OR, or subtract one from the upper half if you ADD.Casting
(unsigned)A
avoids signed overflow UB in the left shift after default promotion of narrow types to signedint
. And for for wider types, it's also essential to avoid shifting out bits you to keep, like((uint64_t)A << 32 | B
, since default promotion stops atint
.The
(unsigned)B
cast isn't necessary; the important part is that it wasunsigned short B
to start with. The left hand side of the|
beingunsigned
means this will also convert tounsigned
.You can use this with signed types, at least the getA and getB, and you can return signed
int
fromcombine
, but the inputs need to zero-extend so in C you need them to beunsigned short
before widening. Like((unsigned)(unsigned short)A << 16) | (unsigned short)B
You might want to use
uint16_t
anduint32_t
, to define the type widths to match the shift counts you're using.这可能吗?
您正在组合两个整数。 它们的范围都是 -2,147,483,648 到 2,147,483,647,但您只能取正数。
这使得 2147483647^2 = 4,61169E+18 个组合。
由于每个组合都必须是唯一的并且结果是一个整数,因此您需要某种可以包含这么多数字的神奇整数。
还是我的逻辑有问题?
Is this even possible?
You are combining two integers. They both have the range -2,147,483,648 to 2,147,483,647 but you will only take the positives.
That makes 2147483647^2 = 4,61169E+18 combinations.
Since each combination has to be unique AND result in an integer, you'll need some kind of magical integer that can contain this amount of numbers.
Or is my logic flawed?
正整数的标准数学方法是利用素因数分解的唯一性。
缺点是图像往往跨越相当大的整数范围,因此当涉及到用计算机算法表达映射时,您可能会在为结果选择合适的类型时遇到问题。
您可以修改它,通过对具有 5 和 7 项的幂的标志进行编码来处理负
x
和y
。例如
The standard mathematical way for positive integers is to use the uniqueness of prime factorization.
The downside is that the image tends to span quite a large range of integers so when it comes to expressing the mapping in a computer algorithm you may have issues with choosing an appropriate type for the result.
You could modify this to deal with negative
x
andy
by encoding a flags with powers of 5 and 7 terms.e.g.
设数字
a
为第一个,b
为第二个。 令p
为第a+1
个素数,q
为第b+1
个素数那么,如果
a,结果为
pq
,如果a>b
,结果为2pq
。 如果a=b
,则设为p^2
。Let number
a
be the first,b
the second. Letp
be thea+1
-th prime number,q
be theb+1
-th prime numberThen, the result is
pq
, ifa<b,
or2pq
ifa>b
. Ifa=b
, let it bep^2
.尽管 Stephan202 的答案是唯一真正通用的答案,但对于有界范围内的整数,您可以做得更好。 例如,如果您的范围是 0..10,000,那么您可以执行以下操作:
结果可以适合范围最大为整数类型基数平方根的单个整数。 这种打包方式比 Stephan202 更通用的方法稍微高效一些。 解码起来也相当简单; 对于初学者来说不需要平方根:)
Although Stephan202's answer is the only truly general one, for integers in a bounded range you can do better. For example, if your range is 0..10,000, then you can do:
Results can fit in a single integer for a range up to the square root of the integer type's cardinality. This packs slightly more efficiently than Stephan202's more general method. It is also considerably simpler to decode; requiring no square roots, for starters :)
对于正整数作为参数且参数顺序无关紧要:
这是一个 无序配对函数:
对于 x ≠ y,这里有一个 独特的无序配对功能:
For positive integers as arguments and where argument order doesn't matter:
Here's an unordered pairing function:
For x ≠ y, here's a unique unordered pairing function:
f(a, b) = s(a+b) + a
,其中s(n) = n*(n+1)/2
这是使用以下事实: s(a+b+1)-s(a+b) = a+b+1
< 一个。
我不明白你的意思:
如何在这个论坛中写(大于)、(小于)字符?
f(a, b) = s(a+b) + a
, wheres(n) = n*(n+1)/2
this using the fact:
s(a+b+1)-s(a+b) = a+b+1
.< a
I did not understand what You mean by:
How can I write (greater than), (less than) characters in this forum?
这是基于 @nawfal 给出的方法将 @DoctorJ 的代码扩展到无界整数。 它可以编码和解码。 它适用于普通数组和 numpy 数组。
Here is an extension of @DoctorJ 's code to unbounded integers based on the method given by @nawfal. It can encode and decode. It works with normal arrays and numpy arrays.
检查一下:http://en.wikipedia.org/wiki/Pigeonhole_principle。 如果A、B、C是同一类型,则不能这样做。 如果A和B是16位整数,而C是32位整数,那么你可以简单地使用移位。
哈希算法的本质是它们无法为每个不同的输入提供唯一的哈希值。
Check this: http://en.wikipedia.org/wiki/Pigeonhole_principle. If A, B and C are of same type, it cannot be done. If A and B are 16-bit integers, and C is 32-bit, then you can simply use shifting.
The very nature of hashing algorithms is that they cannot provide a unique hash for each different input.
构建映射并不难:
弄清楚如何获取任意 a、b 的值要困难一些。
It isn't that tough to construct a mapping:
Figuring out how to get the value for an arbitrary a,b is a little more difficult.
让我们有两个数字 B 和 C ,将它们编码为单个数字 A
A = B + C * N
其中
B=A % N = B
C=A / N = C
let us have two number B and C , encoding them into single number A
A = B + C * N
where
B=A % N = B
C=A / N = C
你的建议是不可能的。 你们总会有冲突。
为了将两个对象映射到另一个集合,映射集合的最小大小必须达到预期的组合数量:
假设是 32 位整数,则有 2147483647 个正整数。 选择其中两个(顺序无关紧要且重复)会产生 2305843008139952128 个组合。 这不太适合 32 位整数集合。
不过,您可以将此映射装入 61 位。 使用 64 位整数可能是最简单的。 将高位字设置为较小的整数,将低位字设置为较大的整数。
What you suggest is impossible. You will always have collisions.
In order to map two objects to another single set, the mapped set must have a minimum size of the number of combinations expected:
Assuming a 32-bit integer, you have 2147483647 positive integers. Choosing two of these where order doesn't matter and with repetition yields 2305843008139952128 combinations. This does not fit nicely in the set of 32-bit integers.
You can, however fit this mapping in 61 bits. Using a 64-bit integer is probably easiest. Set the high word to the smaller integer and the low word to the larger one.
更简单的事情怎么样:给定两个数字,A 和 B 让 str 成为串联:'A' + ';' +“B”。 然后让输出为hash(str)。 我知道这不是一个数学答案,但一个简单的 python(有一个内置的哈希函数)脚本应该可以完成这项工作。
How about something much simpler: Given two numbers, A and B let str be the concatenation: 'A' + ';' + 'B'. Then let the output be hash(str). I know that this is not a mathematical answer, but a simple python (which has an in built hash function) script should do the job.
假设您有一个 32 位整数,为什么不将 A 移至前 16 位一半,将 B 移至另一半呢?
除了尽可能节省空间和计算成本之外,一个非常酷的副作用是您可以对打包的数字进行向量数学运算。
Say you have a 32 bit integer, why not just move A into the first 16 bit half and B into the other?
Other than this being as space efficient as possible and cheap to compute, a really cool side effect is that you can do vector math on the packed number.
如果您想要更多控制,例如为第一个数字分配 X 位,为第二个数字分配 Y 位,您可以使用以下代码:
我总共使用 32 位。 这里的想法是,例如,如果您希望第一个数字最多为 10 位,第二个数字最多为 12 位,您可以这样做:
现在您可以在
num_a
中存储最大数字即2^10 - 1 = 1023
,在num_b
中,最大值为2^12 - 1 = 4095
。设置 num A 和 num B 的值:
现在
bnum
是所有位(总共 32 位。您可以修改代码以使用 64 位)获取数字 a:
获取数字 b:
编辑:
bnum
可以存储在类内部。 我不是因为我自己的需要才这么做的我分享了代码,希望对您有所帮助。
感谢您提供来源:
https://www.geeksforgeeks.org/extract-k-bits -给定位置编号/
用于提取位的函数,也感谢本文中的
mouviciel
回答。使用这些资源我可以找出更高级的解决方案
If you want more control such as allocate X bits for the first number and Y bits for the second number, you can use this code:
I use 32 bits in total. The idea here is that if you want for example that first number will be up to 10 bits and second number will be up to 12 bits, you can do this:
Now you can store in
num_a
the maximum number that is2^10 - 1 = 1023
and innum_b
naximum value of2^12 - 1 = 4095
.To set value for num A and num B:
Now
bnum
is all of the bits (32 bits in total. You can modify the code to use 64 bits)To get num a:
To get num b:
EDIT:
bnum
can be stored inside the class. I did not did it because my own needsI shared the code and hope that it will be helpful.
Thanks for source:
https://www.geeksforgeeks.org/extract-k-bits-given-position-number/
for function to extract bits and thanks also to
mouviciel
answer in this post.Using these to sources I could figure out more advanced solution
我们可以在 O(1) 空间和 O(N) 时间内将两个数字编码为一个。
假设您要将 0-9 范围内的数字编码为 1,例如: 5、6.怎么做? 很简单,
即使对于两位整数,比如 56 和 45,56*100 + 45 = 5645。我们可以通过执行 5645/100 和 5645%100 再次获得单个数字,
但是对于大小为 n 的数组,例如: a = {4,0,2,1,3},假设我们要对 3 和 4 进行编码,因此:
推广后,我们得到
但是我们还需要注意我们更改的值; 所以它最终是
你可以通过除法并找到结果数字的模来单独获得每个数字。
We can encode two numbers into one in O(1) space and O(N) time.
Suppose you want to encode numbers in the range 0-9 into one, eg. 5 and 6. How to do it? Simple,
Even for two digit integer let's say 56 and 45, 56*100 + 45 = 5645. We can again obtain individual numbers by doing 5645/100 and 5645%100
But for an array of size n, eg. a = {4,0,2,1,3}, let's say we want to encode 3 and 4, so:
Upon generalising it, we get
But we also need to take care of the value we changed; so it ends up as
You can obtain each number individually by dividing and finding mod of the resultant number.
对于新手,我想通过指出以下参考文献(那里的公式 2.1)来补充 @nawfal 的精彩答案:
The Rosenberg-Strong Pairing Function
数学公式如下:给定 a, b 两个非负整数,我们感兴趣的是函数 f,使得 f(a, b) = f(a', b') == > a=a',b=b'。
上述参考文献提供了 Rosenberg-Strong 函数 $f$ 定义如下:
f(a, b) = (max(a,b))^2 + max(a, b) + a - b
这看起来比 Cantor 的更经济和 Szudzik 的(查看参考文献中的图 2 以了解编码的工作原理!枚举看起来是最佳的)。
请注意,Szudzik 撰写了这篇论文,但并未将他自己的优雅解决方案放入本文中。
For newcomers, I would like to add to @nawfal's brilliant answer by pointing to the following reference (equation 2.1 there):
The Rosenberg-Strong Pairing Function
The mathematical formulation is the following: given a, b two nonnegative integers, we are interested in a function f such that f(a, b) = f(a', b') ==> a=a', b=b'.
The above references provide a Rosenberg-Strong function $f$ defined as follows:
f(a, b) = (max(a,b))^2 + max(a, b) + a - b
This looks more economical than Cantor's and Szudzik's (check out Figure 2 in the reference to see how the encoding works! The enumeration looks optimal).
Note that Szudzik wrote this paper but did not put his own elegant solution in this paper.
给定正整数 A 和 B,令 D = A 的位数,E = B 的位数
结果可以是 D、0、E、0、A 和 B 的串联。
示例:A = 300,B = 12。D = 3,E=2 结果 = 302030012。
这利用了唯一以 0 开头的数字是 0 的事实,
优点:易于编码,易于解码,人类可读,可以首先比较有效数字,无需计算即可进行比较的潜力,简单的错误检查。
缺点:结果的大小是一个问题。 但这没关系,为什么我们要在计算机中存储无界整数呢?
Given positive integers A and B, let D = number of digits A has, and E=number of digits B has
The result can be a concatenation of D, 0, E, 0, A, and B.
Example: A = 300, B = 12. D = 3, E=2 result = 302030012.
This takes advantage of the fact that the only number that starts with 0, is 0,
Pro: Easy to encode, easy to decode, human readable, significant digits can be compared first, potential for compare without calculation, simple error checking.
Cons: Size of results is an issue. But that's ok, why are we storing unbounded integers in a computer anyways.
要将两个任意大小的非负整数合并为一个,只需将它们的数字交错即可。
示例:12345 和 678:
结果是 1020364758。
您可以使用任何基数(例如基数 2 或基数 16)执行此操作。 结果最多是最大数字的两倍,这意味着例如组合两个 16 位数字将始终适合 32 位。
如果我们还想允许负整数,我们可以通过将绝对值乘以 2 来编码其中之一或两者,然后如果数字为负则加 1。 解码时,模 2 将表示符号。
我们可以将此方法扩展到小数,例如,如果我们想要组合 1234.5 和 6.78:
结果是 10203046.5708。 (这在编程中可能不是很有用,除非您使用定点数。)
请注意我们如何利用整数部分中的前导零无关紧要,就像小数部分中的尾随零无关紧要一样。
To combine two non-negative integers of any size into one, you can simply interleave their digits.
Example: 12345 and 678:
The result is 1020364758.
You can do this with any base (e.g. base 2 or base 16). The result will be at most twice as big as the biggest number, which means that e.g. combining two 16-bit numbers will always fit in 32-bit.
If we wanted to also allow negative integers, we could encode one or both by multiplying the absolute value by 2, and then adding 1 if the number was negative. When decoding, the modulus of 2 would then indicate the sign.
We can extend this approach to decimals, for example if we wanted to combine 1234.5 and 6.78:
The result is 10203046.5708. (This is probably not very useful in programming, unless you use fixed-point numbers.)
Note how we take advantage that leading zeroes are insignificant in the integer part in the same way that trailing zeroes are insignificant in the fractional part.