I've been trying to implement a modular exponentiator recently. I'm writing the code in VHDL, but I'm looking for advice of a more algorithmic nature. The main component of the modular exponentiator is a modular multiplier, which I also have to implement myself. I haven't had any problems with the multiplication algorithm; it's just adding and shifting, and I've done a good job of figuring out what all of my variables mean, so I can multiply in a pretty reasonable amount of time.
The problem that I'm having is with implementing the modulus operation in the multiplier. I know that performing repeated subtractions will work, but it will also be slow. I found out that I could shift the modulus to effectively subtract large multiples of the modulus but I think there might still be better ways to do this. The algorithm that I'm using works something like this (weird pseudocode follows):
result, modulus : integer (n bits) (previously defined)
shiftcount : integer (initialized to zero)
while( (modulus<result) and (modulus(n-1) != 1) ){
    modulus = modulus << 1
    shiftcount++
}
for(i=shiftcount;i>=0;i--){
    if(modulus<result){result = result-modulus}
    if(i!=0){modulus = modulus >> 1}
}
So...is this a good algorithm, or at least a good place to start? Wikipedia doesn't really discuss algorithms for implementing the modulo operation, and whenever I try to search elsewhere I find really interesting but incredibly complicated (and often unrelated) research papers and publications. If there's an obvious way to implement this that I'm not seeing, I'd really appreciate some feedback.
I'm not sure what you're calculating there, to be honest. You talk about a modulo operation, but usually a modulo operation is between two numbers a and b, and its result is the remainder of dividing a by b. Where are the a and b in your pseudocode?

Anyway, maybe this will help: a mod b = a - floor(a / b) * b.

I don't know if this is faster or not; it depends on whether you can do division and multiplication faster than a lot of subtractions.
Another way to speed up the subtraction approach is to use binary search. If you want a mod b, you need to subtract b from a until a is smaller than b. So basically you need to find k such that: a - k*b < b, k is min.

One way to find this k is a linear search, but you can also binary search it (I only ran a few tests, but it worked on all of them).
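A minimal Python sketch of both searches for k, assuming a is non-negative and b is positive. The function names are made up for illustration, and this is not the answerer's original code, which did not survive in the post. The binary search first doubles an upper bound until a - k*b < b holds, then narrows down to the smallest such k:

```python
def mod_linear(a: int, b: int) -> int:
    """Find the smallest k with a - k*b < b by trying k = 0, 1, 2, ..."""
    k = 0
    while a - k * b >= b:
        k += 1
    return a - k * b


def mod_binary_search(a: int, b: int) -> int:
    """Find the same k with a doubling phase followed by a binary search."""
    lo, hi = 0, 1
    # Doubling phase: grow hi until the condition a - hi*b < b is satisfied.
    while a - hi * b >= b:
        hi *= 2
    # Binary search for the smallest k in [lo, hi] satisfying the condition.
    while lo < hi:
        mid = (lo + hi) // 2
        if a - mid * b < b:
            hi = mid
        else:
            lo = mid + 1
    return a - lo * b
```

The condition a - k*b < b is monotone in k, which is what makes the binary search valid. Note that each probe costs a multiplication, so whether this beats shift-and-subtract depends on how cheap multiplication is on the target.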
I'm guessing the binary search solution will be the fastest when dealing with big numbers.
If you want to calculate a mod b and only a is a big number (you can store b in a primitive data type), you can do it even faster by processing a one digit at a time and reducing the running value modulo b after each digit. This works because we can write a as a_n*10^n + a_(n-1)*10^(n-1) + ... + a_0*10^0 = (((a_n * 10 + a_(n-1)) * 10 + a_(n-2)) * 10 + ...
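As a rough illustration of that digit-by-digit idea, here is a small Python sketch. It assumes the big number is given as a string of decimal digits, most significant first; the function name is hypothetical and this is not the code from the original post:

```python
def mod_decimal_string(digits: str, b: int) -> int:
    """Reduce a decimal digit string modulo b using Horner's rule."""
    remainder = 0
    for ch in digits:
        # The running value never exceeds 10*(b-1) + 9, so it fits a small type.
        remainder = (remainder * 10 + int(ch)) % b
    return remainder
```

The same idea works in any base: with base 2, each step is a shift, an optional add of the next bit, and a reduction of a value smaller than 2*b.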
I think the binary search is what you're looking for though.
There are many ways to do it in O(log n) time for n bits; you can do it with multiplication, and you don't have to iterate 1 bit at a time. For example, you can multiply by a fixed-point reciprocal of b, where the reciprocal is precomputed because typically you're using the same b many times. If not, use the standard superconverging polynomial iteration method for the reciprocal (iterate 2x - bx^2 in fixed point).

Choose n according to the range you need for the result (for many algorithms, like modular exponentiation, it doesn't have to be 0..b).

(Many decades ago I thought I saw a trick to avoid 2 multiplications in a row... Update: I think it's Montgomery multiplication (see the REDC algorithm). I take it back, REDC does the same work as the simpler algorithm above. Not sure why REDC was ever invented... Maybe slightly lower latency due to feeding the low-order result into the chained multiplication, instead of the higher-order result?)
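A hedged Python sketch of reduction by a precomputed reciprocal, in the spirit of the paragraph above. The exact formula from the post is not available, so everything here is an assumption: the reciprocal is taken as r = floor(2^n / b), inputs are assumed to satisfy 0 <= a < 2^n, and the name make_reducer is invented. Integer division stands in for the fixed-point iteration that hardware would use to obtain r:

```python
def make_reducer(b: int, n: int):
    """Return a function computing a mod b for 0 <= a < 2**n."""
    if b <= 0:
        raise ValueError("b must be positive")
    r = (1 << n) // b  # precomputed once per modulus (assumed scaling: 2**n)

    def reduce(a: int) -> int:
        if not 0 <= a < (1 << n):
            raise ValueError("a is outside the range covered by n")
        q = (a * r) >> n   # quotient estimate, never larger than a // b
        rem = a - q * b    # two multiplications in a row, as the post notes
        while rem >= b:    # the estimate is at most one too small here
            rem -= b
        return rem

    return reduce
```

Usage would look like reduce97 = make_reducer(97, 16) followed by reduce97(12345). If a partially reduced result in a wider range than 0..b is acceptable, the final correction loop can be dropped.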
Of course, if you have a lot of memory, you can just precompute all the 2^n mod b partial sums for n = log2(b)..log2(a). Many software implementations do this.
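One way to read the table idea, sketched in Python under stated assumptions: the table stores 2^n mod b for every bit position (not only those from log2(b) upward, which keeps the sketch simple), and a is reduced by summing the entries for its set bits with a conditional subtraction after each add. The function names are hypothetical:

```python
def build_power_table(b: int, max_bits: int) -> list:
    """Precompute 2**n mod b for n = 0 .. max_bits - 1."""
    table = []
    p = 1 % b
    for _ in range(max_bits):
        table.append(p)
        p <<= 1          # p < b before doubling, so p < 2*b afterwards
        if p >= b:
            p -= b       # one conditional subtraction restores p < b
    return table


def reduce_with_table(a: int, b: int, table: list) -> int:
    """Compute a mod b by adding table entries for the set bits of a."""
    if a.bit_length() > len(table):
        raise ValueError("table does not cover all bits of a")
    total = 0
    for n in range(a.bit_length()):
        if (a >> n) & 1:
            total += table[n]   # both operands are below b
            if total >= b:
                total -= b
    return total
```

The table only depends on b, so it pays off when the same modulus is reused many times, as in modular exponentiation.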
If you're using shift-and-add for the multiplication (which is by no means the fastest way), you can do the modulo operation after each addition step. If the sum is greater than or equal to the modulus, you then subtract the modulus. If you can predict the overflow, you can do the addition and subtraction at the same time. Doing the modulo at each step will also reduce the overall size of your multiplier (same length as the input rather than double).
The shifting of the modulus you're doing is getting you most of the way towards a full division algorithm (modulo is just taking the remainder).
EDIT: I implemented this in Python. The implementation is just modular multiplication (result = a*b mod m). The modulo operations at its top are not needed, but serve as a reminder that the algorithm assumes a and b are less than m.

Of course, for modular exponentiation you'll have an outer loop that does this entire operation at each step, doing either squaring or multiplication. But I think you knew that.
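For reference, a minimal Python sketch of shift-and-add multiplication with a reduction after every step. This is a reconstruction under assumptions, not the answerer's original listing (which is missing from the post); the name modmul is made up, and the multiplier bits are processed most significant first:

```python
def modmul(a: int, b: int, m: int) -> int:
    """Compute (a * b) mod m using only shifts, adds, and subtracts."""
    # Not strictly needed; a reminder that the loop assumes a, b < m.
    a %= m
    b %= m
    result = 0
    for i in reversed(range(b.bit_length())):
        result <<= 1            # result < m, so the doubled value is < 2*m
        if result >= m:
            result -= m
        if (b >> i) & 1:
            result += a         # both below m, so the sum is < 2*m
            if result >= m:
                result -= m
    return result


print(modmul(7, 9, 10))  # 63 mod 10 = 3
```

Because the running value never reaches 2*m, the accumulator needs only one bit more than the modulus, which matches the point about the multiplier staying the same length as the input.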
For modulo itself, I'm not sure. For modulo as part of the larger modular exponentiation operation, did you look up Montgomery multiplication, as mentioned on the Wikipedia page on modular exponentiation? It's been a while since I've looked into this type of algorithm, but from what I recall, it's commonly used in fast modular exponentiation.

Edit: for what it's worth, your modulo algorithm seems OK at first glance. You're basically doing division, which is a repeated subtraction algorithm.
That test (modulus(n-1) != 1) (a bit test?) seems redundant combined with (modulus<result).

When designing for a hardware implementation, I would be conscious that smaller/greater-than tests imply more logic (a subtraction) than bitwise operations and branching on zero.
If we can do bitwise tests easily, the following approach could be quick (untested, may contain gotchas).

result is repeatedly decremented by modulus, shifted so that their most significant bits line up.

After each subtraction, result has a ~50/50 chance of losing more than 1 msb. It also has a ~50/50 chance of going negative; adding back half of what was subtracted will always make it positive again. (It should be put back into positive only if shift was not 0.)

The working loop exits when result underruns and shift was 0.
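A Python sketch of the loop described above, written from the prose description since the original code is missing. The function name is hypothetical, int.bit_length stands in for a hardware leading-one detector, and one extra early exit is added for the case where result already has fewer bits than modulus:

```python
def mod_msb_aligned(result: int, modulus: int) -> int:
    """Reduce result modulo modulus by subtracting msb-aligned multiples."""
    if result < 0 or modulus <= 0:
        raise ValueError("expects result >= 0 and modulus > 0")
    while True:
        shift = result.bit_length() - modulus.bit_length()
        if shift < 0:
            return result                 # fewer bits, so result < modulus
        result -= modulus << shift        # subtract with msbs lined up
        if result < 0:
            if shift == 0:
                return result + modulus   # underrun at shift 0: undo and stop
            # Add back half of what was subtracted; this is always enough
            # to make result non-negative again.
            result += modulus << (shift - 1)
```

The only comparisons left are sign checks and a bit-length difference, which is the point of the approach: no full-width magnitude comparator is needed.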