
algorithm vhdl modulo

A better way to implement the modulo operation (algorithm question)

Posted 2024-08-31 13:10:19


I've been trying to implement a modular exponentiator recently. I'm writing the code in VHDL, but I'm looking for advice of a more algorithmic nature. The main component of the modular exponentiator is a modular multiplier, which I also have to implement myself. I haven't had any problems with the multiplication algorithm: it's just adding and shifting, and I've done a good job of figuring out what all of my variables mean so that I can multiply in a pretty reasonable amount of time.

The problem that I'm having is with implementing the modulus operation in the multiplier. I know that performing repeated subtractions will work, but it will also be slow. I found out that I could shift the modulus to effectively subtract large multiples of the modulus but I think there might still be better ways to do this. The algorithm that I'm using works something like this (weird pseudocode follows):

result,modulus : integer (n bits) (previously defined)
shiftcount : integer (initialized to zero)
while( (modulus<result) and  (modulus(n-1) != 1) ){
     modulus = modulus << 1
     shiftcount++
}
for(i=shiftcount;i>=0;i--){
     if(modulus<=result){result = result-modulus}   // <= so that an exact multiple reduces to 0
     if(i!=0){modulus = modulus >> 1}
}

So...is this a good algorithm, or at least a good place to start? Wikipedia doesn't really discuss algorithms for implementing the modulo operation, and whenever I try to search elsewhere I find really interesting but incredibly complicated (and often unrelated) research papers and publications. If there's an obvious way to implement this that I'm not seeing, I'd really appreciate some feedback.
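A minimal Python sketch of the shift-and-subtract reduction above, assuming Python integers stand in for the n-bit registers (the function name `shift_subtract_mod` and the explicit `n` parameter are illustrative, not from the question). The subtraction test uses `<=`; with a strict `<`, an input that is an exact multiple of the aligned modulus, such as `shift_subtract_mod(6, 3, 8)`, would come out as 3 instead of 0.

```python
def shift_subtract_mod(result: int, modulus: int, n: int) -> int:
    """Reduce result modulo modulus with the shift-and-subtract loop.

    Assumes n-bit registers: 0 <= result < 2**n and 0 < modulus < 2**n.
    """
    top_bit = 1 << (n - 1)
    shiftcount = 0
    # Align: shift the modulus left until it is >= result, or until its
    # top bit is set (one more shift would overflow the n-bit register).
    while modulus < result and not (modulus & top_bit):
        modulus <<= 1
        shiftcount += 1
    # Walk back down, subtracting each shifted multiple that still fits.
    for i in range(shiftcount, -1, -1):
        if modulus <= result:   # <= so that an exact multiple reduces to 0
            result -= modulus
        if i != 0:
            modulus >>= 1
    return result
```

Each loop runs at most about n times, one step per bit position, which is the same work as a restoring division that throws the quotient away.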


北城孤痞 2024-09-07 13:10:20


I'm not sure what you're calculating there, to be honest. You talk about the modulo operation, but usually a modulo operation is between two numbers a and b, and its result is the remainder of dividing a by b. Where are a and b in your pseudocode...?

Anyway, maybe this'll help: a mod b = a - floor(a / b) * b.
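In Python, `//` is floor division, so the identity can be checked directly. A minimal sketch (the helper name is illustrative):

```python
def mod_via_division(a: int, b: int) -> int:
    """a mod b computed as a - floor(a / b) * b, assuming b != 0."""
    return a - (a // b) * b   # // rounds toward negative infinity, i.e. floor
```

Because `//` floors, this agrees with Python's own `%` even for negative `a`.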

I don't know if this is faster or not, it depends on whether or not you can do division and multiplication faster than a lot of subtractions.

Another way to speed up the subtraction approach is to use binary search. If you want a mod b, you need to subtract b from a until a is smaller than b. So basically you need to find k such that:

a - k*b < b, with k as small as possible

One way to find this k is a linear search:

k = 0;
while ( a - k*b >= b )
    ++k;

return a - k*b;
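A runnable version of this linear search, assuming non-negative `a` and positive `b` (the function name is illustrative). The loop body runs floor(a / b) times, so it is no faster than plain repeated subtraction:

```python
def mod_linear(a: int, b: int) -> int:
    """a mod b by finding the smallest k with a - k*b < b; assumes a >= 0, b > 0."""
    k = 0
    while a - k * b >= b:
        k += 1              # one more multiple of b still fits
    return a - k * b
```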

But you can also binary search it (only ran a few tests but it worked on all of them):

left = 0; right = a;
while ( left < right )
{
    m = (left + right) / 2;
    if ( a - m*b >= b )
       left = m + 1;
    else
       right = m;
}

return a - left*b;
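The binary search as runnable Python, again assuming `a >= 0` and `b >= 1` (name illustrative). It needs about log2(a) probes, but each probe computes `mid * b`, so it only pays off when a multiplication is cheap compared with the subtractions it replaces:

```python
def mod_binary_search(a: int, b: int) -> int:
    """a mod b by binary-searching the smallest k with a - k*b < b.

    Assumes a >= 0 and b >= 1, so k = a always satisfies the condition.
    """
    left, right = 0, a
    while left < right:
        mid = (left + right) // 2
        if a - mid * b >= b:   # mid is too small: b still fits in what is left
            left = mid + 1
        else:
            right = mid
    return a - left * b
```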

I'm guessing the binary search solution will be the fastest when dealing with big numbers.

If you want to calculate a mod b and only a is a big number (you can store b on a primitive data type), you can do it even faster:

mod = 0
for each decimal digit p of a, most significant first, do
    mod = (mod * 10 + p) % b
return mod

This works because we can write a as a_n*10^n + a_(n-1)*10^(n-1) + ... + a_0*10^0 = ((...(a_n*10 + a_(n-1))*10 + ...)*10 + a_0), and taking the remainder mod b after each step does not change the final remainder.
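A sketch of the digit-by-digit method, assuming `a` is given as a decimal string, most significant digit first (the helper name is illustrative). Reducing at every step is allowed because (x mod b)*10 + p and x*10 + p leave the same remainder mod b, and the running value never exceeds 10*b - 1:

```python
def mod_by_digits(digits: str, b: int) -> int:
    """a mod b for a given as a decimal string, most significant digit first."""
    mod = 0
    for ch in digits:
        mod = (mod * 10 + int(ch)) % b   # mod*10 + digit < 10*b before reducing
    return mod
```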

I think the binary search is what you're looking for though.


笔落惊风雨 2024-09-07 13:10:20


There are many ways to do it in O(log n) time for n bits; you can do it with multiplication and you don't have to iterate 1 bit at a time. For example,

a mod b = a - floor((a * r)/2^n) * b

where

r = floor(2^n / b)

is precomputed because typically you're using the same b many times. If not, use the standard quadratically converging Newton iteration for the reciprocal (iterate x -> 2x - bx^2 in fixed point).
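A sketch of the reciprocal iteration in Python, assuming fixed point with n fractional bits, so that x approximates 2^n / b and the update 2x - bx^2 becomes `2*x - ((b*x*x) >> n)`. The starting guess and the final correction loops are one possible choice, not something the answer specifies; the corrections make the result exactly floor(2^n / b) even though truncation leaves the iteration slightly off. The name `reciprocal_newton` is illustrative.

```python
def reciprocal_newton(b: int, n: int) -> int:
    """floor(2**n / b) via the Newton iteration x -> 2x - b*x^2 in fixed point.

    x holds 1/b scaled by 2**n.  Assumes 1 <= b < 2**n.
    """
    one = 1 << n
    # Starting guess 2**(n - bitlen(b)) lies in [R/2, R) for R = 2**n / b,
    # which is inside the range where the iteration converges.
    x = 1 << (n - b.bit_length())
    while True:
        # 2x - b*x^2, with the b*x^2 term rescaled by 2**n and truncated
        x_new = 2 * x - ((b * x * x) >> n)
        if x_new <= x:          # stopped increasing: x has reached R (within 1)
            break
        x = x_new
    # Truncation leaves x a little off; step to the exact floor.
    while (x + 1) * b <= one:
        x += 1
    while x * b > one:
        x -= 1
    return x
```

Starting from a guess within a factor of two, the number of correct bits roughly doubles per step, so a few dozen bits need only a handful of iterations.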

Choose n according to the range you need the result in (for many algorithms, like modular exponentiation, it doesn't have to be 0..b).
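The formula above is what is usually called Barrett reduction. A minimal Python sketch, assuming `0 <= a < 2**n`, `1 <= b <= 2**n` and `r = floor(2**n / b)` computed once per modulus (function name illustrative). Under those assumptions the estimated quotient is never too large and at most 1 too small, so the raw result lies in [0, 2b) and a single conditional subtraction brings it into [0, b); skipping that subtraction is one way to use a looser output range.

```python
def barrett_reduce(a: int, b: int, n: int, r: int) -> int:
    """a mod b as a - floor(a*r / 2**n) * b, with r = floor(2**n / b).

    Assumes 0 <= a < 2**n and 1 <= b <= 2**n.
    """
    q = (a * r) >> n      # quotient estimate: floor(a/b) or one less
    t = a - q * b         # congruent to a mod b, and 0 <= t < 2b
    if t >= b:            # single correction step
        t -= b
    return t
```

For instance, with n = 16 and b = 1000, r = 65; for a = 54321 the estimate is q = 53, one short of 54, so t = 1321 and the final subtraction gives 321.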

(Many decades ago I thought I saw a trick to avoid 2 multiplications in a row... Update: I think it's Montgomery multiplication (see the REDC algorithm). I take it back: REDC does the same work as the simpler algorithm above. Not sure why REDC was ever invented... Maybe slightly lower latency, due to feeding the low-order result into the chained multiplication instead of the high-order result?)
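For reference, a minimal Python sketch of Montgomery reduction (REDC), assuming an odd modulus m and R = 2^k with k = m.bit_length(), so R > m. `pow(m, -1, R)` needs Python 3.8 or later; the helper names are illustrative. Inside REDC the only "division" is a right shift by k and the only reduction is masking to k bits. Conversion into and out of Montgomery form is done here with ordinary `%` for brevity; a modular exponentiator would do that only once at the start and once at the end.

```python
def montgomery_setup(m: int):
    """Return (k, m_prime) for an odd modulus m, where R = 2**k > m and
    m_prime = -m^(-1) mod R."""
    if m % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    k = m.bit_length()
    R = 1 << k
    m_prime = (-pow(m, -1, R)) % R   # modular inverse via pow needs Python 3.8+
    return k, m_prime


def redc(T: int, m: int, k: int, m_prime: int) -> int:
    """REDC: return T * R^(-1) mod m for 0 <= T < m * R, with R = 2**k."""
    mask = (1 << k) - 1
    u = ((T & mask) * m_prime) & mask   # u = T * m_prime mod R
    t = (T + u * m) >> k                # T + u*m is a multiple of R, so the shift is exact
    return t - m if t >= m else t       # t < 2m, so one subtraction suffices


def montgomery_mul_mod(a: int, b: int, m: int) -> int:
    """a * b mod m for 0 <= a, b < m, via Montgomery form."""
    k, m_prime = montgomery_setup(m)
    R = 1 << k
    a_bar = (a * R) % m                 # into Montgomery form (plain % for brevity)
    b_bar = (b * R) % m
    c_bar = redc(a_bar * b_bar, m, k, m_prime)   # = a*b*R mod m
    return redc(c_bar, m, k, m_prime)            # back out: = a*b mod m
```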

Of course, if you have a lot of memory, you can just precompute all the 2^i mod b partial sums for i = log2(b)..log2(a). Many software implementations do this.
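One reading of this, sketched in Python: store 2^i mod b for every bit position i, then add up the entries for the set bits of a, subtracting b whenever the running sum reaches b. The helper names and the `nbits` parameter are illustrative. For simplicity the table here starts at i = 0; entries below log2(b) are just 2^i, which is why a real table only needs to start there.

```python
def build_pow2_table(b: int, nbits: int) -> list:
    """table[i] = 2**i mod b for i in range(nbits)."""
    table = []
    t = 1 % b                 # 2**0 mod b (0 when b == 1)
    for _ in range(nbits):
        table.append(t)
        t = 2 * t             # next power of two; t < 2b here
        if t >= b:
            t -= b
    return table


def mod_with_table(a: int, b: int, table: list) -> int:
    """a mod b for 0 <= a < 2**len(table), adding table entries for set bits."""
    acc = 0
    i = 0
    while a:
        if a & 1:
            acc += table[i]   # acc < b and table[i] < b, so acc < 2b here
            if acc >= b:
                acc -= b
        a >>= 1
        i += 1
    return acc
```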


Anonymous 2024-09-07 13:10:20


If you're using shift-and-add for the multiplication (which is by no means the fastest way), you can do the modulo operation after each addition step: if the sum is greater than or equal to the modulus, subtract the modulus. If you can predict the overflow, you can do the addition and subtraction at the same time. Doing the modulo at each step will also reduce the overall size of your multiplier (same length as the input rather than double).

The shifting of the modulus you're doing is getting you most of the way towards a full division algorithm (modulo is just taking the remainder).
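To make the connection concrete, here is a minimal restoring-division sketch in Python that returns both quotient and remainder, assuming `a >= 0` and `b > 0` (the name is illustrative); dropping `q` leaves exactly a bit-serial modulo:

```python
def restoring_divmod(a: int, b: int):
    """Bit-serial (restoring) division: returns (a // b, a % b) for a >= 0, b > 0."""
    q, r = 0, 0
    for i in range(a.bit_length() - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)   # bring down the next bit of a; r < 2b
        q <<= 1
        if r >= b:                      # subtract b if it fits, record a 1 bit
            r -= b
            q |= 1
    return q, r
```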

EDIT: Here is my implementation in Python:

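def mod_mul(a, b, m):
    """Return a*b mod m by shift-and-add, reducing after every step."""
    result = 0
    a = a % m  # inputs reduced so that a, b < m
    b = b % m
    while b > 0:
        if b & 1:
            # add this bit's copy of a; result < 2m, so one subtraction is enough
            result += a
            if result >= m:
                result -= m
        # double a for the next bit of b, keeping a < m
        a <<= 1
        if a >= m:
            a -= m
        b >>= 1
    return result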

This is just modular multiplication (result = a*b mod m). The modulo operations at the top are not needed, but serve as a reminder that the algorithm assumes a and b are less than m.

Of course for modular exponentiation you'll have an outer loop that does this entire operation at each step doing either squaring or multiplication. But I think you knew that.
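A sketch of that outer loop, assuming right-to-left binary (square-and-multiply) exponentiation; `mod_exp` is an illustrative name, and `mod_mul` is the function from this answer, repeated so the block runs on its own:

```python
def mod_mul(a, b, m):
    """a*b mod m by shift-and-add, reducing after every step (as in the answer)."""
    result = 0
    a %= m
    b %= m
    while b > 0:
        if b & 1:
            result += a
            if result >= m:
                result -= m
        a <<= 1
        if a >= m:
            a -= m
        b >>= 1
    return result


def mod_exp(base, exp, m):
    """base**exp mod m by right-to-left square-and-multiply; assumes exp >= 0, m >= 1."""
    result = 1 % m              # 1 % m handles m == 1
    base %= m
    while exp > 0:
        if exp & 1:
            result = mod_mul(result, base, m)   # multiply step for a set bit
        base = mod_mul(base, base, m)           # squaring step
        exp >>= 1
    return result
```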


原野 2024-09-07 13:10:20

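As for the modulo operation by itself, I'm not sure. For modulo as part of a larger modular exponentiation, have you looked up Montgomery multiplication, as mentioned on the Wikipedia page on modular exponentiation? It's been a while since I looked into this type of algorithm, but as far as I recall, it's commonly used for fast modular exponentiation.

Edit: For what it's worth, your modulo algorithm seems fine at first glance. You're basically doing division, which is a repeated-subtraction algorithm.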


嘴硬脾气大 2024-09-07 13:10:20


That test, (modulus(n-1) != 1) (a bit test?), seems redundant combined with (modulus<result).

Designing for a hardware implementation, I would be conscious that the smaller/greater-than tests imply more logic (a subtraction) than bitwise operations and branching on zero.

If we can do bitwise tests easily, this could be quick:

m=msb_of(modulus)

while( result>0 ) 
{
  r=msb_of(result) //countdown from prev msb onto result
  shift=r-m        //countdown from r onto modulus or 
                   //unroll the small subtraction 
  if(shift<0) { break }  //result < 2^m <= modulus: already reduced

  takeoff=(modulus<<(shift))  //or integrate this into count of shift

  result=result-takeoff;  //necessary subtraction

  if(result<0)
  {
    if(shift!=0) { result=result+(takeoff>>1); }  //overshoot: net effect is subtracting modulus<<(shift-1)
    else         { result=result+takeoff; break }  //result was already below modulus: undo and stop
  }

  } //endwhile

return result

(code untested may contain gotchas)

result is repeatedly decremented by the modulus, shifted so that its most significant bit lines up with that of result.

After each subtraction, result has a ~50/50 chance of losing more than 1 msb. It also has a ~50/50 chance of going negative; adding back half of what was subtracted always makes it positive again, so that is done whenever shift was not 0.

The working loop exits when result is underrun and 'shift' was 0.
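A runnable Python reading of this approach, assuming `result >= 0` and `modulus > 0`, with `int.bit_length() - 1` standing in for `msb_of` (the function name `msb_mod` is illustrative). It includes an explicit stop when the aligned shift would be negative, i.e. when result already sits below the modulus's top bit:

```python
def msb_mod(result: int, modulus: int) -> int:
    """Remainder by aligning MSBs, subtracting, and undoing half an overshoot.

    Assumes result >= 0 and modulus > 0.
    """
    m = modulus.bit_length() - 1                 # msb_of(modulus)
    while result > 0:
        shift = (result.bit_length() - 1) - m    # msb_of(result) - msb_of(modulus)
        if shift < 0:                            # result < 2**m <= modulus: done
            break
        takeoff = modulus << shift
        result -= takeoff
        if result < 0:
            if shift != 0:
                # overshoot: net effect is subtracting modulus << (shift - 1)
                result += takeoff >> 1
            else:
                # result was already below modulus: undo and stop
                result += takeoff
                break
    return result
```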

