Handling very large vectors in VHDL

Posted on 2025-02-12 16:34:17

Similarly to the question asked here, I am facing issues managing very large vectors in an FPGA, and nothing in the previous topic really helped. I have a sample that is 2^15 bits wide, and I want to compute something similar to a bitwise autocorrelation of this sample by shifting, XORing, and adding up all the bits of this vector.

One of my constraints is a fast execution time (less than 100 ms with a 12 MHz clock).

My approach for now is an FSM in which one state processes one iteration of the bitwise autocorrelation, and the following state compares the current result to the minimum value so far (the minimum reflects the fundamental frequency of the sample). To do so, I use a for loop (I am usually not a big fan of this kind of structure...). Here is a piece of my code:

when S_BACF =>
    -- Count the positions where the sample differs from its copy rotated by index
    count_ones := (others => '0');
    for i in 0 to 32766 loop
        count_ones := count_ones + (sample(i) xor sample((index+i) mod 32767));
    end loop;
    s_nb_of_ones_current <= count_ones;
    current_state <= S_COMP;
when S_COMP =>
    if index >= 19999 then
        -- Last lag reached: reset the index and stop
        index <= 0;
        current_state <= S_END;
    else
        index <= index + 1;
        current_state <= S_BACF;
        -- Keep the smallest count seen so far and the lag that produced it
        if s_nb_of_ones_saved > s_nb_of_ones_current then
            s_nb_of_ones_saved <= s_nb_of_ones_current;
            s_min_rank <= std_logic_vector(to_unsigned(index, 15));
        end if;
    end if;

Sorry for not defining each signal/variable, but I think it is transparent enough. My index doesn't need to go beyond 20000 (for this application).
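For scale, here is a rough budget worked out from the figures in this post. It assumes one pass of the loop (32767 bit comparisons) per lag, 20000 lags (index 0 to 19999), and the 100 ms limit at 12 MHz:

```latex
\begin{align*}
\text{cycle budget} &= 0.1\,\text{s} \times 12 \times 10^{6}\,\text{Hz} = 1.2 \times 10^{6}\ \text{cycles} \\
\text{bit comparisons} &= 20000 \times 32767 = 655\,340\,000 \\
\text{comparisons per cycle} &\ge \frac{655\,340\,000}{1.2 \times 10^{6}} \approx 546.1 \\
\text{cycles per lag} &\le \frac{1.2 \times 10^{6}}{20000} = 60
\end{align*}
```

Under these assumptions, each lag does not have to be resolved in a single clock: it can be spread over up to 60 cycles, as long as roughly 550 or more bit pairs are compared per cycle.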

I think this way of processing data is quite efficient in register use ("just" one vector of 2^15 bits and a 15-bit vector for the fundamental frequency).

In simulation, it works great. But, as expected, synthesis fails, even for quite big targets. The for loop, though efficient in theory, cannot be synthesised with such a big depth.

I imagined splitting my data into several smaller pieces, but the shifting-xoring operation through different smaller vectors is a nightmare.
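One possible way to split the work without indexing across chunk boundaries is sketched below. This is only a sketch under stated assumptions, not a tested design. It keeps two circulating copies of the sample and compares only their lowest W bits on each clock. Both copies rotate by W per clock, and at the end of each lag the second copy rotates by W + 1 so that it ends up offset by one more bit. The entity name, generics, ports, and handshake (bacf_chunked, N, W, MAX_LAG, start, done) are hypothetical. The sketch assumes N = 2^15 = 32768 with W dividing N and W < N; for the modulus of 32767 used in the code above, W would have to divide 32767 instead (for example 1057). It also starts at lag 1, because lag 0 always gives a count of zero.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: entity name, generics and ports are illustrative.
entity bacf_chunked is
    generic (
        N       : positive := 32768;  -- sample length in bits (assumed 2**15)
        W       : positive := 1024;   -- bits compared per clock; must divide N, W < N
        MAX_LAG : positive := 19999   -- last lag to evaluate
    );
    port (
        clk      : in  std_logic;
        start    : in  std_logic;  -- one-cycle pulse: load sample, begin search
        sample   : in  std_logic_vector(N-1 downto 0);
        done     : out std_logic;  -- one-cycle pulse when min_rank is valid
        min_rank : out std_logic_vector(14 downto 0)
    );
end entity bacf_chunked;

architecture rtl of bacf_chunked is
    signal ref_reg : std_logic_vector(N-1 downto 0);  -- circulating copy of the sample
    signal rot_reg : std_logic_vector(N-1 downto 0);  -- same data, offset by the current lag
    signal chunk   : natural range 0 to N/W - 1 := 0;
    signal lag     : natural range 1 to MAX_LAG := 1;
    signal acc     : natural range 0 to N := 0;       -- mismatches so far for this lag
    signal best    : natural range 0 to N := N;       -- smallest total seen so far
    signal busy    : std_logic := '0';

    -- Number of '1' bits in v; only ever applied to a W-bit slice.
    function popcount(v : std_logic_vector) return natural is
        variable ones : natural range 0 to v'length := 0;
    begin
        for i in v'range loop
            if v(i) = '1' then
                ones := ones + 1;
            end if;
        end loop;
        return ones;
    end function popcount;
begin
    process (clk)
        variable total : natural range 0 to N;
    begin
        if rising_edge(clk) then
            done <= '0';
            if start = '1' then
                ref_reg <= sample;
                rot_reg <= sample(0) & sample(N-1 downto 1);  -- lag 1: bit i holds sample(i+1)
                chunk <= 0;
                lag   <= 1;
                acc   <= 0;
                best  <= N;
                busy  <= '1';
            elsif busy = '1' then
                total := acc + popcount(ref_reg(W-1 downto 0) xor rot_reg(W-1 downto 0));
                -- Rotate the reference right by W; after N/W steps it is back in place.
                ref_reg <= ref_reg(W-1 downto 0) & ref_reg(N-1 downto W);
                if chunk = N/W - 1 then
                    -- Lag complete: record a new minimum, then advance to the next lag.
                    if total < best then
                        best     <= total;
                        min_rank <= std_logic_vector(to_unsigned(lag, 15));
                    end if;
                    acc   <= 0;
                    chunk <= 0;
                    -- Rotate by W + 1 so the copy ends up offset by one more bit.
                    rot_reg <= rot_reg(W downto 0) & rot_reg(N-1 downto W+1);
                    if lag = MAX_LAG then
                        busy <= '0';
                        done <= '1';
                    else
                        lag <= lag + 1;
                    end if;
                else
                    acc     <= total;
                    chunk   <= chunk + 1;
                    rot_reg <= rot_reg(W-1 downto 0) & rot_reg(N-1 downto W);
                end if;
            end if;
        end if;
    end process;
end architecture rtl;
```

With the default generics this takes N/W = 32 clocks per lag, so about 32 x 19999 = 639,968 clocks in total. The trade-off is that it holds two N-bit registers rather than one, and the W-bit popcount written as a plain loop may need to be restructured as an adder tree or pipelined to meet timing on a given device.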

I also thought about using the internal RAM of my FPGA to store my sample, in order to reduce register usage, but that would not solve the for loop synthesis issue.

So, do any of you have a good idea for making the synthesis succeed?
