Pipelining a 1-D convolution algorithm in C on a DSP development board

Posted 2024-12-15 23:45:37

I am using Spectrum Digital's DSK6416 DSP board and implementing a convolution algorithm in C that convolves live voice samples with a pre-recorded impulse response array. The goal is to speak into the microphone and hear the processed output, so that we sound as if we were speaking in the environment where the impulse response was recorded.

The challenge I am facing now is doing the convolution live while keeping up with the interrupt function, which reads and writes one sample at 8 kHz.

Here is my brainstorming:

My current implementation, which is inefficient and does not work, is as follows:

The interrupt preempts the convolution every 1/(8 kHz) = 125 µs, outputs the element at the current index, and then the convolution resumes.

However, one complete iteration of the convolution takes much longer than 125 µs, so when the interrupt wants to output data from the output array, the data is not ready yet.

My ideal implementation, a fast pipelined convolution algorithm:

We would have many convolution processes running in the background while outputting the completed ones as time goes on. There will be many pipes running in parallel.

If I use the pipelining approach, we would need to have N = 10000 pipeline processes running in the background...

Now that I have the idea down (at least I think I do; I might be wrong), I have no clue how to implement it on the DSK board in C, because C does not support object orientation.
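The "N pipes" do not need objects or threads. Each pipe is just the partial sum of one future output sample, so N pipes fit in a single array of N accumulators indexed circularly. That is what the convolution loop in the code below already does. The host-side sketch that follows makes this explicit and checks it against direct convolution. It is plain C with a hypothetical small size (`NTAPS = 4`) and illustrative function names (`pipes_step`, `direct_conv`). It is not DSK-specific.

```c
#include <assert.h>

/* Hypothetical impulse-response length for a host-side check;
   on the board this would be MAX_SIZE (10000). */
#define NTAPS 4

/* One slot per open "pipe": slot s holds the partial sum of an output
   sample that is still waiting for contributions from future inputs. */
static long acc[NTAPS];
static int  pos = 0;     /* slot whose output completes on this call */

/* Push input sample x; return the finished output y[n] = sum h[k]*x[n-k]. */
long pipes_step(const short h[NTAPS], short x)
{
    int k;
    long y;

    /* Open the newest pipe. Slot pos-1 held last call's output, which has
       already been returned, so its first term overwrites instead of adds. */
    acc[(pos + NTAPS - 1) % NTAPS] = (long)h[NTAPS - 1] * x;

    /* Feed x into every pipe that is still open (slots pos .. pos+NTAPS-2). */
    for (k = 0; k < NTAPS - 1; k++)
        acc[(pos + k) % NTAPS] += (long)h[k] * x;

    y = acc[pos];            /* the pipe at pos just received its last term */
    pos = (pos + 1) % NTAPS;
    return y;
}

/* Textbook direct-form convolution, used only to check pipes_step(). */
long direct_conv(const short h[NTAPS], const short *x, int n)
{
    long y = 0;
    int k;
    assert(n >= 0);
    for (k = 0; k < NTAPS && k <= n; k++)
        y += (long)h[k] * x[n - k];
    return y;
}
```

Calling `pipes_step()` once per input sample returns one finished output per call. This reorganization does not reduce the work: every sample still costs `NTAPS` multiply-adds.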

The following is the pseudo-code for our C implementation:

#include "dsk6416.h"         // board support library (assumed in the project)
#include "DSK6416_AIC23.h"
#include "dsk6416_dip.h"     // DSK6416_DIP_init(), DSK6416_DIP_get()
#include "dsk6416_led.h"     // DSK6416_LED_init(), DSK6416_LED_off()
Uint32 fs=DSK6416_AIC23_FREQ_8KHZ;          //sampling rate must match the 8 kHz interrupt
#define DSK6416_AIC23_INPUT_MIC 0x0015
#define DSK6416_AIC23_INPUT_LINE 0x0011
Uint16 inputsource=DSK6416_AIC23_INPUT_MIC; // select input

//codec helpers from the c6416dskinit.c support file (assumed linked, as before)
void comm_intr(void);
Uint32 input_sample(void);
void output_sample(int out_data);

//input & output parameters declaration
#define MAX_SIZE 10000
short impulse[MAX_SIZE];          //h[k] in Q15, scaled so the sums fit in 32 bits
int   acc[MAX_SIZE];              //circular buffer of partial output sums
volatile short curr_input = 0;    //newest input sample, written by the ISR
volatile short curr_output = 0;   //next output sample, written by main()
volatile int   new_sample = 0;    //set by the ISR, cleared by main()
int i = 0;                        //slot in acc[] that completes on this sample

interrupt void c_int11()         //interrupt running at 8 kHz
{
    curr_input = (short)input_sample();   //read input (lower 16 bits)
    output_sample(curr_output);           //output the last finished sample
    new_sample = 1;                       //tell main() a new sample arrived
}

void main()
{
    int k, idx;
    short x;

    //Read Impulse.txt into impulse array here, before interrupts start

    comm_intr();                          //init codec, enable INT11
    DSK6416_DIP_init();
    DSK6416_LED_init();
    while(1)
    {
        while (!new_sample) ;             //wait for the next input sample
        new_sample = 0;
        x = curr_input;

        if (DSK6416_DIP_get(0) == 0)      //DIP switch 0 pressed
        {
            //Add x*h[k] to each of the MAX_SIZE outputs this sample feeds.
            //Slot i-1 held the previous output (already in curr_output),
            //so the k = MAX_SIZE-1 term overwrites it instead of adding.
            //All of this must finish within one 125 us sample period.
            for (k = 0; k < MAX_SIZE - 1; k++)
            {
                idx = i + k;
                if (idx >= MAX_SIZE) idx -= MAX_SIZE;
                acc[idx] += impulse[k] * x;
            }
            acc[(i + MAX_SIZE - 1) % MAX_SIZE] = impulse[MAX_SIZE - 1] * x;
            curr_output = (short)(acc[i] >> 15);   //y[i], back to Q15
            if (++i == MAX_SIZE) i = 0;
        }
        else //if DIP switch is not pressed
        {
            DSK6416_LED_off(0);
            DSK6416_LED_off(1);
            DSK6416_LED_off(2);
            DSK6416_LED_off(3);
            curr_output = x;              //outputs unprocessed dry voice
        }
    } //end of while
}

Is there a way to implement a pipeline in C, compiled for the DSP board, so that we can run multiple convolution iterations in the background at the same time?

I drew some pictures, but I am new to this board so I can't post images.

Please let me know if you need my pictorial ideas to help you help me~

Any help on how to implement this code is very much appreciated!!

Comments (2)

凉风有信 2024-12-22 23:45:37

You probably need to process data in chunks of some N samples. While one chunk is being transferred by a DAC/ADC interrupt handler, another one is being processed somewhere in main(). The main thing here is to make sure that processing a chunk of N samples takes less time than receiving/transmitting N samples.

Here's what it may look like over time (everything within a step, except step 1, happens "in parallel"):

  1. buf1=buf3=zeroes, buf2=anything
  2. ISR: DAC sends buf1, ADC receives buf2; main(): processes buf3
  3. ISR: DAC sends buf3, ADC receives buf1; main(): processes buf2
  4. ISR: DAC sends buf2, ADC receives buf3; main(): processes buf1

Repeat indefinitely from step 2.
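The rotation above can be sketched as a host-side simulation in C. `isr_sample()` plays the role of the codec interrupt (one call per sample), and `process_block()` plays the role of the work done in main(). The block length `BLK`, the function names, and the placeholder gain of 2 are illustrative assumptions. On the DSK, the ISR would call the board's codec input/output routines instead of taking and returning a value.

```c
#include <assert.h>

#define BLK 4                          /* hypothetical block length N */

static short buf[3][BLK];              /* buf1..buf3 of the answer, all zero */
static int play = 0, rec = 1, proc = 2;   /* step 2 roles: send buf1, receive buf2, process buf3 */
static int idx = 0;                    /* sample position inside the block */
static volatile int block_ready = 0;   /* set by the "ISR", cleared by main() */

/* Called once per sample, like the codec ISR: returns the sample to send
   to the DAC and stores the sample received from the ADC. */
short isr_sample(short adc_in)
{
    short dac_out = buf[play][idx];
    buf[rec][idx] = adc_in;
    if (++idx == BLK) {                /* block boundary: rotate the roles */
        int old_play = play;
        idx = 0;
        play = proc;                   /* processed block goes out next */
        proc = rec;                    /* freshly recorded block gets processed */
        rec = old_play;                /* block just played is refilled */
        block_ready = 1;
    }
    return dac_out;
}

/* The main() side: process buf[proc] in place. It must finish before the
   ISR reaches the next block boundary. The gain of 2 is a placeholder for
   the real convolution. */
void process_block(void)
{
    int i;
    assert(block_ready);
    block_ready = 0;
    for (i = 0; i < BLK; i++)
        buf[proc][i] = (short)(2 * buf[proc][i]);
}
```

With this rotation, a sample comes out two blocks (2 × BLK samples) after it went in: one block to record it and one to process it. `process_block()` must always finish before the next block boundary, which is the constraint the answer states.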

Also, you may want to implement your convolution in assembly for extra speed. I'd look through TI application notes for an implementation; it may also be available in one of TI's libraries.
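Before resorting to assembly, it can help to make the C inner loop free of branches and modulo arithmetic. That is the kind of loop that optimizing compilers for the C6000 can often software-pipeline. Below is a minimal sketch of a block FIR that keeps `NH-1` samples of history in front of each block. `NH`, `BLK`, `xh`, and `fir_block()` are hypothetical names, and Q15 data with a 32-bit accumulator is assumed. For an actual optimized routine and its length/alignment requirements, see TI's DSP library documentation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes for illustration; the question uses 10000 taps. */
#define NH  8      /* impulse response length */
#define BLK 16     /* samples per block (one buffer of the rotation) */

/* xh holds NH-1 samples of history followed by the current block, so the
   inner loop never has to test for wrap-around. */
static short xh[NH - 1 + BLK];

/* Filter one block in place: blk[j] <- (sum_k h[k] * x[n-k]) >> 15 (Q15). */
void fir_block(const short *restrict h, short *restrict blk)
{
    int j, k;
    memcpy(&xh[NH - 1], blk, BLK * sizeof(short));   /* append new block */
    for (j = 0; j < BLK; j++) {
        int acc = 0;                                  /* 32-bit accumulator */
        const short *xp = &xh[NH - 1 + j];            /* points at x[n] */
        for (k = 0; k < NH; k++)                      /* branch-free inner loop */
            acc += h[k] * xp[-k];
        blk[j] = (short)(acc >> 15);
    }
    /* Keep the last NH-1 input samples as history for the next block. */
    memmove(xh, &xh[BLK], (NH - 1) * sizeof(short));
}
```

`fir_block()` would be called from main() on the buffer that is currently being processed. It computes the same convolution as the question's loop, organized per block instead of per sample.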

You may also consider doing the convolution via the Fast Fourier Transform.

旧竹 2024-12-22 23:45:37

Your DSP only has so many CPU cycles available per second. You need to analyze your algorithm to determine how many CPU cycles it takes to process each sample on average. That needs to be less than the number of CPU cycles between samples. No amount of pipelining or object orientation will help if you don't have an algorithm that completes in a small enough number of cycles per sample on average.
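Here is a worked version of that budget in C. The CPU clock is an assumption (600 MHz is used below; substitute your board's actual clock). The count covers only the time available, not the cost of loads, stores, or interrupt overhead.

```c
#include <assert.h>

/* CPU cycles available between two consecutive samples. */
long cycles_per_sample(long cpu_hz, long fs_hz)
{
    assert(fs_hz > 0);
    return cpu_hz / fs_hz;
}

/* Cycles the filter may spend per tap if every sample must be finished
   before the next one arrives. */
double cycles_per_tap(long cpu_hz, long fs_hz, long taps)
{
    assert(taps > 0);
    return (double)cycles_per_sample(cpu_hz, fs_hz) / taps;
}
```

`cycles_per_sample(600000000L, 8000L)` is 75000, which leaves 7.5 cycles per tap for a 10000-tap impulse response. At 48 kHz (the rate `DSK6416_AIC23_FREQ_48KHZ` selects), only 1.25 cycles per tap remain.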
