How do I implement an interpolated delay line or an allpass filter with the Karplus-Strong algorithm?

Posted on 2024-11-19 22:23:24

OK, I've implemented the Karplus-Strong algorithm in C. It's a simple algorithm that simulates a plucked string sound. You start with a ring buffer of length n (n = sampling frequency / desired frequency) filled with noise, pass it through a simple two-point average filter y[n] = (x[n] + x[n-1])/2, output the result, and feed it back into the delay line. Rinse and repeat. This smooths out the noise over time and creates a natural plucked string sound.
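For reference, the basic loop described above can be sketched in C roughly as follows. This is a minimal sketch, not the asker's actual code: the name `ks_render` and its parameters are made up for illustration, and the caller is assumed to have pre-filled the buffer with the excitation (typically noise).

```c
#include <assert.h>

/* Basic Karplus-Strong with an integer delay length.
   buf   : delay line of n samples, pre-filled with the excitation
   n     : delay length, roughly sampling frequency / desired frequency
   out   : receives `count` output samples */
void ks_render(float *buf, int n, float *out, int count)
{
    assert(n >= 2);
    int pos = 0;          /* current position in the ring buffer */
    float prev = 0.0f;    /* x[n-1] for the two-point average */

    for (int i = 0; i < count; i++) {
        float x = buf[pos];             /* oldest sample in the delay line */
        float y = 0.5f * (x + prev);    /* y[n] = (x[n] + x[n-1]) / 2 */
        prev = x;
        out[i] = y;                     /* output it ... */
        buf[pos] = y;                   /* ... and feed it back */
        pos = (pos + 1) % n;
    }
}
```

Because `n` is an integer here, the pitch is quantized to sampling frequency / (n + 0.5); the extra half sample comes from the delay of the two-point average.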

But I noticed that with an integer delay line length, several high pitches map to the same delay length. Also, an integer delay length doesn't allow for smoothly varying pitches (as in vibrato or glissando). I've read several papers on extensions to the Karplus-Strong algorithm, and they all talk about using either an interpolated delay line for fractional delay or an allpass filter:

http://quod.lib.umich.edu/cgi/p/pod/dod-idx?c=icmc;idno=bbp2372.1997.068
http://www.jaffe.com/Jaffe-Smith-Extensions-CMJ-1983.pdf
http://www.music.mcgill.ca/~gary/courses/projects/618_2009/NickDonaldson/index.html

I've implemented interpolated delay lines before, but only on wavetables where the waveform buffer doesn't change. There I just step through the buffer at different rates. What confuses me is that, for the KS algorithm, the papers seem to be talking about actually changing the delay length instead of just the rate at which I step through it. The KS algorithm complicates things because I'm supposed to be constantly feeding values back into the delay line.

So how would I go about implementing this? Do I feed the interpolated value back in or what? Do I get rid of the two point averaging low pass filter completely?

And how would the all pass filter work? Am I supposed to replace the 2 point averaging filter with the all pass filter? How would I glide between distant pitches with glissando using the linear interpolation method or allpass filter method?

Comments (2)

蹲墙角沉默 2024-11-26 22:23:24

Digital signal processing algorithms are often represented as block diagrams for good reason: it is an excellent way to think about them. When coding them, think of each block as a separate unit with fixed inputs and outputs. I think some of your questions come from trying to combine the various elements of the system prematurely.

Here is a block diagram for Karplus Strong.

[Figure: Karplus-Strong block diagram, from Wikipedia]

For the delay block, you need to implement a fractional delay line. This will include its own lowpass filter, but that is a detail of how the delay line is implemented. The Karplus-Strong effect also requires a lowpass filter. The characteristics of these two filters will be different, so don't try to combine them. By the way, the averaging lowpass filter you have selected has a poor frequency response that introduces a "comb filter" effect. You might want to design a more sophisticated FIR or IIR filter.

So how would I go about implementing this? Do I feed the interpolated value back in or what? Do I get rid of the two point averaging low pass filter completely?

You do feed the interpolated, summed sample back into the delay line, just as the block diagram shows. In some cases this can start to increase the net gain of the system, and you might need to "normalize" the output of the delay so that it does not get out of control, if that's what you're worried about.

There are many valid strategies for implementing a fractional delay line, including interpolation and allpass filtering, as you mention. The idea is that you maintain read and write indexes into the delay line. The length of the delay line is not the total length of the memory buffer, but the difference between the indexes modulo the total length of the buffer. Make the buffer as big as it needs to be and don't worry about resizing it.

I find it most convenient to treat read and write as free-running counters that never wrap around or expire, because then

current_delay_length = (write - read) % total_delay_length
current_read_sample = delay_line[read % total_delay_length]

where % is modulus. The write and read counters could also contain the fractional length if they are floating point values or set up as fixed point. In any case, this makes it easy to modify the length of the delay line. It is important to ensure that a minimum delay is enforced (write > read).
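The counter scheme above might look like this in C. It is only a sketch under some assumptions: the names `DelayLine`, `delay_write` and `delay_read` are hypothetical, the buffer size is assumed to be a power of two, the read position is expressed as a (possibly fractional) distance behind the write counter rather than as a second counter, and linear interpolation stands in for whatever fractional-delay method you prefer.

```c
#include <assert.h>

#define DELAY_SIZE 4096u  /* assumed buffer size; must be a power of two */

typedef struct {
    float buf[DELAY_SIZE];  /* zero-initialise before use */
    unsigned write;         /* free-running count of samples written */
} DelayLine;

/* Store one sample and advance the write counter. The counter is never
   reset; unsigned wrap-around is harmless because DELAY_SIZE is a power
   of two and therefore divides the counter's range. */
void delay_write(DelayLine *d, float x)
{
    d->buf[d->write % DELAY_SIZE] = x;
    d->write++;
}

/* Return the signal `delay` samples behind the write counter, where
   delay = 1.0 is the most recently written sample. The assert enforces
   the minimum delay and keeps the read inside the buffer. */
float delay_read(const DelayLine *d, float delay)
{
    assert(delay >= 1.0f && delay < (float)(DELAY_SIZE - 1u));
    unsigned whole = (unsigned)delay;       /* integer part of the delay */
    float frac = delay - (float)whole;      /* fractional part, 0 <= frac < 1 */
    float newer = d->buf[(d->write - whole) % DELAY_SIZE];
    float older = d->buf[(d->write - whole - 1u) % DELAY_SIZE];
    return newer + frac * (older - newer);  /* linear interpolation */
}
```

Changing the delay length is then just a matter of passing a different `delay` value on the next call; the buffer itself never needs to be resized.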

Believe it or not, you will change the delay line length by changing the rate you step through it, just like a fixed length buffer. Generally you will modulate the read index a little bit. It should never fall behind the write pointer more than a buffer length or get ahead of it, or you will get glitches. But you are free to move the read pointer anywhere in the wake of the write pointer. Changing the modulation will get different effects.

I stress that effects such as glissando come from how the delay line's read and write indexes are manipulated, not how it is implemented. You will get similar sounds from an allpass filter or a linearly interpolated delay line. Better fractional delay lines will reduce aliasing noise and support more rapid changes of read pointer, for example.
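To make the glissando point concrete, here is a rough sketch of a Karplus-Strong loop whose delay length is recomputed for every sample. Everything named here (`KSString`, `ks_tick`, `ks_glide`) is hypothetical. The sketch assumes a linearly interpolated read, the two-point average from the question as the loop filter, and a glide that is linear in frequency; a real instrument might glide in pitch instead. The 0.5 subtracted from the delay compensates for the half-sample delay of the two-point average.

```c
#include <assert.h>

#define KS_SIZE 4096u  /* assumed buffer size; must be a power of two */

typedef struct {
    float buf[KS_SIZE];  /* delay line, pre-filled with the excitation */
    unsigned write;      /* free-running write counter */
    float prev;          /* previous delay output, for the average */
} KSString;

/* Produce one output sample with the given (fractional) delay length. */
static float ks_tick(KSString *s, float delay)
{
    assert(delay >= 1.0f && delay < (float)(KS_SIZE - 1u));
    unsigned whole = (unsigned)delay;
    float frac = delay - (float)whole;
    float newer = s->buf[(s->write - whole) & (KS_SIZE - 1u)];
    float older = s->buf[(s->write - whole - 1u) & (KS_SIZE - 1u)];
    float x = newer + frac * (older - newer);  /* interpolated delay output */
    float y = 0.5f * (x + s->prev);            /* two-point average lowpass */
    s->prev = x;
    s->buf[s->write & (KS_SIZE - 1u)] = y;     /* feed the result back */
    s->write++;
    return y;
}

/* Render `count` samples while gliding from f_start to f_end (in Hz). */
void ks_glide(KSString *s, float fs, float f_start, float f_end,
              float *out, int count)
{
    for (int i = 0; i < count; i++) {
        float t = (count > 1) ? (float)i / (float)(count - 1) : 0.0f;
        float freq = f_start + t * (f_end - f_start);
        out[i] = ks_tick(s, fs / freq - 0.5f);  /* loop delay = fs / freq */
    }
}
```

The write counter always advances by one; only the distance between the read position and the write position changes, which is the "modulating the read index" idea described above.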

云淡月浅 2024-11-26 22:23:24

I implemented three variations. Each has its pros and cons, but none is as perfect as I would like. Maybe someone has a better algorithm and wants to share it here?

In general, I do it as jbarlow describes. I use a ring buffer of length 2^x, where x is "large enough", e.g. 12. That gives a maximum delay length of 2^12 = 4096 samples, which corresponds to a lowest base frequency of about 12 Hz when rendering at 48 kHz.
The reason for the power of two is that the modulo can be done with a bitwise AND, which is much cheaper than an actual modulo.

// init
int writepointer = 0;

// loop:
writepointer = (writepointer+1) & 0xFFF;

The write pointer is kept simple: it starts at, e.g., 0 and always increments by 1 for each output sample.

The read pointer starts at an offset (delta) relative to the write pointer, recalculated every time the frequency changes.

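// init (recompute whenever the frequency changes)
float delta = samplingrate / frequency;  // delay in samples; both operands assumed to be floats
int readpointer = (writepointer - (int)delta - 1) & 0xFFF;
float frac = delta - (int)delta;         // fractional part of the delay
float weight_a = frac;                   // weight of the sample at readpointer
float weight_b = 1.0f - frac;            // weight of the next sample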

// loop:
readpointer = (readpointer + 1) & 0xFFF;

The read pointer also increments by 1, but the ideal read position usually lies somewhere between two integer positions. The integer readpointer stores the rounded-down position, and weight_a and weight_b are the weights of that sample and the next one.

Variation #1:
Ignore the fractional part and treat the (integer) read pointer as-is.

Pros: side-effect-less, perfect delay (no implicit low pass due to the delay, means full control over the frequency response, no artefacts)

Cons: the base frequency is usually slightly off, because it is quantized to integer positions. This sounds very detuned for high-pitched notes and cannot produce subtle pitch changes.

Variation #2:
Linearly interpolate between the readpointer sample and the next sample.
That is, I read two consecutive samples from the ring buffer and sum them, weighted by weight_a and weight_b respectively.

Pros: perfect base frequency, no artefacts

Cons: The linear interpolation introduces a low-pass filter that may not be desired. Even worse, the low-pass varies with the pitch. If the fractional part is close to 0 or 1, there is very little low-pass filtering, while a fractional part around 0.5 filters heavily. That makes some notes of the instrument brighter than others, and a note can never be brighter than this low-pass allows (bad for steel guitar or harpsichord).
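The pitch dependence can be read directly off the frequency response of the two-tap interpolator. Writing $f$ for the fractional part (the symbol is introduced here for illustration; the weights are `weight_a` $= f$ and `weight_b` $= 1 - f$), and noting that the magnitude is the same whichever of the two samples gets weight $f$:

```latex
\begin{aligned}
H(e^{j\omega}) &= (1 - f) + f\,e^{-j\omega}, \\
\lvert H(e^{j\omega}) \rvert^{2} &= 1 - 2f(1 - f)\,(1 - \cos\omega), \\
\lvert H(e^{j\pi}) \rvert &= \lvert 1 - 2f \rvert .
\end{aligned}
```

So at the Nyquist frequency the gain is 1 for $f = 0$, falls to exactly 0 for $f = 0.5$, and rises back toward 1 as $f$ approaches 1. Since the signal passes through this filter on every trip around the loop, the attenuation compounds over time.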

Variation #3:
A kind of jittering (dithering). I always read the delay from an integer position, but keep track of the error I make: a variable accumulates the fractional part on every sample. Once it exceeds 1, I subtract 1.0 from the error and read the delay from the second position instead.

Pros: perfect base frequency, no implicit low pass

Cons: introduces audible artefacts that make it sound lo-fi (like downsampling with nearest neighbour).
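The error accumulator of variation #3 could be sketched as follows. This is a guess at the idea rather than the author's code: `Dither` and `dither_step` are hypothetical names, and the comparison uses "at least 1" instead of "exceeds 1", which only matters when the accumulated error lands exactly on 1.

```c
#include <assert.h>

typedef struct {
    float frac;  /* fractional part of the desired delay, 0 <= frac < 1 */
    float err;   /* accumulated error, starts at 0 */
} Dither;

/* Call once per output sample. Returns 0 to read from the first integer
   position, or 1 to read from the second (neighbouring) position. */
int dither_step(Dither *d)
{
    assert(d->frac >= 0.0f && d->frac < 1.0f);
    d->err += d->frac;          /* sum up the fractional part */
    if (d->err >= 1.0f) {
        d->err -= 1.0f;         /* carry: take the other position once */
        return 1;
    }
    return 0;
}
```

With `frac = 0.25`, for example, the function returns 1 on every fourth call, so the average delay is correct while each individual read stays on an integer position.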

Conclusion: None of the variations is satisfying. You either miss the correct pitch, lose the neutral frequency response, or introduce artefacts.

I read in literature that an all-pass filter should do it better, but isn't the delay line an allpass already? What would be the difference in implementation?
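On that last question: a plain delay line is indeed allpass, but only for whole-sample delays. The usual allpass approach keeps the integer delay line and adds a first-order allpass filter in series to supply the fractional part. A minimal sketch follows, assuming the standard first-order form H(z) = (a + z^-1) / (1 + a z^-1) with a = (1 - d) / (1 + d); the names `Allpass1`, `allpass_set_delay` and `allpass_tick` are made up for illustration.

```c
#include <assert.h>

typedef struct {
    float a;   /* allpass coefficient */
    float x1;  /* previous input  x[n-1] */
    float y1;  /* previous output y[n-1] */
} Allpass1;

/* Set the coefficient for a delay of d samples. The delay is
   approximately d at low frequencies and drifts at high ones. */
void allpass_set_delay(Allpass1 *ap, float d)
{
    assert(d > 0.0f);
    ap->a = (1.0f - d) / (1.0f + d);
}

/* y[n] = a * (x[n] - y[n-1]) + x[n-1]; unit gain at every frequency. */
float allpass_tick(Allpass1 *ap, float x)
{
    float y = ap->a * (x - ap->y1) + ap->x1;
    ap->x1 = x;
    ap->y1 = y;
    return y;
}
```

In a Karplus-Strong loop one would read the delay line at an integer position (variation #1) and pass that sample through `allpass_tick` before the lowpass, with `d` set to the remaining fractional delay. The gain stays at 1 for all frequencies, so no note is damped more than another. The trade-offs are that the delay is only exact at low frequencies and that the filter has internal state, so abrupt changes to `d` can cause transients; a common suggestion is to keep `d` away from 0.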
