Linear predictive coding: recovering the original audio file from LPC coefficients in Python

Posted on 2025-01-16 12:23:00

I am trying to compress an audio file with Linear Predictive Coding (LPC): encode the file with LPC to obtain the residual signal, then encode that residual with Rice coding. I need to be able to recover the original audio from the compressed signal. I found the LPC code below for encoding and decoding audio, but the decoded file is a noisy, unintelligible version of the original. How should I change the code so that decoding gives back the original audio?

Here is the code that encodes the audio file into LPC coefficients. It returns the prediction coefficients and the signal power of each frame.

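import numpy as np
import scipy.io.wavfile
from math import floor
from scipy.signal import lfilter
from scipy.signal.windows import hann

def make_matrix_X(x, p):
    n = len(x)

    # reversed signal followed by p zeros, so each row can be read as one slice
    xz = np.concatenate([x[::-1], np.zeros(p)])

    # row i holds x[i], x[i-1], ..., x[i-p+1] (zeros before the start of x)
    X = np.zeros((n - 1, p))
    for i in range(n - 1):
        offset = n - 1 - i
        X[i, :] = xz[offset : offset + p]

    return X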

def lpc_encode(x, p, w):
    """
    Encodes the input signal into lpc coefficients

    x - single channel input signal
    p - lpc order
    w - window function (its length sets the frame length)
    """
    n = len(x)
    nw = len(w)

    # overlapping factor
    R = 0.5
    step = floor(nw * (1 - R))
    nb = floor((n - nw) / step) + 1

    # list of overlapping blocks
    B = np.zeros((nb, nw))

    for i in range(nb):
        offset = i * step
        B[i, :] = w * x[offset : nw + offset]

    # the coefficients
    A = np.zeros((p, nb))

    # the signal power
    G = np.zeros((1, nb))

    for i in range(nb):
        frame = B[i, :]

        # targets: every sample except the first, each predicted from the p before it
        b = frame[1:]

        X = make_matrix_X(frame, p)

        # least-squares prediction coefficients
        a = np.linalg.lstsq(X, b, rcond=None)[0]

        # prediction residual; only its variance is kept
        e = b - np.dot(X, a)
        g = np.var(e)

        A[:, i] = a
        G[:, i] = g

    return [A, G]

And here is the code that decodes the coefficients back into audio; the audio comes out garbled. Its inputs are the prediction coefficients, the signal power, and the window function.

def lpc_decode(A, G, w, lowcut = 0):
    """
    Decodes the LPC coefficients

    * A - the LPC filter coefficients
    * G - the signal power(G) or the signal power with fundamental frequency(GF)
           or the full source signal(E) of each windowed segment.
    * w - the window function
    * lowcut - the cutoff frequency in normalized frequencies for a lowcut
              filter (unused here).
    """
    [ne, n] = G.shape
    nw = len(w)
    [p, _] = A.shape

    # list of overlapping blocks
    B = np.zeros((n, nw))

    for i in range(n):
        # excitation: white noise scaled to the frame's residual power
        src = np.sqrt(G[:, i]) * np.random.randn(nw)

        # denominator of the all-pole synthesis filter 1 / (1 - sum_k a_k z^-k)
        b = np.concatenate([[1.0], -A[:, i]])

        x_hat = lfilter([1], b, src)

        B[i, :] = x_hat

    # recover signal from blocks by overlap-add
    [count, nw] = B.shape
    R = 0.5
    step = floor(nw * (1 - R))
    n = (count-1) * step + nw

    # the rendered signal
    x = np.zeros((n, ))

    for i in range(count):
        offset = i * step
        x[offset : nw + offset] += B[i, :]

    return x
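
The decoder above drives each synthesis filter with random noise of the right power, so at best it reproduces the spectral envelope of each frame, never the waveform. Getting the waveform back requires keeping the actual residual of every frame (plus the frame's first sample, which has no prediction) and feeding it through the same all-pole filter. The following is a sketch under those assumptions, not a drop-in replacement: `analyze_frame` and `synthesize_frame` are hypothetical helpers, the test signal is synthetic, and everything is floating point, whereas a lossless scheme with Rice coding would need integer residuals.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.signal.windows import hann


def analyze_frame(frame, p):
    """Fit an order-p predictor to one frame; return (a, first_sample, residual)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Row i holds frame[i], frame[i-1], ..., frame[i-p+1] (zeros before the start),
    # the same layout that make_matrix_X builds.
    padded = np.concatenate([np.zeros(p - 1), frame])
    X = np.stack([padded[i:i + p][::-1] for i in range(n - 1)])
    target = frame[1:]
    a = np.linalg.lstsq(X, target, rcond=None)[0]
    residual = target - X @ a
    return a, frame[0], residual


def synthesize_frame(a, first_sample, residual):
    """Invert analyze_frame with the all-pole filter 1 / (1 - sum_k a_k z^-k)."""
    excitation = np.concatenate([[first_sample], residual])
    return lfilter([1.0], np.concatenate([[1.0], -a]), excitation)


# Synthetic test signal (an assumption for illustration): two sinusoids plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
signal = (np.sin(2 * np.pi * 0.01 * t)
          + 0.5 * np.sin(2 * np.pi * 0.037 * t)
          + 0.2 * rng.standard_normal(t.size))

nw, p = 240, 6
step = nw // 2
w = hann(nw, sym=False)  # periodic Hann: copies shifted by nw/2 sum to 1

starts = range(0, len(signal) - nw + 1, step)
rebuilt = np.zeros_like(signal)
for start in starts:
    a, first, residual = analyze_frame(w * signal[start:start + nw], p)
    rebuilt[start:start + nw] += synthesize_frame(a, first, residual)

# The first and last half-frames are covered by a single window only,
# so compare the region covered by two overlapping windows.
interior = slice(step, starts[-1] + step)
print("max interior error:", np.max(np.abs(rebuilt[interior] - signal[interior])))
```

With a periodic Hann window and 50% overlap the windows sum to one, so overlap-adding exactly reconstructed frames recovers the interior of the signal to within rounding error. The cost is that the residual has as many samples as the frame itself, which is why the residual, not the coefficients, is what the entropy coder has to shrink.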

The code to run the functions:

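[sample_rate, amplitudes] = scipy.io.wavfile.read('Sound1.wav')
# work in floating point; assumes a single-channel (mono) file
amplitudes = np.array(amplitudes, dtype=float)
w = hann(floor(0.03*sample_rate), False)

# Encode
[A, G] = lpc_encode(amplitudes, 6, w)

# Decode
xhat = lpc_decode(A, G, w)
# xhat is float64 on the input's integer scale; cast back before writing
# (assumes 16-bit PCM input)
xhat = np.clip(xhat, -32768, 32767).astype(np.int16)
scipy.io.wavfile.write("example.wav", sample_rate, xhat)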
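
For the Rice-coding stage mentioned at the top, the residual has to be a sequence of integers. A minimal sketch, assuming signed integer residuals and a fixed parameter `k`: the helper names (`zigzag`, `unzigzag`, `rice_encode`, `rice_decode`) are hypothetical, and the bitstream is held as a string of '0'/'1' characters for readability rather than packed into bytes.

```python
def zigzag(v):
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k):
    """Rice-code signed integers with parameter k; returns a string of '0'/'1'."""
    bits = []
    for v in values:
        u = zigzag(int(v))
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")            # quotient in unary, terminated by a 0
        if k:
            bits.append(format(r, f"0{k}b"))  # remainder in exactly k binary digits
    return "".join(bits)


def rice_decode(bits, k, count):
    """Decode `count` signed integers from a string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":               # read the unary quotient
            q += 1
            pos += 1
        pos += 1                              # skip the terminating 0
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((q << k) | r))
    return values


residual = [0, -1, 3, -7, 12, 2]              # made-up integer residuals
encoded = rice_encode(residual, k=2)
assert rice_decode(encoded, k=2, count=len(residual)) == residual
```

Small residuals get short codes, so the better the predictor, the fewer bits the residual costs; the parameter `k` is usually chosen per block from the typical residual magnitude.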
