Reading *.wav files in Python

Posted on 2024-08-18 03:57:09 · 335 words · 12 views · 0 comments

I need to analyze sound written in a .wav file. For that I need to transform this file into a set of numbers (arrays, for example). I think I need to use the wave package. However, I do not know exactly how it works. For example, I did the following:

import wave
w = wave.open('/usr/share/sounds/ekiga/voicemail.wav', 'r')
for i in range(w.getnframes()):
    frame = w.readframes(i)
    print frame

As a result of this code I expected to see sound pressure as a function of time. Instead I see a lot of strange, mysterious symbols (which are not hexadecimal numbers). Can anybody please help me with that?

Comments (14)

淡紫姑娘! 2024-08-25 03:57:10

You can also use the wavio library (import wavio). You will also need some basic knowledge of sound.

蘸点软妹酱 2024-08-25 03:57:09

Per the documentation, scipy.io.wavfile.read(somefile) returns a tuple of two items: the first is the sampling rate in samples per second, the second is a numpy array with all the data read from the file:

from scipy.io import wavfile
samplerate, data = wavfile.read('./output/audio.wav')

时光与爱终年不遇 2024-08-25 03:57:09

Using the struct module, you can unpack the wave frames, which are stored as two's-complement binary values between -32768 and 32767 (i.e. 0x8000 and 0x7FFF). This reads a mono, 16-bit WAVE file. I found this webpage (home.roadrunner.com/%7Ejgglatt/tech/wave.htm) quite useful in formulating this:

import wave, struct

wavefile = wave.open('sine.wav', 'r')

length = wavefile.getnframes()
for i in range(0, length):
    wavedata = wavefile.readframes(1)
    data = struct.unpack("<h", wavedata)
    print(int(data[0]))

This snippet reads one frame per call. To read more than one frame at a time (e.g., 13), use

wavedata = wavefile.readframes(13)
data = struct.unpack("<13h", wavedata)
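
The same idea extends to the whole file: the count in the format string can simply be the number of frames. The following is only a minimal sketch, assuming a mono, 16-bit file and a little-endian machine (as the "<h" format above does). The helper name read_mono_16bit and the small demo file are made up for illustration:

```python
import os
import struct
import tempfile
import wave


def read_mono_16bit(path):
    """Return (frame_rate, samples) for a mono, 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono, 16-bit files")
        rate = wav.getframerate()
        nframes = wav.getnframes()
        raw = wav.readframes(nframes)  # all frames as one bytes object
    # Same "<h" layout as above, repeated once per frame, e.g. "<4h".
    samples = struct.unpack("<%dh" % nframes, raw)
    return rate, list(samples)


# Demo on a tiny file written to a temporary directory (made-up data).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(struct.pack("<4h", 0, 1000, -1000, 32767))

rate, samples = read_mono_16bit(demo_path)
print(rate, samples)
```

Reading everything in one call avoids the per-frame loop, at the cost of holding the whole file in memory.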

乱了心跳 2024-08-25 03:57:09

Different Python modules to read wav:

Several libraries can read wave audio files. The ones used or mentioned in this answer are SoundFile, scipy.io.wavfile, scikits.audiolab and the built-in wave module.

The simplest example:

This is a simple example with SoundFile:

import soundfile as sf
data, samplerate = sf.read('existing_file.wav') 

Format of the output:

Warning: the data are not always in the same format; it depends on the library. For instance:

from scikits import audiolab
from scipy.io import wavfile
from sys import argv
for filepath in argv[1:]:
    x, fs, nb_bits = audiolab.wavread(filepath)
    print('Reading with scikits.audiolab.wavread:', x)
    fs, x = wavfile.read(filepath)
    print('Reading with scipy.io.wavfile.read:', x)

Output:

Reading with scikits.audiolab.wavread: [ 0.          0.          0.         ..., -0.00097656 -0.00079346 -0.00097656]
Reading with scipy.io.wavfile.read: [  0   0   0 ..., -32 -26 -32]

SoundFile and Audiolab return floats between -1 and 1 (as MATLAB does; that is the convention for audio signals). SciPy and wave return integers, which you can convert to floats according to the number of bits of encoding, for example:

from scipy.io.wavfile import read as wavread
samplerate, x = wavread(audiofilename)  # audiofilename: path to your file; x is a numpy array of integer samples
# scale to -1.0 -- 1.0
if x.dtype == 'int16':
    nb_bits = 16  # -> 16-bit wav files
elif x.dtype == 'int32':
    nb_bits = 32  # -> 32-bit wav files
else:
    raise ValueError('unsupported sample type: %s' % x.dtype)
max_nb_bit = float(2 ** (nb_bits - 1))
samples = x / (max_nb_bit + 1)  # samples is a numpy array of floats representing the samples

破晓 2024-08-25 03:57:09

IMHO, the easiest way to get audio data from a sound file into a NumPy array is SoundFile:

import soundfile as sf
data, fs = sf.read('/usr/share/sounds/ekiga/voicemail.wav')

This also supports 24-bit files out of the box.

There are many sound file libraries available; I've written an overview where you can see a few pros and cons.
It also features a page explaining how to read a 24-bit wav file with the wave module.

亣腦蒛氧 2024-08-25 03:57:09

You can accomplish this using the scikits.audiolab module. It requires NumPy and SciPy to function, and also libsndfile.

Note: I was only able to get it to work on Ubuntu, not on OS X.

from scikits.audiolab import wavread

filename = "testfile.wav"

data, sample_frequency, encoding = wavread(filename)

Now you have the wav data.

哆兒滾 2024-08-25 03:57:09

If you want to process audio block by block, some of the given solutions are quite awful in the sense that they imply loading the whole audio into memory, producing many cache misses and slowing down your program. python-wavefile provides some pythonic constructs to do NumPy block-by-block processing, using efficient and transparent block management by means of generators. Other pythonic niceties are context managers for files, metadata as properties... and if you want the whole-file interface, because you are developing a quick prototype and you don't care about efficiency, it is still there.

A simple example of processing would be:

import sys
from wavefile import WaveReader, WaveWriter

with WaveReader(sys.argv[1]) as r:
    with WaveWriter(
        'output.wav',
        channels=r.channels,
        samplerate=r.samplerate,
    ) as w:

        # Just to set the metadata
        w.metadata.title = r.metadata.title + " II"
        w.metadata.artist = r.metadata.artist

        # This is the processing loop
        for data in r.read_iter(size=512):
            data[1] *= .8     # lower volume on the second channel
            w.write(data)

The example reuses the same block to read the whole file, even in the case of the last block, which is usually smaller than the requested size. In that case you get a slice of the block. So trust the returned block length instead of using a hardcoded 512 size for any further processing.
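
For comparison, the same block-by-block idea can be sketched with only the standard library. This is not python-wavefile's API, just an illustration under stated assumptions: the file holds 16-bit samples, and iter_blocks and the demo file are hypothetical names. As above, the last block comes back shorter than the requested size:

```python
import array
import os
import tempfile
import wave


def iter_blocks(path, block_frames=512):
    """Yield array('h') blocks of interleaved 16-bit samples from a WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit files")
        while True:
            raw = wav.readframes(block_frames)
            if not raw:  # an empty bytes object means end of file
                break
            block = array.array("h")
            block.frombytes(raw)
            yield block


# Demo: a mono file with 1200 frames, read in blocks of 512 frames.
demo_path = os.path.join(tempfile.mkdtemp(), "blocks.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(array.array("h", range(1200)).tobytes())

block_sizes = [len(block) for block in iter_blocks(demo_path)]
print(block_sizes)
```

Only one block is held in memory at a time, so the approach also works for files that are larger than the available RAM.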

深居我梦 2024-08-25 03:57:09

My dear, as far as I understand what you are looking for, you are getting into a field called Digital Signal Processing (DSP). This engineering area ranges from the simple analysis of discrete-time signals to complex adaptive filters. A nice idea is to think of a discrete-time signal as a vector, where each element is a sampled value of the original, continuous-time signal. Once you have the samples in vector form, you can apply different digital signal processing techniques to that vector.

Unfortunately, in Python, moving from audio files to a NumPy array is rather cumbersome, as you may have noticed... If you don't idolize one programming language over another, I highly suggest trying out MATLAB/Octave. MATLAB makes accessing the samples of a file straightforward: audioread() does this task for you :) And there are a lot of toolboxes designed specifically for DSP.

Nevertheless, if you really intend to get into Python for this, I'll give you a step-by-step guide.


1. Get the samples

The easiest way to get the samples from the .wav file is:

from scipy.io import wavfile

sampling_rate, samples = wavfile.read('/path/to/file.wav')


Alternatively, you could use the wave and struct modules to get the samples:

import numpy as np
import wave, struct

wav_file = wave.open('/path/to/file.wav', 'rb')
# from .wav file to raw binary data (a bytes object)
binary_data = wav_file.readframes(wav_file.getnframes())
# from binary data to samples (the 'h' format assumes 16-bit samples)
s = np.array(struct.unpack('{n}h'.format(n=wav_file.getnframes()*wav_file.getnchannels()), binary_data))

Answering your question: binary_data is a bytes object, which is not human-readable and only makes sense to a machine. You can verify this by typing type(binary_data). If you really want to understand a little bit more about this bunch of odd characters, click here.

If your audio is stereo (that is, has 2 channels), you can reshape this signal to get the same format obtained with scipy.io:

s_like_scipy = s.reshape(-1, wav_file.getnchannels())

Each column is a channel. Either way, the samples obtained from the .wav file can be used to plot and understand the temporal behavior of the signal.

In both alternatives, the samples obtained from the files are represented in Linear Pulse Code Modulation (LPCM).


2. Do digital signal processing on the audio samples

I'll leave that part up to you :) But this is a nice book to take you through DSP. Unfortunately, I don't know of good books that use Python; they are usually horrible books... But do not worry about it: the theory can be applied in the very same way using any programming language, as long as you master that language.

Whatever book you pick up, stick with the classical authors, such as Proakis, Oppenheim, and so on... Do not worry about the programming language they use. For a more practical guide to DSP for audio using Python, see this page.

3. Play the filtered audio samples

import pyaudio

p = pyaudio.PyAudio()
stream = p.open(format = p.get_format_from_width(wav_file.getsampwidth()),
                channels = wav_file.getnchannels(),
                rate = wav_file.getframerate(),
                output = True)
# from samples to the new binary file
new_binary_data = struct.pack('{}h'.format(len(s)), *s)
stream.write(new_binary_data)

where wav_file.getsampwidth() is the number of bytes per sample, and wav_file.getframerate() is the sampling rate. Just use the same parameters of the input audio.


4. Save the result in a new .wav file

wav_file = wave.open('/path/to/new_file.wav', 'wb')

wav_file.setparams((nchannels, sampwidth, sampling_rate, nframes, "NONE", "not compressed"))

for sample in s:
    wav_file.writeframes(struct.pack('h', int(sample)))

wav_file.close()  # closing the file finalizes the header

where nchannels is the number of channels, sampwidth is the number of bytes per sample, sampling_rate is the sampling rate, and nframes is the total number of frames (samples per channel).
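
A self-contained variant of this last step, as a sketch only: it assumes 16-bit samples, writes them with a single writeframes call and reads the header back to check the parameters. The helper name write_wav and the sample values are made up for illustration:

```python
import array
import os
import tempfile
import wave


def write_wav(path, samples, nchannels, sampling_rate):
    """Write interleaved 16-bit integer samples to a PCM .wav file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(nchannels)
        out.setsampwidth(2)              # bytes per sample (16 bit)
        out.setframerate(sampling_rate)
        # array('h') holds 16-bit integers in the machine's byte order;
        # the frame count in the header is patched in when the file closes.
        out.writeframes(array.array("h", samples).tobytes())


# Round trip on made-up data: two stereo frames (L, R, L, R).
demo_path = os.path.join(tempfile.mkdtemp(), "new_file.wav")
write_wav(demo_path, [0, 100, -100, 200], nchannels=2, sampling_rate=44100)

with wave.open(demo_path, "rb") as check:
    params = check.getparams()
    raw = check.readframes(params.nframes)

read_back = array.array("h")
read_back.frombytes(raw)
print(params.nchannels, params.framerate, params.nframes, list(read_back))
```

Because the header is fixed up on close, nframes does not have to be known in advance here.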

等风来 2024-08-25 03:57:09

If you're going to perform transfers on the waveform data then perhaps you should use SciPy, specifically scipy.io.wavfile.

白云不回头 2024-08-25 03:57:09

Here's a Python 3 solution using the built-in wave module [1] that works for n channels and 8-, 16-, 24-... bit samples.

import sys
import wave

def read_wav(path):
    with wave.open(path, "rb") as wav:
        nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
        print(wav.getparams(), "\nBits per sample =", sampwidth * 8)

        signed = sampwidth > 1  # 8 bit wavs are unsigned
        byteorder = sys.byteorder  # wave module uses sys.byteorder for bytes

        values = []  # e.g. for stereo, values[i] = [left_val, right_val]
        for _ in range(nframes):
            frame = wav.readframes(1)  # read next frame
            channel_vals = []  # mono has 1 channel, stereo 2, etc.
            for channel in range(nchannels):
                as_bytes = frame[channel * sampwidth: (channel + 1) * sampwidth]
                as_int = int.from_bytes(as_bytes, byteorder, signed=signed)
                channel_vals.append(as_int)
            values.append(channel_vals)

    return values, framerate

You can turn the result into a NumPy array.

import numpy as np

data, rate = read_wav(path)
data = np.array(data)

Note, I've tried to make it readable rather than fast. I found reading all the data at once was almost 2x faster. E.g.

with wave.open(path, "rb") as wav:
    nchannels, sampwidth, framerate, nframes, _, _ = wav.getparams()
    all_bytes = wav.readframes(-1)

framewidth = sampwidth * nchannels
frames = (all_bytes[i * framewidth: (i + 1) * framewidth]
            for i in range(nframes))

for frame in frames:
    ...

Although python-soundfile is roughly 2 orders of magnitude faster (hard to approach this speed with pure CPython).

[1] https://docs.python.org/3/library/wave.html

不羁少年 2024-08-25 03:57:09

I needed to read a 1-channel 24-bit WAV file. The post above by Nak was very useful. However, as mentioned above by basj, 24-bit is not straightforward. I finally got it working using the following snippet:

import numpy as np
from scipy.io import wavfile
TheFile = 'example24bit1channelFile.wav'
[fs, x] = wavfile.read(TheFile)

# convert the loaded data into a 24bit signal

nx = len(x)
ny = nx // 3 * 4    # four 3-byte samples are contained in three int32 words (integer division, so ny is a valid array size)

y = np.zeros((ny,), dtype=np.int32)    # initialise array

# build the data left aligned in order to keep the sign bit operational.
# result will be factor 256 too high

y[0:ny:4] = ((x[0:nx:3] & 0x000000FF) << 8) | \
  ((x[0:nx:3] & 0x0000FF00) << 8) | ((x[0:nx:3] & 0x00FF0000) << 8)
y[1:ny:4] = ((x[0:nx:3] & 0xFF000000) >> 16) | \
  ((x[1:nx:3] & 0x000000FF) << 16) | ((x[1:nx:3] & 0x0000FF00) << 16)
y[2:ny:4] = ((x[1:nx:3] & 0x00FF0000) >> 8) | \
  ((x[1:nx:3] & 0xFF000000) >> 8) | ((x[2:nx:3] & 0x000000FF) << 24)
y[3:ny:4] = (x[2:nx:3] & 0x0000FF00) | \
  (x[2:nx:3] & 0x00FF0000) | (x[2:nx:3] & 0xFF000000)

y = y // 256   # correct for building 24 bit data left aligned in 32bit words (floor division keeps integer samples)

Some additional scaling is required if you need results between -1 and +1. Maybe some of you out there will find this useful.

浅笑依然 2024-08-25 03:57:09

PyDub (http://pydub.com/) has not been mentioned and that should be fixed. IMO this is the most comprehensive library for reading audio files in Python right now, although not without its faults. Reading a wav file:

from pydub import AudioSegment

audio_file = AudioSegment.from_wav('path_to.wav')
# or
audio_file = AudioSegment.from_file('path_to.wav')

# do whatever you want with the audio, change bitrate, export, convert, read info, etc.
# Check out the API docs http://pydub.com/

PS. The example is about reading a wav file, but PyDub can handle many different formats out of the box. The caveat is that it's based on both native Python wav support and ffmpeg, so you have to have ffmpeg installed, and a lot of the pydub capabilities rely on the ffmpeg version. Usually if ffmpeg can do it, so can pydub (which is quite powerful).

Non-disclaimer: I'm not related to the project, but I am a heavy user.

无法回应 2024-08-25 03:57:09

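If it's just two files and the sample rate is sufficiently high, you could just interleave them.

from scipy.io import wavfile
rate1, dat1 = wavfile.read(File1)  # File1, File2: paths to the two input files
rate2, dat2 = wavfile.read(File2)

if len(dat2) > len(dat1):  # make dat1 the longer signal
    dat1, dat2 = dat2, dat1

output = dat1.copy()
# replace every second sample of the longer signal with the shorter one's
for i in range(len(dat2) // 2):
    output[i * 2] = dat2[i * 2]

wavfile.write(OUTPUT, rate1, output)  # OUTPUT: output path; assumes both files share the same sample rate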

九厘米的零° 2024-08-25 03:57:09

As the other answers lay out, there are many ways to read a wav file in Python. Using the built-in wave module has the advantage that no external dependencies are needed. First the solution: this reads a mono or stereo wav file and prints the first 100 samples of the first channel:

import wave
import sys

w = wave.open('/path/to/your-file.wav', 'rb')
channels = w.getnchannels()
samplewidth = w.getsampwidth()
print(f"Audio has {channels} channels and each sample is {samplewidth} bytes ({samplewidth * 8} bits) wide")
samples = []

# Iterate over the frames
for n in range(w.getnframes()):
    # Read the bytes of a single frame
    frame = w.readframes(1)
    # Skip empty frames
    if frame != b'':
        # Convert the frame into a list of integers, assuming the system's
        # endianness and signed integers
        frame_data = [int.from_bytes(frame[i:i+samplewidth], byteorder=sys.byteorder, signed=True) for i in range(0, len(frame), samplewidth)]
        # If we have more than one channel the samples of each channel
        # should be interleaved
        if channels == 1:
            # Mono is simple: the frame holds a single sample
            for sample in frame_data:
                samples.append(sample)
        elif channels == 2:
            # Stereo samples are interleaved: (L/R/L/R/...)
            # Iterate in steps of 2 over the frame data and deinterleave
            # it into the samples for left and right
            for c in range(0, len(frame_data), 2):
                left, right = zip(frame_data[c:c+2])
                left, right = left[0], right[0]
                samples.append(left)  # keep the first (left) channel
        else:
            # Print lame excuse and exit
            print(f"Error: Sorry, we do not support wave files with {channels} channels", file=sys.stderr)
            sys.exit(1)

# Print first 100 samples
print(samples[:100])

Details

Samples

Ultimately everything in a binary file is bytes (those weird characters you got). A byte consists of 8 bits, each of which is either 0 or 1. With a bit of knowledge of audio files you might know that wav files come in different bit depths. A sample in consumer audio (say a CD, the audio from a YouTube video, etc.) is typically 16 bits, which gives us a vertical resolution of 2^16 or 65536 steps. But there are also 24-bit files for sound studio applications and, more and more, 32-bit (float) files. That means that in order to interpret the bytes of our sample in the right way, we need to know how many bytes are used for one sample and how they are ordered. Fortunately, the .getsampwidth() method tells us the first part:

For example, I read a 24-bit wav file and the sample width I got was 3 bytes; 3×8 bits is indeed 24. So I need to get 3 bytes from the frame and convert them to an integer:

sample = [int.from_bytes(frame[i:i+samplewidth], byteorder=sys.byteorder, signed=True) for i in range(0, len(frame), samplewidth)]

byteorder=sys.byteorder describes the endianness of the bytes, i.e. whether the most significant byte comes first ("big") or last ("little") when we construct our number. In this case we just take whatever the endianness of our system is. Note that for 8-bit audio this can be ignored, as there is only one byte and there is no direction in which it can be read.

signed=True says that we expect signed integers, as opposed to unsigned ones, which are only positive. Signed should work for the most common 16- and 24-bit audio files.

If you want to convert the audio to e.g. a float between -1.0 and +1.0, you need to work out the number of possible values in one half (e.g. 2**24 // 2 for 24-bit audio) and divide your sample by that.
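
That scaling could look roughly like the sketch below. The function name pcm_to_floats is made up, and it assumes plain integer PCM data as returned by readframes(). Unlike the code above, it also treats 8-bit data as unsigned with the zero level at 128, which is how 8-bit wav files are stored:

```python
import sys


def pcm_to_floats(raw, sampwidth):
    """Convert raw integer PCM bytes to floats in the range [-1.0, 1.0)."""
    # Number of possible values in one half, e.g. 2**24 // 2 for 24 bit.
    full_scale = float(2 ** (8 * sampwidth - 1))
    floats = []
    for i in range(0, len(raw), sampwidth):
        chunk = raw[i:i + sampwidth]
        if sampwidth == 1:
            # 8-bit samples are unsigned; shift them so that 128 becomes 0.
            value = chunk[0] - 128
        else:
            value = int.from_bytes(chunk, byteorder=sys.byteorder, signed=True)
        floats.append(value / full_scale)
    return floats


# Hand-made byte strings standing in for the result of readframes().
raw16 = b"".join(v.to_bytes(2, sys.byteorder, signed=True) for v in (16384, -32768))
raw24 = (2 ** 22).to_bytes(3, sys.byteorder, signed=True)
raw8 = bytes([128, 192, 0])

print(pcm_to_floats(raw16, 2), pcm_to_floats(raw24, 3), pcm_to_floats(raw8, 1))
```

Dividing by the half range maps the most negative integer to exactly -1.0, while the largest positive value ends up just below +1.0.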

Channels

A wave file can carry more than one audio channel. It could be mono, stereo, surround or other multichannel-configurations. Mono would be the simple case, but in multichannel wavs the samples are typically interleaved. That means one frame will carry samples from all channels in alternating fashion. Assuming Stereo, that might be:

L/R/L/R/L/R

I use Python's zip function to unpack the samples into separate left and right variables.
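
As an alternative to zip, the interleaved layout can also be split with slice steps. This is just a small sketch with a made-up helper name, split_channels, and it works for any number of channels:

```python
def split_channels(frame_data, channels):
    """Split interleaved samples [L, R, L, R, ...] into one list per channel."""
    # Channel c owns every channels-th sample, starting at index c.
    return [frame_data[c::channels] for c in range(channels)]


# Made-up stereo data: positive values on the left, negative on the right.
left, right = split_channels([1, -1, 2, -2, 3, -3], 2)
print(left, right)
```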

Caveat

I think the major challenge in reading wave files is handling all the possible ways a wave file can look. Wave files can get even more complicated than that (e.g. metadata headers, chapter marks, ...), so for full compatibility it might be wise to rely on something else. But if you know the wave files you want to read, something like this might work fine.
