Interpreting WAV data
I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the python wave library and have been using that. However, I'm not sure how to interpret the data.
The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.
96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds
However, iTunes reports the time as closer to 2 seconds and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?
Additionally, how am I to know if the bytes are signed or unsigned?
Many thanks!
Comments (6)
"Two channels" means stereo, so it makes no sense to sum each channel's duration -- so you're off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:
This is part of the specs of the WAV format (actually of its superset RIFF) and thus not dependent on what library you're using to deal with a WAV file.
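For reference, the usual convention for PCM WAV files is that 8-bit samples are unsigned (0 to 255) and 16-bit samples are signed little-endian two's-complement integers (-32768 to 32767). A minimal sketch of that rule follows; `decode_sample` is a hypothetical helper introduced here for illustration, not part of the wave module.

```python
def decode_sample(raw: bytes) -> int:
    """Decode one PCM sample, assuming the usual WAV convention."""
    # Assumption: 1-byte samples are unsigned; wider samples are
    # signed little-endian two's-complement integers.
    return int.from_bytes(raw, byteorder="little", signed=len(raw) > 1)


# First two bytes of the frame printed in the question.
first_sample = decode_sample(b"\xc0\xff")
# An 8-bit sample at the midpoint of the unsigned range.
eight_bit = decode_sample(b"\x80")
```

Read as unsigned, the same two bytes `\xc0\xff` would give 65472 instead of -64, which is why knowing the sample width matters when plotting.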
I know that an answer has already been accepted, but I did some things with audio a while ago and you have to unpack the wave doing something like this.
Also, one package that I used was called PyAudio, though I still had to use the wave package with it.
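The code from this comment was not preserved. As a rough sketch of what unpacking could look like, assuming 16-bit signed little-endian samples, the standard struct module can split the byte string returned by readframes into one tuple per frame. The helper name `unpack_frames` and the sample bytes are made up for illustration.

```python
import struct


def unpack_frames(raw: bytes, nchannels: int = 2):
    """Return a list of per-frame tuples, one integer per channel."""
    # "<" = little-endian, "h" = signed 16-bit; one "h" per channel.
    fmt = "<" + "h" * nchannels
    # len(raw) must be a multiple of the frame size, or struct raises an error.
    return list(struct.iter_unpack(fmt, raw))


# Two stereo frames: the one from the question plus an invented second frame.
raw = b"\xc0\xff\xd0\xff" + b"\x01\x00\xff\x7f"
frames = unpack_frames(raw)
```

Each tuple holds (left, right) for a stereo file, so the first frame here decodes to (-64, -48).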
Each sample is 16 bits and there are 2 channels, so each frame takes 4 bytes.
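The arithmetic can be checked against the frame printed in the question. This is only an illustrative sketch; the variable names are chosen here and the values come from the getparams output quoted above.

```python
nchannels = 2   # from getparams: 2 channels
sampwidth = 2   # from getparams: 2 bytes per sample

frame = b"\xc0\xff\xd0\xff"          # the single frame from the question
frame_size = nchannels * sampwidth   # bytes occupied by one frame

# Slice the frame into one chunk of sampwidth bytes per channel.
per_channel = [frame[i:i + sampwidth] for i in range(0, frame_size, sampwidth)]
```

So a frame is one sample per channel taken at the same instant, not two consecutive samples in time.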
The duration is simply the number of frames divided by the number of frames per second. From your data this is:
96333 / 44100 = 2.18 seconds
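The same calculation can be done directly with the wave module. The sketch below builds a small stereo file in memory so that it runs without any file on disk; `duration_seconds` is a hypothetical helper name, and the silent test data is invented.

```python
import io
import wave


def duration_seconds(wav_file) -> float:
    """Duration = number of frames / frames per second."""
    with wave.open(wav_file, "rb") as w:
        return w.getnframes() / w.getframerate()


# Build half a second of 16-bit stereo silence in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00" * (4 * 22050))  # 22050 frames of 4 bytes each

buf.seek(0)
half_second = duration_seconds(buf)

# The numbers from the question, without multiplying by the channel count.
question_duration = 96333 / 44100
```

The channel count does not appear in the formula because both channels play at the same time.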
Building upon this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile. Also see this answer.
Here is what I did:
Assigning a new value to shape will throw an error if it requires data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). The ndarray.T attribute also does not copy (i.e. returns a view) if possible, but I'm not sure how you ensure that it does not copy.
Reading directly from the file with np.fromfile will be even better, but you would have to skip the header using a custom dtype. I haven't tried this yet.
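The code from this comment was not preserved, so the following is only a sketch of the general idea, assuming 16-bit little-endian stereo data. It uses numpy.frombuffer, which current NumPy recommends over the deprecated binary mode of numpy.fromstring, and reshape rather than assigning to shape. The byte string stands in for what readframes would return.

```python
import numpy as np

# Stand-in for the bytes returned by readframes: two stereo frames.
raw = b"\xc0\xff\xd0\xff\x01\x00\xff\x7f"
nchannels = 2

# Interpret the buffer as signed 16-bit little-endian integers without copying.
samples = np.frombuffer(raw, dtype="<i2")

# One row per frame, one column per channel; a view for contiguous data.
samples = samples.reshape(-1, nchannels)

# Transpose so that each row is one channel, ready for plotting.
channels = samples.T
```

An array created this way from a bytes object is read-only, which is usually fine for display; call `.copy()` first if the samples need to be modified.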
Thank you for your help! I got it working and I'll post the solution here for everyone to use in case some other poor soul needs it: