What is the latency (or delay) of callbacks from the waveOutWrite API method?
I'm having a debate with some developers on another forum about accurately generating MIDI events (Note On messages and so forth). The human ear is pretty sensitive to slight timing inaccuracies, and I think their main problem comes from their use of relatively low-resolution timers which quantize their events around 15 millisecond intervals (which is large enough to cause perceptible inaccuracies).
About 10 years ago, I wrote a sample application (Visual Basic 5 on Windows 95) that was a combined software synthesizer and MIDI player. The basic premise was a leapfrog-buffer playback system, with each buffer being the duration of a sixteenth note (example: at 120 quarter notes per minute, each quarter note was 500 ms and thus each sixteenth note was 125 ms, so each buffer was 5513 samples). Each buffer was played via the waveOutWrite method, and the callback function from this method was used to queue up the next buffer and also to send MIDI messages. This kept the WAV-based audio and the MIDI audio synchronized.
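For concreteness, here is a rough C++/Win32 sketch of the arrangement just described (my reconstruction for this post, not the original VB5 code; FillNextSixteenth is a made-up placeholder). One caveat baked into the comments: MSDN only permits a handful of calls inside a waveOut callback (midiOutShortMsg is one of them) and officially discourages calling waveOutWrite there, even though that is exactly what the old application did.

#include <windows.h>
#include <mmsystem.h>
#include <cstring>
#pragma comment(lib, "winmm.lib")

const int SAMPLE_RATE = 44100;
const int SAMPLES_PER_BUFFER = 5513;      // one sixteenth note at 120 BPM (~125 ms)

short g_buffers[2][SAMPLES_PER_BUFFER];
WAVEHDR g_headers[2] = {};
volatile bool g_stopping = false;

// Placeholder: render the next sixteenth note of synthesized audio into the buffer.
void FillNextSixteenth(short* samples, int count)
{
    memset(samples, 0, count * sizeof(short));
}

void CALLBACK WaveCallback(HWAVEOUT hwo, UINT msg, DWORD_PTR, DWORD_PTR p1, DWORD_PTR)
{
    if (msg != WOM_DONE || g_stopping)
        return;
    WAVEHDR* done = reinterpret_cast<WAVEHDR*>(p1);

    // 1. This boundary is where the MIDI messages for the next sixteenth note would be
    //    sent (midiOutShortMsg is one of the few calls MSDN explicitly allows here).

    // 2. Refill and requeue the buffer that just finished (the "leapfrog").
    //    Note: MSDN discourages calling wave functions from inside this callback;
    //    the original application did it anyway and it worked, but a production
    //    design would usually just signal an event here instead.
    FillNextSixteenth(reinterpret_cast<short*>(done->lpData), SAMPLES_PER_BUFFER);
    waveOutWrite(hwo, done, sizeof(WAVEHDR));
}

int main()
{
    WAVEFORMATEX fmt = {};
    fmt.wFormatTag = WAVE_FORMAT_PCM;
    fmt.nChannels = 1;
    fmt.nSamplesPerSec = SAMPLE_RATE;
    fmt.wBitsPerSample = 16;
    fmt.nBlockAlign = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &fmt,
                reinterpret_cast<DWORD_PTR>(WaveCallback), 0, CALLBACK_FUNCTION);

    // Prime both buffers so playback never starves while one is being refilled.
    for (int i = 0; i < 2; ++i) {
        FillNextSixteenth(g_buffers[i], SAMPLES_PER_BUFFER);
        g_headers[i].lpData = reinterpret_cast<LPSTR>(g_buffers[i]);
        g_headers[i].dwBufferLength = SAMPLES_PER_BUFFER * sizeof(short);
        waveOutPrepareHeader(hwo, &g_headers[i], sizeof(WAVEHDR));
        waveOutWrite(hwo, &g_headers[i], sizeof(WAVEHDR));
    }

    Sleep(10000);              // let it play for a while
    g_stopping = true;         // stop requeueing before tearing down
    waveOutReset(hwo);
    waveOutClose(hwo);
    return 0;
}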
To my ear, this method worked perfectly - the MIDI notes did not sound even slightly out of step (whereas if you use an ordinary timer accurate to 15 ms to play MIDI notes, they will sound noticeably out of step).
In theory, this method would produce MIDI timing accurate to the sample, or 0.0227 milliseconds (since there are 44.1 samples per millisecond). I doubt that this is the true latency of this approach, since there is presumably some slight delay between when a buffer finishes and when the waveOutWrite callback is notified. Does anyone know how big this delay would actually be?
3 Answers
The Windows scheduler runs at either 10ms or 16ms intervals by default depending on the processor. If you use the timeBeginPeriod() API you can change this interval (at a fairly significant power consumption cost).
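For reference, the pairing looks like this (a minimal sketch; timeBeginPeriod/timeEndPeriod are the real winmm calls, the playback work itself is just a placeholder):

#include <windows.h>
#include <mmsystem.h>          // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);        // ask for ~1 ms timer/scheduler granularity
                               // (costs power, as noted above)

    // ... run the timing-sensitive playback code here ...
    Sleep(2000);               // placeholder for the actual work

    timeEndPeriod(1);          // always balance with a matching timeEndPeriod
    return 0;
}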
In Windows XP and Windows 7, the wave APIs run with a latency of about 30ms; for Windows Vista, the wave APIs have a latency of about 50ms. You then need to add in the audio engine latency.
Unfortunately I don't have numbers for the engine latency in one direction, but we do have some numbers regarding engine latency - we ran a test that played a tone looped back through a USB audio device and measured the round-trip latency (render to capture). On Vista the round trip latency was about 80ms with a variation of about 10ms. On Win7 the round trip latency was about 40ms with a variation of about 5ms. YMMV however since the amount of latency introduced by the audio hardware is different for each piece of hardware.
I have absolutely no idea what the latency was for the XP audio engine or the Win9x audio stack.
At the very basic level, Windows is a multithreaded OS, and it schedules threads with 100 ms time slices.
Which means that, if there is no CPU contention, the delay between the end of the buffer and the waveOutWrite callback could be arbitrarily short. Or, if there are other busy threads, you have to wait up to 100 ms per thread.
In the best case, however... CPU clock speeds are in the GHz range now, which puts the absolute lower bound on how quickly the callback can be invoked somewhere on the order of a nanosecond (0.000000001 seconds).
Unless you can figure out the maximum number of waveOutWrite callbacks you can process in a single second (which would imply the latency of each call), I think that in practice the latency is going to be orders of magnitude below perception most of the time, unless there are too many busy threads, in which case it's going to go horribly, horribly wrong.
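If you want an actual number for your own machine rather than an estimate, here is a self-contained sketch (my own illustration) that plays fixed-length silent buffers using CALLBACK_EVENT and timestamps each completion with QueryPerformanceCounter; if the notifications were perfectly punctual, consecutive timestamps would differ by exactly the nominal buffer duration (about 125 ms here, matching the question's setup):

#include <windows.h>
#include <mmsystem.h>
#include <cstdio>
#pragma comment(lib, "winmm.lib")

int main()
{
    const int SAMPLE_RATE = 44100;
    const int SAMPLES_PER_BUFFER = 5513;   // one sixteenth note at 120 BPM, as in the question
    const double BUFFER_MS = 1000.0 * SAMPLES_PER_BUFFER / SAMPLE_RATE;

    LARGE_INTEGER freq, last = {}, now;
    QueryPerformanceFrequency(&freq);

    // Auto-reset event that winmm signals when the device opens/closes or a buffer completes.
    HANDLE done = CreateEvent(NULL, FALSE, FALSE, NULL);

    WAVEFORMATEX fmt = {};
    fmt.wFormatTag = WAVE_FORMAT_PCM;
    fmt.nChannels = 1;
    fmt.nSamplesPerSec = SAMPLE_RATE;
    fmt.wBitsPerSample = 16;
    fmt.nBlockAlign = 2;
    fmt.nAvgBytesPerSec = SAMPLE_RATE * 2;

    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &fmt,
                reinterpret_cast<DWORD_PTR>(done), 0, CALLBACK_EVENT);

    static short silence[2][SAMPLES_PER_BUFFER] = {};   // two leapfrogging buffers of silence
    WAVEHDR hdr[2] = {};
    for (int i = 0; i < 2; ++i) {
        hdr[i].lpData = reinterpret_cast<LPSTR>(silence[i]);
        hdr[i].dwBufferLength = sizeof(silence[i]);
        waveOutPrepareHeader(hwo, &hdr[i], sizeof(WAVEHDR));
        waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));
    }

    for (int n = 0; n < 40; ) {
        WaitForSingleObject(done, INFINITE);
        QueryPerformanceCounter(&now);

        bool anyDone = false;
        for (int i = 0; i < 2; ++i) {
            if (hdr[i].dwFlags & WHDR_DONE) {
                anyDone = true;
                waveOutWrite(hwo, &hdr[i], sizeof(WAVEHDR));   // requeue immediately
            }
        }
        if (!anyDone)
            continue;                      // ignore the signal from opening the device

        if (last.QuadPart != 0) {
            double deltaMs = 1000.0 * (now.QuadPart - last.QuadPart) / freq.QuadPart;
            printf("period %.3f ms (nominal %.3f ms, jitter %+.3f ms)\n",
                   deltaMs, BUFFER_MS, deltaMs - BUFFER_MS);
        }
        last = now;
        ++n;
    }

    waveOutReset(hwo);
    for (int i = 0; i < 2; ++i)
        waveOutUnprepareHeader(hwo, &hdr[i], sizeof(WAVEHDR));
    waveOutClose(hwo);
    CloseHandle(done);
    return 0;
}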
To add to the great answers above.
Your question is about a latency that Windows neither promises nor cares about, and as such it can differ quite a bit depending on OS version, hardware and other factors. The waveOut API, and DirectSound too (not sure about WASAPI, but I guess the same holds for this latest Vista+ audio API), are designed for buffered audio output. No specific callback accuracy is required as long as you queue the next buffer on time while the current one is still playing.
When you start audio playback, you make a few assumptions: there are no underflows during playback, all output is continuous, and the audio clock rate is exactly what you expect, e.g. precisely 44,100 Hz. Then you do simple math to schedule your wave output in time, converting time to samples and then to bytes.
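That bookkeeping amounts to something like this (my own tiny sketch, assuming 16-bit mono at 44,100 Hz):

// Naive time-to-byte conversion under the "perfect clock" assumption above.
const int SAMPLE_RATE = 44100;   // assumed nominal sample rate
const int BLOCK_ALIGN = 2;       // bytes per sample frame: 16-bit mono

long long MillisecondsToBytes(double ms)
{
    long long samples = static_cast<long long>(ms * SAMPLE_RATE / 1000.0 + 0.5);
    return samples * BLOCK_ALIGN;
}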
Sadly, the effective playback rate is not precise: imagine that the real hardware sampling rate is 44,100 Hz -3%, and in the long run the time-to-byte math lets you down. There have been attempts to compensate for this effect, such as making the audio hardware the playback clock and synchronizing video to it (this is how players work), and rate-matching techniques that match the incoming data rate to the actual playback rate of the hardware. Both of these make absolute time measurements, and the latency in question, rather speculative knowledge.
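If you would rather trust the hardware than the wall clock, the waveOut layer can at least report how far playback has actually progressed; a small sketch (assuming an already-opened HWAVEOUT, with error handling reduced to the essentials):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// Returns the device's playback position in samples, or -1 on failure.
// Scheduling MIDI events against this position, rather than against elapsed
// wall-clock time, is the "audio hardware as the playback clock" idea.
long long GetPlayedSamples(HWAVEOUT hwo)
{
    MMTIME mmt = {};
    mmt.wType = TIME_SAMPLES;                      // request a sample count...
    if (waveOutGetPosition(hwo, &mmt, sizeof(mmt)) != MMSYSERR_NOERROR)
        return -1;
    if (mmt.wType != TIME_SAMPLES)                 // ...but the driver may substitute
        return -1;                                 // another format; handle that here
    return static_cast<long long>(mmt.u.sample);
}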
On top of this come the API latencies of 20 ms, 30 ms, 50 ms and so on. The waveOut API has long been a layer on top of other APIs. This means that some processing takes place before the data actually reaches the hardware, and this processing requires that you hand the queued data off well in advance, or the data won't reach the hardware in time. Say you attempt to queue your data in 10 ms buffers right before their playback time: the API will accept the data, but it will itself be late passing it downstream, and there will be silence or comfort noise on the speakers.
Now this is also related to the callbacks that you receive. You could say that you don't care about the latency of the buffers, and that what matters to you is precise callback timing. However, since the API is layered, you receive the callback with the accuracy of the inner layers' synchronization: the second inner layer notifies about a free buffer, and the first inner layer updates its records and checks whether it can release your buffer as well (and those buffers don't have to match, either). This makes any expectation of callback accuracy really weak and unreliable.
Given that I have not touched the waveOut API for quite some time, if such a question of synchronization accuracy came up, I would probably think of two things first:
Windows provides access to the audio hardware clock (I am aware of the IReferenceClock interface available through DirectShow, and it probably comes from some lower-level component that is also accessible), and with that available I would try to synchronize with it (a small sketch follows after this list).
The latest audio API from Microsoft, WASAPI, provides special support for low-latency audio, with new cool stuff like better media thread scheduling, exclusive-mode streams and <10 ms latency for PCM; this is where better synchronization should be looked for.
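As a sketch of the first suggestion (my own illustration, not code from this answer): the DirectSound Renderer filter exposes IReferenceClock directly, and polling it gives you a clock that is meant to advance with the audio hardware while audio is rendering. Error handling is omitted for brevity:

#include <windows.h>
#include <dshow.h>
#include <cstdio>
#pragma comment(lib, "strmiids.lib")
#pragma comment(lib, "ole32.lib")

int main()
{
    CoInitialize(NULL);

    // The DirectSound Renderer filter implements IReferenceClock; the filter graph
    // manager normally picks it as the graph clock for exactly this reason.
    IBaseFilter* renderer = NULL;
    CoCreateInstance(CLSID_DSoundRender, NULL, CLSCTX_INPROC_SERVER,
                     IID_IBaseFilter, reinterpret_cast<void**>(&renderer));

    IReferenceClock* clock = NULL;
    renderer->QueryInterface(IID_IReferenceClock, reinterpret_cast<void**>(&clock));

    REFERENCE_TIME t = 0;                 // 100-nanosecond units
    clock->GetTime(&t);
    printf("audio reference clock: %.3f ms\n", t / 10000.0);

    clock->Release();
    renderer->Release();
    CoUninitialize();
    return 0;
}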