I'm updating an application in which measurement of the time of presentation of a stimulus on a screen requires the greatest amount of accuracy. It is currently written with DirectDraw, which got put out to pasture a long while ago, and there's a need to update our graphics library.
The way we measure the presentation time relies on detecting the end of the vertical blank period. Specifically, I need to know, with the greatest possible accuracy, when whatever was flipped onto the primary surface (or presented in the swap chain) is actually being drawn by the screen. Detecting the scan line can increase the certainty of that measurement, but I could work with only detecting when the vertical blank period ends immediately after Flip or Present is called.
Direct3D 9 has the IDirect3DDevice9::GetRasterStatus method, which returns a D3DRASTER_STATUS struct that includes an InVBlank boolean describing whether the device is in a vertical blank, as well as the current scan line. DirectDraw has similar functions: IDirectDraw::GetVerticalBlankStatus, and IDirectDraw::GetScanLine, which returns DDERR_VERTICALBLANKINPROGRESS during the vertical blank and can therefore be used to detect it.
However I have not been able to find any similar function in Direct3D11. Does anyone know if this functionality was moved or removed between Direct3D9 and Direct3D11, and if the latter, why?
Comments (4)
Sorry for the late reply, but I notice there is still no accepted answer, so perhaps you never found one that worked. Nowadays on Windows, the Desktop Window Manager service (dwm.exe) coordinates everything and can't really be bypassed. Since Windows 8, this service can't be disabled.
So DWM is always going to control the frame rate, render queue management, and final composition for all of the various IDXGISurface(n) objects and IDXGIOutput(n) monitors and there isn't much use in tracking VSync for an offscreen render target, unless I'm missing something (no sarcasm intended). As for your question, I wasn't sure if your goal was to:
If it's the latter, I believe you can effectively only do this if the D3D app is running in full-screen exclusive mode. That's the only case where the DWM, in the guise of DXGI, will truly trust a client to handle its own Present timing.
The (barely) good news here is that if your interest in VSync is informational only, which is to say that you fall into bullet category (1.) from above, then you can indeed get all the timing data you'd ever want, and at QueryPerformanceFrequency resolution, which is typically around 320 ns.¹
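As a rough check on that figure, the tick period follows directly from the counter frequency. Here is a minimal sketch in Python, assuming a frequency of 3,125,000 Hz. That value is only an illustrative assumption; the real one is whatever QueryPerformanceFrequency reports on a given machine, and the function names are hypothetical.

```python
def tick_period_ns(frequency_hz: int) -> float:
    """Duration of one performance-counter tick, in nanoseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1e9 / frequency_hz


def ticks_to_ns(ticks: int, frequency_hz: int) -> int:
    """Convert a tick count to whole nanoseconds using integer math."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return ticks * 1_000_000_000 // frequency_hz


# Assumed example frequency; query the real value at run time.
EXAMPLE_FREQUENCY_HZ = 3_125_000
print(tick_period_ns(EXAMPLE_FREQUENCY_HZ))  # 320.0
```

On a machine whose counter runs at 10 MHz, the same arithmetic gives a 100 ns tick, so the 320 ns figure should be read as typical rather than fixed.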
Here's how to get that high-res video timing info. But again, just to be clear: despite the apparent success in obtaining the information as shown below, any attempt to use these interesting results, for example to condition some deterministic (and thus potentially useful) outcome on the readings you obtain, is destined to fail, entirely thwarted by DWM intermediation:
(Note: To horizontally compress the above source code for display on this website, assume the following abbreviations are prepended:)
Now for apps running in windowed mode, you can certainly grab this detailed information as often as you like. If you only need it for passive profiling, then getting the data from DwmGetCompositionTimingInfo is the modern way to do it.
And speaking of modern, since the question hinted at modernizing, you'll want to consider using an IDXGISwapChain1 obtained from IDXGIFactory2::CreateSwapChainForComposition to enable the use of the new DirectComposition component.
Anyway, it seems less likely that detailed timing information might usefully inform an app's runtime behavior; maybe it will help you predict your next VSync, but one does wonder what significance "keen awareness of the blanking period" might have for some particular DWM-subjugated offscreen swap chain.
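To make the "predict your next VSync" idea concrete, here is a small sketch of the arithmetic. It assumes you already hold the timestamp of the most recent vertical blank and the refresh period in the same unit (for instance the qpcVBlank and qpcRefreshPeriod fields of the DWM_TIMING_INFO filled in by DwmGetCompositionTimingInfo). The function name is hypothetical, and the prediction is only as good as those two inputs.

```python
def next_vblank(now: int, last_vblank: int, refresh_period: int) -> int:
    """Predict the first vertical blank strictly after `now`.

    All three values must use the same unit (for example QPC ticks).
    """
    if refresh_period <= 0:
        raise ValueError("refresh_period must be positive")
    if now < last_vblank:
        # The reported vblank is itself still in the future.
        return last_vblank
    elapsed_periods = (now - last_vblank) // refresh_period
    return last_vblank + (elapsed_periods + 1) * refresh_period
```

Because the refresh period is itself a measured value, any error in it accumulates with each period extrapolated, so it is probably safest to refresh the inputs often rather than extrapolate far ahead.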
Because your app's surface is just one of many that the DWM is juggling, the DWM is going to be doing all kinds of dynamic adaptation of its own, under an assumption of each client behaving consistently. Unpredictable adaptations are uncooperative in such a regime, and will likely just end up confounding both parties.
Notes:
1. The resolution of QPC is many orders of magnitude higher than that of the DateTime tick, despite the latter's suggestive use of a 100 ns unit denomination. Think of DateTime.Now.Ticks as a repackaging of the (millisecond-denoted) Environment.TickCount, but converted to 100-ns units. For the highest possible resolution, use the static method Stopwatch.GetTimestamp() instead of DateTime.Now.Ticks.
Another alternative:
There's D3DKMTGetScanLine() which works with D3D9, D3D10, D3D11, D3D12, and even OpenGL.
It's actually a GDI32 function, so you piggyback off the window's existing graphics adapter handle (hAdapter) to poll the VBlank/scan line, with no need to create a Direct3D frame buffer. That's why this API works fine with OpenGL, Mantle, and non-Direct3D renderers too, despite its D3D prefix.
It also tells you VBlank status & Raster scan line.
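Since the call reports both values, the "end of vertical blank" detection from the question reduces to a polling loop. The sketch below shows only that loop, in Python. The `poll` callable is a hypothetical stand-in for whatever wrapper you write around D3DKMTGetScanLine, assumed to return the InVerticalBlank flag and the ScanLine value of D3DKMT_GETSCANLINE as a tuple. It is not a binding to the real API.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical poll signature: returns (in_vertical_blank, scan_line).
ScanLinePoll = Callable[[], Tuple[bool, int]]


def wait_for_vblank_end(
    poll: ScanLinePoll,
    clock: Callable[[], int] = time.perf_counter_ns,
    max_polls: int = 10_000_000,
) -> Optional[int]:
    """Spin until a vertical blank is observed and then ends.

    Returns the clock reading taken right after the first poll that
    reports active scanout again, or None if max_polls is exhausted.
    """
    seen_vblank = False
    for _ in range(max_polls):
        in_vblank, _scan_line = poll()
        if in_vblank:
            seen_vblank = True
        elif seen_vblank:
            return clock()
    return None
```

The loop first waits to see a blank and only then waits for it to end, so a call made mid-frame does not return early. The timestamp is still only as precise as the polling interval, and the thread can be preempted between the poll and the clock read.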
It's useful for beam racing in applications where latency is critical. Some virtual reality renderers use beam racing, since even a mere 20 ms of lag can mean the difference between pleasant VR and dizzying/pukeworthy VR.
Beam racing is rendering on the fly, following the scanout of a display. In specialized latency-critical applications, you can reduce the latency from Direct3D Present() to pixels hitting your eyeballs to an absolute minimum (as little as 3 ms).
To understand what beam racing is, see https://www.wired.com/2009/03/racing-the-beam/. It was common back in the day when graphics chips had no frame buffers, which made beam racing necessary for improved graphics on the Atari 2600, Nintendo, Commodore 64, etc.
For a more modern implementation of beam racing, see Lagless VSYNC ON Algorithm for Emulators.
Good luck.
There is actually no guarantee that anything you put into the present queue will ever be shown on screen (!!); you can manually drop frames with buffer-sequencing present flags, or NVIDIA can do it for you (... thanks?)
Buffer Sequencing in DXGI
The DXGI swap chain's flip queue is generally FIFO, but popular new driver overrides (e.g. FastSync), which users concerned with latency will most assuredly have enabled, favor CPU-side throughput over such trivial things as displaying any of the frames you draw :)
Normally you could count on IDXGISwapChain::Present (...) to begin blocking when the swapchain is full of undisplayed images and the driver is staging commands n-many frames ahead of the GPU, but with FastSync forced, Present never blocks and the render-ahead-queue flushes its work by overwriting any completed frames in the Swapchain that are waiting on VBLANK.
Unless you implement rate limiting yourself to prevent the CPU from immediately staging the next frame after any call to Present, you need a different paradigm for measuring frame status altogether.
D3D9Ex / DXGI Supports Presentation Statistics in Flip / Fullscreen Exclusive:
Frames do not actually present to a user unless the following APIs say they do:
IDXGISwapChain::GetFrameStatistics (...) and IDXGISwapChain::GetLastPresentCount (...)
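A sketch of how those two calls might be combined to timestamp a stimulus onset, written in Python with the API results passed in as plain values. The `FrameStats` class and `onset_qpc` function are hypothetical names. The interpretation of the fields follows the D3D9Ex D3DPRESENTSTATS description (PresentRefreshCount is the vblank count at which the last present was displayed, SyncRefreshCount is the vblank count at which SyncQPCTime was sampled) and is an assumption you should verify against your own driver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameStats:
    """Subset of DXGI_FRAME_STATISTICS used by this sketch."""
    present_count: int          # ID of the last present that was displayed
    present_refresh_count: int  # vblank count when that present was displayed
    sync_refresh_count: int     # vblank count when sync_qpc_time was sampled
    sync_qpc_time: int          # QPC ticks at that sampled vblank


def onset_qpc(
    stats: FrameStats, my_present_id: int, refresh_period_ticks: int
) -> Optional[int]:
    """Estimate the QPC time at which `my_present_id` reached the screen.

    Returns None when the statistics describe a different present, in
    which case the frame is either still queued or was dropped.
    """
    if stats.present_count != my_present_id:
        return None
    refreshes_back = stats.sync_refresh_count - stats.present_refresh_count
    return stats.sync_qpc_time - refreshes_back * refresh_period_ticks
```

Here `my_present_id` would be the value read from GetLastPresentCount right after the Present call for the stimulus frame. If later statistics skip past that ID without ever matching it, the frame may never have been displayed.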
The question here is why? It looks like you want to solve a symptom of your issue; maybe that's a distraction from your real issue. Waiting for vsync was a useful technique on Amiga or DOS. It is totally wrong on any compositing or multithreading OS.
First, what do you want to achieve? Tearing-free rendering is done by setting a swap interval on either D3D or OpenGL. It is harmful to try to do better than the OS there. Just think about cases like multiple monitors or what happens if more than one app tries to sync.
If you are a client to some other process and want to run your timing on VSync, Windows unfortunately offers no object to wait on as far as I know. Your best bet is to still rely on the Present call and estimate what is happening.
There are two cases: you are either rendering (presenting) faster or slower than VSync. If you are faster, Present should block for you already. If Present never waits and your time between calls is more than 1/60 sec., you probably want to render less often.
The most common reason people care about VSync is video. You can render a lot faster than VSync but want to wait for just the right time to present. The only thing to do there is to run a few frames as fast as you can and estimate your frame timing from that. Use some jitter and feedback... or use built-in hardware video that is happy enough to be kernel friends with the video driver.
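The "estimate and feed back" idea can be sketched as a simple exponential moving average over measured intervals between Present calls. This is one possible smoothing scheme, not necessarily what the answer had in mind. The function names and the gain of 0.1 are assumptions chosen for illustration.

```python
from typing import Iterable


def update_frame_time(estimate: float, measured: float, gain: float = 0.1) -> float:
    """Move the running estimate a fraction `gain` toward a new sample."""
    if not 0.0 < gain <= 1.0:
        raise ValueError("gain must be in (0, 1]")
    return estimate + gain * (measured - estimate)


def estimate_frame_time(intervals: Iterable[float], gain: float = 0.1) -> float:
    """Fold a sequence of measured frame intervals into one estimate.

    The first sample seeds the estimate; later samples adjust it.
    """
    iterator = iter(intervals)
    try:
        estimate = next(iterator)
    except StopIteration:
        raise ValueError("need at least one interval") from None
    for measured in iterator:
        estimate = update_frame_time(estimate, measured, gain)
    return estimate
```

A small gain rides out the occasional late frame, while a larger one tracks a genuine change in refresh rate more quickly, so the right value depends on how noisy your measurements are.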