The idea of doing remote rendering (typically for a video game) which is streamed to a client device is conceptually quite simple, barring obvious issues like lag for an interactive fast-paced game.
But - technically how could you do it? My understanding is that streaming video not only caches ahead of the current play-back position, but that video files are compressed by looking ahead many frames?
Are there libraries that would let you feed an arbitrary "display feed" into a serverside video-source, so that you could then play it on the client using regular Flash/HTML5 components? Avoiding the need for a custom app or bespoke browser-plugin would be a significant benefit... i.e. the client-side web-page doesn't know it's not a regular video.
It's a bit like a web-cam I suppose... but I want the 'camera' to be 'watching' the output of a window rendered to on the server.
I'm targeting Windows-based servers and C++ rendering apps.
Interesting problem. There are a number of aspects to consider, in no particular order:
Encoding and streaming at the same time
The choice of container format for the rendered movie is very important. I think the main limitation is that the renderer is constrained to write the file sequentially. The reason is that the file needs to be streamed to clients, so while the renderer is writing the file there will be a web server process reading it, potentially close behind the EOF. The renderer cannot use random access to write the movie file, because any data already on disk may already have been sent to clients, so clearly everything written to disk has to be in its final form.
It seems that the F4V format (Adobe's successor to FLV) fits the bill, as it can be written in a streaming-friendly fashion. The format is widely supported on the client side; only the Flash player plugin is needed. For iPhone/iPad you will need an alternative that does not involve Flash, so for iOS you can use MP4. Note that F4V derives from MP4; the two are very similar.
Of course, the 3D engine running on the server must be able to render to F4V/MP4, which may require a custom export plugin for your engine.
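The sequential-write constraint can be illustrated with a minimal sketch: an append-only sink where bytes, once written, are immutable. The class name and shape are mine, purely for illustration; it is in-memory rather than on disk for clarity.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical append-only sink illustrating the streaming constraint:
// bytes already "on disk" may have been sent to clients, so they can
// never be rewritten -- new data may only be appended at the end.
class AppendOnlyMovieFile {
public:
    // Append a fully encoded fragment; this is the only write operation.
    void append(const std::vector<uint8_t>& chunk) {
        data_.insert(data_.end(), chunk.begin(), chunk.end());
    }
    // Random-access writes are forbidden: earlier bytes are final.
    void writeAt(std::size_t, const std::vector<uint8_t>&) {
        throw std::logic_error("random-access writes not allowed while streaming");
    }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<uint8_t> data_;
};
```

In practice this means the encoder must emit every header, index, and frame in final form the moment it is flushed, which is exactly what rules out containers that backpatch a size or index table at the front of the file.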
Performance
Your server must be able to render and encode frames at equal or faster speed than the intended playback frame rate. Hardware acceleration is your friend.
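The requirement is just a frame-budget check: at the target frame rate, render time plus encode time per frame must fit within one frame period. A trivial helper (the function name is mine, not from any library):

```cpp
// Returns true if rendering plus encoding one frame fits within the
// frame budget implied by the target playback frame rate.
// All times are in milliseconds.
bool canSustainRealTime(double renderMs, double encodeMs, double targetFps) {
    const double frameBudgetMs = 1000.0 / targetFps;
    return renderMs + encodeMs <= frameBudgetMs;
}
```

For example, at 30 fps the budget is about 33.3 ms per frame, so 10 ms of rendering plus 15 ms of encoding is sustainable, while 20 ms + 20 ms is not.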
Latency
Efficient video encoding formats work by encoding only the differences between frames, so to decode any given frame you may need to decode a few others first. One of the most interesting aspects of modern encoding formats is that they encode differences not only from past frames but also from future frames. This clearly increases latency, as the encoder must postpone encoding a frame until it has received a few more frames. It seems that to reduce latency you would need to limit the 'future' side of the encoding to a very short window, possibly at the cost of encoding efficiency and/or quality.
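The latency penalty from future-frame look-ahead is straightforward to quantify: a frame cannot be encoded until the look-ahead window behind it has arrived. A small helper to make the cost concrete:

```cpp
// Added encoder latency (in milliseconds) from looking ahead into
// future frames: the encoder must hold a frame back until
// `lookaheadFrames` more frames have arrived at `fps` frames/second.
double lookaheadLatencyMs(int lookaheadFrames, double fps) {
    return lookaheadFrames * 1000.0 / fps;
}
```

So a mere 3-frame look-ahead at 30 fps already adds 100 ms before a frame can even begin encoding, which is why low-latency encoder configurations typically disable B-frames entirely.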
Client-side buffering
This is possibly a tough one if you want to avoid a custom playback plugin. Video players download the stream into a buffer that is typically several seconds long, and only begin playing once the buffer is full. The idea is that a full buffer helps ride out network interruptions and slowdowns, but unfortunately a large buffer means increased latency. You will need to find out how many seconds of material the client players want in their playback buffer; that determines how far ahead the server-side rendering/encoding process always needs to be. A custom playback plugin could shrink or eliminate the buffer to reduce latency, but it would then be more sensitive to network hiccups.
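The relationship between the client's buffer and the server's required lead is simple arithmetic, but worth making explicit, since it sets the pipeline's minimum depth:

```cpp
#include <cmath>

// How many frames ahead the server-side renderer/encoder must stay,
// given the client player's buffer length in seconds.
int requiredLeadFrames(double clientBufferSeconds, double fps) {
    return static_cast<int>(std::ceil(clientBufferSeconds * fps));
}
```

A typical 3-second player buffer at 30 fps means the server must stay a full 90 frames ahead of the client's playback position at all times.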
HTTP server support
I'm not sure how an HTTP server will handle streaming a file while it is still being generated by another process; I suspect this is not something regular servers test for or intend to support. There is a little-known extension to FTP called "tail-mode FTP" that provides essentially the behavior you want. A tail-mode-enabled FTP server knows the file is growing, so it makes no assumptions about its size and simply transfers bytes as they appear in the file. It even waits for the file to grow if it consumes the file too fast and hits EOF. You may need a customized HTTP server that supports a similar feature.
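The core of that tail-mode behavior can be sketched in a few lines: read from the growing file, and on EOF clear the stream state, wait, and retry instead of giving up. This is a simplified sketch, not production code; a real server would detect end-of-stream from a signal rather than an expected byte count, which is assumed here only so the example terminates.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>

// Sketch of a "tail-mode" reader: keep serving bytes from a file that
// another process is still writing. On EOF, clear the stream state,
// wait briefly, and retry, since the writer may append more data.
std::string tailRead(const std::string& path, std::size_t expectedBytes) {
    std::ifstream in(path, std::ios::binary);
    std::string out;
    char buf[4096];
    while (out.size() < expectedBytes) {
        in.read(buf, sizeof(buf));
        out.append(buf, static_cast<std::size_t>(in.gcount()));
        if (in.eof()) {
            in.clear();  // reset EOF/fail bits so later reads see new data
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }
    return out;
}
```

The same idea transplanted into an HTTP server would serve the response body with chunked transfer encoding, since the final content length is unknown while the renderer is still writing.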
A dedicated streaming server may be a good option here. Links of interest are the open source Darwin Streaming Server and the QuickTime Broadcaster streaming application. For the Adobe side of things there is the Adobe Streaming Server which is a commercial product. And yet another option from Microsoft, the Smooth Streaming server extension for IIS.
Interactivity
You didn't say anything about this, but I would imagine a good application of this technology would allow the client to send input events back to the server, which would then use them to affect the contents of the movie. This is effectively a game engine hosted entirely on the server, with only the input and display components running on the client. Once again, this will be challenging to do with low enough latency for the application to feel responsive. You will also now have to encode a stream per client, since each client sees a different version of the movie. There are lots of problems here; a render/encoding farm might be necessary depending on the number of simultaneous clients to be supported. Having pre-rendered and pre-encoded chunks of animation that can be combined (in the style of the old Dragon's Lair games) might be a good compromise for this type of application.
There may not be an efficient solution to this problem in software... but there probably is in hardware: http://yhst-128017942510954.stores.yahoo.net/cube200-1ch-hdmi-enc2001.html
It should be possible to combine the H.264 encoder used by that device with a video card at much lower cost.
Stream Encoding Video
Approaches
I'm working on a similar problem and I'll share what I've learned. While I don't know how to stream them out, I do know how to generate and encode multiple HD video streams on the server. I've tested two approaches: the NVIDIA CUDA Video Encode (C Library) API and the Intel Integrated Performance Primitives (IPP) Video Encoder. The NVIDIA link takes you right to the example. The Intel page has no internal anchors, so you'll have to search for "Video Encoder".
Test Setup
Both encode video streams, up to and including HD, to H.264. Other formats are supported, but I was interested in H.264. To test performance, I set up prepared input video in YUV format and fed it to the encoders as fast as they would take it. Output from both encoders was 1080P.
CPU Performance
Performance-wise, the Intel video encoder could encode a single stream at 0.5X real time with about a 12.5% load on a Xeon E5520 @ 2.27GHz, i.e. one core of eight at 100% load. Newer Xeons are much faster, but I don't know if they can hit real time yet.
GPU Performance
The NVIDIA encoder on a GTS 450 could encode 1080P at 9-10X real time(!) with a 50% CPU load. The CPU load with the NVIDIA encoder appears to be primarily from copying data to and from the GPU.
What is particularly nice about the GPU solution is that it can take a GPU render surface as input: graphics are generated and encoded on the GPU, and only leave it to go out to the network. For details on using a render surface as input, see CUDA by Example, an excellent and straightforward book on GPU programming. In that case I would expect the CPU load to drop by roughly half. Since there is no point in encoding faster than real time for real-time graphics, you could likely encode 8+ streams from render surfaces with adequate GPU resources, e.g. two GTS 450 cards, and perhaps many more if a resolution lower than 1080P is acceptable.
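That capacity estimate is simple arithmetic: an encoder running at NX real time can sustain at most floor(N) concurrent real-time streams per card. A rough helper to make the estimate explicit (it ignores render load and host-side copy overhead, so treat it as an upper bound):

```cpp
// Rough upper bound on concurrent real-time streams: an encoder
// running at `realtimeMultiple` times real time can sustain at most
// floor(realtimeMultiple) streams per card; multiply by card count.
int maxRealtimeStreams(double realtimeMultiple, int gpuCount) {
    return static_cast<int>(realtimeMultiple) * gpuCount;
}
```

With the 9X figure measured above, two GTS 450 cards would top out around 18 concurrent 1080P streams before rendering itself starts competing for the same GPUs.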