3D audio engines
Despite all the advances in 3D graphic engines, it strikes me as odd that the same level of attention hasn't been given to audio. Modern games do real-time rendering of 3D scenes, yet we still get more-or-less pre-canned audio accompanying those scenes.
Imagine - if you will - a 3D engine that models not just the physical appearance of items, but also their audio properties. And from these models it can dynamically generate audio based on the materials that come into contact, their velocity, distance from your virtual ears, etcetera. Now, when you're crouching behind the sandbags with bullets flying over your head, each one will yield a unique and realistic sound.
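To make that concrete, here's a toy sketch (Python, with entirely made-up material parameters, and not any real engine's API) of what generating a contact sound from physical state, rather than playing a canned sample, might look like: each material is reduced to a few resonant modes, and velocity and distance scale the result.

```python
import numpy as np

# Invented modal parameters per material: (frequency in Hz, decay rate per second).
MATERIALS = {
    "wood":  [(220.0, 8.0), (510.0, 12.0), (1400.0, 20.0)],
    "metal": [(330.0, 2.0), (1250.0, 3.0), (3100.0, 5.0)],
}

def impact_sound(material, velocity, distance, samplerate=44100, length=1.0):
    """Synthesize a decaying sum of sinusoids for a contact event."""
    t = np.arange(int(samplerate * length)) / samplerate
    out = np.zeros_like(t)
    for freq, decay in MATERIALS[material]:
        out += np.sin(2 * np.pi * freq * t) * np.exp(-decay * t)
    # Harder hits are louder; amplitude falls off roughly as 1/distance.
    return out * min(velocity, 10.0) / max(distance, 1.0)
```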
The obvious application of such a technology would be gaming, but I'm sure there are many other possibilities.
Is such a technology being actively developed? Does anyone know of any projects that attempt to achieve this?
Thanks,
Kent
I once did some research toward improving OpenAL, and the problem with simulating 3D audio is that so many of the cues that your mind uses — the slightly different attenuation at various angles, the frequency difference between sounds in front of you and those behind you — are quite specific to your own head and are not quite the same for anyone else!
If you want, say, a pair of headphones to really make it sound like a creature is in the foliage ahead of the player's in-game character, then you actually have to take that player into a studio, measure how their own particular ears and head change the amplitude and phase of sounds arriving from different directions and distances (amplitude and phase are distinct cues, and both are quite important to the way the brain works out sound direction), and then teach the game to attenuate and phase-shift the sounds for that particular player.
There do exist "standard heads" that have been mocked up in plastic and used to measure generic frequency-response curves for the various directions around the head, but an averaged or standard head will never sound quite right to most players.
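For what it's worth, the signal-processing core of that approach, whether measured on the player or on a standard head, is just convolution with the measured impulse responses. A minimal sketch, assuming you've already loaded a left/right head-related impulse response (HRIR) pair for the source direction (e.g. from the MIT KEMAR data linked at the end):

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with an HRIR pair; the resulting interaural
    amplitude and phase differences are what let the brain place the
    source in a particular direction."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)], axis=-1)
```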
Thus the current technology is basically to sell the player five cheap speakers, have them place them around their desk, and then the sounds — while not particularly well reproduced — actually do sound like they're coming from behind or beside the player because, well, they are coming from the speaker behind the player. :-)
But some games do bother to compute how sound is muffled and attenuated through walls and doors (which can get difficult to simulate, because the ear receives the same sound with delays differing by a few milliseconds via the various materials and reflective surfaces in the environment, all of which would have to be included if things were to sound realistic). Those vendors tend to keep their libraries under wraps, however, so public reference implementations like OpenAL tend to be pretty primitive.
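As a rough sketch of the wall/door effect, my own simplification rather than code from any of those proprietary libraries: attenuate the occluded path, delay it, and low-pass it, since walls absorb high frequencies more strongly than low ones.

```python
import numpy as np

def occlude(signal, samplerate=44100, gain=0.3, delay_ms=5.0, smoothing=0.9):
    """Crude occlusion: quieter, a few milliseconds later, and duller."""
    pad = np.zeros(int(samplerate * delay_ms / 1000.0))
    delayed = np.concatenate([pad, signal]) * gain
    out = np.empty_like(delayed)
    acc = 0.0
    for i, x in enumerate(delayed):  # one-pole low-pass filter
        acc = smoothing * acc + (1.0 - smoothing) * x
        out[i] = acc
    return out
```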
Edit: here is a link to an online data set from MIT that I found at the time, which could be used as a starting point for creating a more realistic OpenAL sound field:
http://sound.media.mit.edu/resources/KEMAR.html
Enjoy! :-)
Aureal did this back in 1998. I still have one of their cards, although I'd need Windows 98 to run it.
Imagine ray-tracing, but with audio. A game using the Aureal API would provide geometric environment information (e.g. a 3D map), and the audio card would ray-trace the sound. It was exactly like hearing real things in the world around you. You could turn your eyes toward a sound source, and pick out particular sources in a noisy environment.
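To give a flavour of what geometry-aware audio involves (this is the textbook image-source method, not Aureal's actual algorithm): mirror the source in each wall plane to get first-order reflection paths, then turn each path into a delay and a gain.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def reflection_paths(source, listener, walls):
    """walls: list of (unit_normal, offset) planes satisfying n . x = d.
    Returns one (delay_seconds, gain) pair per acoustic path."""
    paths = [(np.linalg.norm(listener - source), 1.0)]  # direct path
    for n, d in walls:
        image = source - 2.0 * (np.dot(n, source) - d) * n  # mirrored source
        paths.append((np.linalg.norm(listener - image), 0.5))  # 0.5 = crude absorption
    return [(dist / SPEED_OF_SOUND, gain / max(dist, 1.0)) for dist, gain in paths]
```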
As I understand it, Creative destroyed Aureal through legal expenses, via a series of patent-infringement claims (all of which were rejected).
In the public domain world, OpenAL exists - an audio version of OpenGL. I think development stopped a long time ago. They had a very simple 3D audio approach, no geometry - no better than EAX in software.
EAX 4.0 (and I think there is a later version?) has finally, after a decade, incorporated some of the geometry-aware ray-tracing approach Aureal used (Creative bought up their IP after the company folded).
The Source (Half-Life 2) engine on the SoundBlaster X-Fi already does this.
It really is something to hear. You can definitely hear the difference between an echo against concrete vs wood vs glass, etc...
A little-known side area is VoIP. While games have actively developed audio software, you are also likely to spend time talking to others while you game.
Mumble ( http://mumble.sourceforge.net/ ) is software that uses plugins to determine who is in-game with you. It then positions their audio in a 360-degree field around you, so a voice on your left sounds like it comes from the left, and someone behind you sounds like they are behind you. This makes for a creepily realistic addition, and while trying it out it led to funny games of "Marco Polo".
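The core of that positional trick is simple panning driven by game coordinates. Here is a sketch under my own simplifications (Mumble's actual plugin interface differs), using equal-power panning in the horizontal plane:

```python
import numpy as np

def pan_gains(listener_pos, listener_forward, speaker_pos):
    """Return (left_gain, right_gain) for a voice at speaker_pos;
    all arguments are 2D numpy vectors, listener_forward is unit length."""
    to_speaker = speaker_pos - listener_pos
    to_speaker = to_speaker / np.linalg.norm(to_speaker)
    right = np.array([listener_forward[1], -listener_forward[0]])  # forward rotated 90 degrees clockwise
    x = float(np.dot(to_speaker, right))  # -1 = hard left, +1 = hard right
    return np.sqrt((1.0 - x) / 2.0), np.sqrt((1.0 + x) / 2.0)
```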
Audio took a massive step backwards in Vista, where hardware was no longer allowed to accelerate it. This killed EAX as it existed in the XP days. Software wrappers are gradually being built now.
Very interesting field indeed. So interesting, in fact, that I'm going to do my master's thesis on this subject; in particular, its use in first-person shooters.
My literature research so far has made it clear that this particular field has little theoretical background. Not a lot of research has been done in this field, and most theory is based on movie-audio theory.
As for practical applications, I haven't found any so far. Of course, there are plenty of titles and packages which support real-time audio-effect processing and apply it depending on the listener's general surroundings, e.g. the listener enters a hall, so an echo/reverb effect is applied to the sound samples. This is rather crude. An analogy for visuals would be to subtract 20% of the RGB value of the entire image when someone turns off (or shoots ;) ) one of the five lightbulbs in the room. It's a start, but not very realistic at all.
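In code, that crude approach amounts to little more than a per-zone preset lookup; the preset values and the mixer interface below are invented for illustration:

```python
REVERB_PRESETS = {
    "hall":     {"decay_s": 2.5, "wet": 0.40},
    "corridor": {"decay_s": 0.8, "wet": 0.20},
    "outdoor":  {"decay_s": 0.2, "wet": 0.05},
}

def on_enter_zone(mixer, zone):
    # Swap the reverb wholesale when the listener crosses a zone boundary;
    # no geometry and no per-surface materials -- hence "crude".
    mixer.set_reverb(**REVERB_PRESETS.get(zone, REVERB_PRESETS["outdoor"]))
```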
The best work I found was a 2007 PhD thesis by Mark Nicholas Grimshaw of the University of Waikato, called The Acoustic Ecology of the First-Person Shooter.
This huge paper proposes a theoretical setup for such an engine, as well as formulating a wealth of taxonomies and terms for analysing game audio. He also argues that the importance of audio for first-person shooters is greatly overlooked, as audio is a powerful force for immersion in the game world.
Just think about it. Imagine playing a game on a monitor with no sound but picture-perfect graphics. Next, imagine hearing realistic game sounds all around you while closing your eyes. The latter will give you a much greater sense of 'being there'.
So why haven't game developers dived into this wholeheartedly already? I think the answer is clear: it's much harder to sell. Improved graphics are easy to sell: you just show a picture or movie and it's easy to see how much prettier it is. It's even easily quantifiable (e.g. more pixels = better picture). For sound it's not so easy. Realism in sound is much more subconscious, and therefore harder to market.
The effects the real world has on sounds are perceived subconsciously. Most people never even notice most of them; some of these effects cannot even be heard consciously. Still, they all play a part in the perceived realism of the sound. There is an easy experiment you can do yourself which illustrates this. Next time you're walking on the sidewalk, listen carefully to the background sounds of the environment: wind blowing through leaves, all the cars on distant roads, etc. Then, listen to how this sound changes when you walk nearer to or farther from a wall, or when you walk under an overhanging balcony, or even when you pass an open door. Do it, listen carefully, and you'll notice a big difference in sound, probably much bigger than you ever remembered.
In a game world, these types of changes aren't reflected. And even though you don't (yet) consciously miss them, you subconsciously do, and this has a negative effect on your level of immersion.
So, how good does audio have to be in comparison to the image? More practically: which physical effects in the real world contribute the most to perceived realism? Does this perceived realism depend on the sound and/or the situation? These are the questions I wish to answer with my research. After that, my idea is to design a practical framework for an audio engine which could variably apply some effects to some or all game audio, depending (dynamically) on the amount of available computing power. Yup, I'm setting the bar pretty high :)
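One way such a framework might spend a variable budget, sketched as a greedy allocator (the effect names and numbers are purely illustrative):

```python
def choose_effects(effects, budget_ms):
    """effects: list of (name, cost_ms, realism_gain) tuples. Greedily pick
    the effects with the best realism-per-millisecond until the per-frame
    audio budget is spent."""
    chosen, spent = [], 0.0
    for name, cost, gain in sorted(effects, key=lambda e: e[2] / e[1], reverse=True):
        if spent + cost <= budget_ms:
            chosen.append(name)
            spent += cost
    return chosen

# e.g. choose_effects([("occlusion", 0.6, 5), ("hrtf", 2.0, 8),
#                      ("early_reflections", 1.5, 4)], budget_ms=3.0)
```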
I'll be starting in September 2009. If anyone's interested, I'm thinking about setting up a blog to share my progress and findings.
Janne Louw
(BSc Computer Sciences Universiteit Leiden, The Netherlands)