Analyzing audio to automatically create Guitar Hero levels
I'm trying to create a Guitar-Hero-like game (something like this) and I want to be able to analyze an audio file given by the user and create levels automatically, but I am not sure how to do that.
I thought maybe I should use a BPM detection algorithm and place an arrow on each beat and a rail on recurring patterns, but I have no idea how to implement those.
Also, I'm using NAudio's BlockAlignReductionStream, which has a Read method that copies byte[] data, but what happens when I read a 2-channel audio file? Does it read 1 byte from the first channel and 1 byte from the second? (It says 16-bit PCM.) And does the same happen with 24-bit and 32-bit float?
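(Side note on the interleaving part of the question: PCM stores whole sample *frames* interleaved, not individual bytes. For 16-bit stereo that means 2 bytes for the left sample, then 2 bytes for the right, repeating; 24-bit PCM uses 3 bytes per sample and 32-bit float uses 4, in the same frame order. A minimal sketch of splitting channels, assuming a little-endian 16-bit stereo buffer; this is illustrative numpy, not NAudio-specific:)

```python
import numpy as np

def split_stereo_16bit(raw: bytes) -> tuple[np.ndarray, np.ndarray]:
    """De-interleave a little-endian 16-bit stereo PCM byte buffer.

    Frames are interleaved as [L0, R0, L1, R1, ...], each sample 2 bytes,
    so channel 0 is every even sample and channel 1 every odd sample.
    """
    samples = np.frombuffer(raw, dtype="<i2")  # 16-bit signed little-endian
    return samples[0::2], samples[1::2]

# Two frames: L0=1000, R0=-1000, L1=2000, R1=-2000
buf = np.array([1000, -1000, 2000, -2000], dtype="<i2").tobytes()
left, right = split_stereo_16bit(buf)
print(left.tolist(), right.tolist())  # [1000, 2000] [-1000, -2000]
```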
Beat detection (or more specifically BPM detection)
An overview of a beat detection algorithm using a comb filter:
It looks like the steps are:
Lots of algorithms you'll have to implement here. Comb filters are supposedly slow, though. The wiki article didn't point me at other specific methods.
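To make the comb-filter idea concrete, here is a deliberately brute-force stand-in (my own sketch, not from any of the linked articles): for each candidate BPM, lay an impulse train ("comb") over an onset-energy envelope at the matching period, keep the best phase, and pick the BPM whose comb collects the most energy. It also shows why this family of methods is slow: the cost is roughly O(candidates × period × frames).

```python
import numpy as np

def estimate_bpm(envelope: np.ndarray, env_rate: float,
                 bpm_range=(60, 180)) -> int:
    """Toy comb-style tempo search over an onset-energy envelope.

    envelope: per-frame onset energy; env_rate: envelope frames per second.
    For each candidate BPM we sum the envelope at comb-tooth positions for
    every phase offset and keep the best-scoring tempo.
    """
    best_bpm, best_score = bpm_range[0], -np.inf
    for bpm in range(bpm_range[0], bpm_range[1] + 1):
        period = env_rate * 60.0 / bpm            # envelope frames per beat
        n_teeth = int(len(envelope) // period)
        score = max(
            envelope[(np.arange(n_teeth) * period + phase).astype(int)].sum()
            for phase in range(int(period))
        )
        if score > best_score:
            best_bpm, best_score = bpm, score
    return best_bpm

# Synthetic envelope: a spike every 0.5 s at 100 frames/s -> 120 BPM.
env = np.zeros(1000)
env[::50] = 1.0
print(estimate_bpm(env, env_rate=100.0))  # 120
```

A real implementation would run this over a band-limited, rectified, smoothed envelope rather than the raw signal, as the steps above suggest.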
Edit: This article has information on streaming statistical methods of beat detection. That sounds like a great idea: http://www.flipcode.com/misc/BeatDetectionAlgorithms.pdf - I'm betting they run better in real time, though are less accurate.
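The statistical approach in that article boils down to: flag a beat whenever a block's instantaneous energy jumps well above the average energy of roughly the last second. A minimal sketch (constants like the 1024-sample block, the ~43-block history, and the 1.4 threshold follow the article's suggested defaults, but tune them for your material):

```python
import numpy as np
from collections import deque

def detect_beats(samples: np.ndarray, block: int = 1024, c: float = 1.4) -> list:
    """Streaming energy-based beat detector.

    Emits the start index of each block whose energy exceeds c times the
    average energy of the trailing history (~1 s of 1024-sample blocks
    at 44.1 kHz).
    """
    history = deque(maxlen=43)   # trailing energy history
    beats = []
    for i in range(0, len(samples) - block + 1, block):
        e = float(np.sum(samples[i:i + block] ** 2))
        if len(history) == history.maxlen and e > c * (sum(history) / len(history)):
            beats.append(i)
        history.append(e)
    return beats
```

Because it only keeps a one-second window, it runs fine in real time, which matches the trade-off noted above: fast, but cruder than a full comb-filter bank.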
BTW I just skimmed and pulled out keywords. I've only toyed with FFT, rectification, and attenuation filters (low-pass filter). The rest I have no clue about, but you've got links.
This will all get you the BPM of the song, but it won't generate your arrows for you.
Level generation
As for "place an arrow on a beat and a rail on some recurrent pattern", that is going to be a bit trickier to implement to get good results.
You could go with a more aggressive content extraction approach, and try to pull the notes out of the song.
You'd need to use beat detection for this part too. This may be similar to BPM detection above, but at a different range, with a band-pass filter for the instrument range. You also would swap out or remove some parts of the algorithm, and would have to sample the whole song since you're not detecting a global BPM. You'd also need some sort of pitch detection.
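For the band-pass step, a crude but serviceable option is FFT masking: zero every frequency bin outside the instrument's rough range before running the per-band detection. A sketch (the 500-2000 Hz range in the demo is an arbitrary example, not a recommendation for any particular instrument):

```python
import numpy as np

def band_pass(samples: np.ndarray, rate: float, lo: float, hi: float) -> np.ndarray:
    """Crude FFT band-pass: zero every bin outside [lo, hi] Hz.

    Fine for carving out an instrument's rough frequency range offline;
    a proper streaming implementation would use an IIR/FIR filter instead.
    """
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(samples))

# Demo: a 100 Hz + 1000 Hz mix at 8 kHz; keep only the 1000 Hz component.
t = np.arange(8000) / 8000.0
mix = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
tone = band_pass(mix, rate=8000.0, lo=500.0, hi=2000.0)
```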
I think this approach will be messy and will guarantee you need to hand-scrub the results for every song. If you're okay with this, and just want to avoid the initial hand transcription work, this will probably work well.
You could also try to go with a content generation approach.
Most procedural content generation has been done in a trial-and-error manner, with people publishing or patenting algorithms that don't completely suck. Often there is no real qualitative analysis that can be done on content generation algorithms because they generate aesthetics. So you'd just have to pick ones that seem to give pleasing sample results and try them out.
Most algorithms are centered around visual content generation, including terrain, architecture, humanoids, plants, etc. There is some research on audio content generation, generative music, etc. Your requirements don't perfectly match either of these.
I think algorithms for procedural "dance steps" (if such a thing exists - I only found animation techniques) or Generative Music would be the closest match, if driven by the rhythms you detect in the song.
If you want to go down the composition generation approach, be prepared for a lot of completely different algorithms that are usually just hinted at, but not explained in detail.
E.g.: