Programmatically "listening" to sound (signal processing?)

Posted on 2024-08-09 20:50:43

I'm familiar with computer vision (well, I know of it), one application of which is image recognition, such as optical character recognition, I believe. However, I am more interested in 'computer listening', which I have just learned falls under digital signal processing.

The thing that interests me most about signal processing is its potential application in music. I remember a while ago I saw a preview of an application (sorry, forgot the name) which could listen to a recording of someone playing a guitar and automatically graph it out across a timeline with the actual notes/chords that were played. Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing, signal processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.

My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI, which contains separate tracks (maybe I misunderstood). Would an uncompressed format such as PCM do better than MP3? I don't know anything about sound processing; that's just what I'm inferring from what I've read so far.

I have already seen this question, which has great answers and links that cover a lot of my questions. However, most of the links I've found are theoretical. I'm sure they are all interesting and definitely worth a read given my interest in the subject, but I wanted to know if there are any existing libraries which can facilitate this, or articles on the subject that are geared towards computer science/programming, perhaps with example code. Even open source sound/music visualizers or any other open source sound processing code would be great.

Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.

Comments (4)

尝蛊 2024-08-16 20:50:43

> The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (Sorry, forgot the name)

Maybe Cubase?

> which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played

Greatly simplified: when you play a note, you produce a periodic wave with a given frequency. A mathematical tool, the discrete Fourier transform (DFT), converts that wave into its spectrum, which shows intensity against frequency instead of intensity against time. For example, a perfect A from a tuning fork produces a wave oscillating at 440 Hz. In the time domain it appears as a sinusoid; in the frequency domain it appears as a single narrow spike centered at 440 Hz.
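
As a rough illustration of the tuning-fork example, here is a minimal sketch using only the Python standard library. The sample rate, the frame length and the helper name `dft_magnitudes` are assumptions made for this example, and the DFT is the naive O(N^2) textbook form, not the fast FFT a real program would use.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns the magnitude of bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

SAMPLE_RATE = 8000  # assumed sample rate in Hz
N = 800             # 0.1 s of audio, so each frequency bin is 10 Hz wide

# Time domain: a pure 440 Hz sinusoid (the "tuning fork").
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N)]

# Frequency domain: one narrow spike.
mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_hz)
```

The frame length is chosen so that 440 Hz falls exactly on a bin; with other lengths the spike smears over neighbouring bins, which is one reason real analyzers apply a window function first.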

Now, when you play a guitar you don't produce perfect sinusoids. Hitting an A produces the fundamental frequency, 440 Hz, but also many additional frequencies (e.g. 880 Hz, one octave higher, plus many other higher and lower frequencies), due to the physics of the vibrating string, the material and shape of the guitar, and so on. These additional frequencies are called harmonics, and they mix with the fundamental to produce "the sound of the guitar" (what musical jargon calls timbre). A different instrument (say, a piano) mixes harmonics with the fundamental differently, producing a different timbre.
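
To make the idea of harmonics concrete, the sketch below synthesizes a fundamental plus two overtones and recovers them from the spectrum. The amplitudes in `partials` are invented for illustration and do not model any real guitar; the sample rate, frame length and helper name are likewise assumptions.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns the magnitude of bins 0..N/2."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

SAMPLE_RATE = 8000  # assumed sample rate in Hz
N = 800             # 10 Hz per bin

# Hypothetical timbre: fundamental at 440 Hz plus two weaker harmonics.
partials = {440: 1.0, 880: 0.5, 1320: 0.25}
signal = [sum(amp * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
              for freq, amp in partials.items())
          for t in range(N)]

mags = dft_magnitudes(signal)
threshold = 0.1 * max(mags)  # ignore numerical noise
peaks = [k * SAMPLE_RATE / N for k, m in enumerate(mags) if m > threshold]
print(peaks)
```

Changing the relative amplitudes in `partials` changes the timbre while the detected fundamental stays at 440 Hz.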

What DSP programs do is perform a DFT on the incoming signal. With additional tricks, they find the fundamental and the harmonics, and from those they infer the note you played. This has to happen fast if the note is to be detected while you are playing live and used to trigger effects. For example, you hit an A on the guitar, the DSP recognizes it as an A and replaces it with an A from a piano, so the speakers produce the sound of a piano.
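
Once a fundamental frequency has been estimated, naming the note is simple arithmetic. The sketch below assumes twelve-tone equal temperament with A4 tuned to 440 Hz and uses the MIDI note-number convention (A4 = 69); the function name `frequency_to_note` is made up for this example. Estimating the fundamental reliably from a real recording is the hard part and is not shown here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_note(freq_hz):
    """Map a frequency to the nearest equal-tempered note name, e.g. 'A4'."""
    # 12 semitones per octave, measured relative to A4 = 440 Hz = MIDI note 69.
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(frequency_to_note(440.0))   # concert A
print(frequency_to_note(82.41))   # low E string of a guitar in standard tuning
```

Rounding to the nearest semitone means a slightly out-of-tune string still maps to the intended note.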

> Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.

Yes. Once you are in the frequency domain, things get much easier. For example, you could light up one light according to the voice frequencies, and another one with the bass drum.
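
A lighting controller along those lines could compare the energy in a few frequency bands for each short frame of audio. The following is only a sketch: the band edges in `BANDS`, the 50% threshold and all names are assumptions for illustration, and the naive DFT would be replaced by an FFT in anything running live.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME_SIZE = 800    # 0.1 s per frame, 10 Hz per bin

# Hypothetical band edges in Hz; a real system would tune these by ear.
BANDS = {"bass_light": (20, 200), "voice_light": (300, 3000)}

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns the magnitude of bins 0..N/2."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def lights_for_frame(frame, threshold=0.5):
    """Turn a light on when its band holds more than `threshold` of the energy."""
    power = [m * m for m in dft_magnitudes(frame)]
    total = sum(power) or 1.0  # avoid dividing by zero on silence
    bin_hz = SAMPLE_RATE / len(frame)
    state = {}
    for name, (low, high) in BANDS.items():
        band = sum(p for k, p in enumerate(power) if low <= k * bin_hz < high)
        state[name] = band / total > threshold
    return state

def tone(freq_hz):
    """One frame of a pure sine, standing in for real audio."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(FRAME_SIZE)]

kick = lights_for_frame(tone(60))     # low thump, like a bass drum
vocal = lights_for_frame(tone(1000))  # mid-range tone, like a voice
print(kick, vocal)
```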

> My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI which contains separate tracks (Maybe I misunderstood).

They are two different things. MP3 is a compressed representation of a sound wave. Basically, it takes the signal that drives the speakers and compresses it. The idea is the same as above: transform to the frequency domain (MP3 uses a modified discrete cosine transform, a close relative of the DFT), then remove content that is unlikely to be heard (for example, a high pitch that comes right after a high-intensity sound is less likely to be heard, so it gets removed).

MIDI, on the other hand, is a scroll of events (you know, like those player pianos in the Old West, with the rolling paper scroll). The file contains no sound. Instead it contains directions for a MIDI player to perform specific notes at specific times with specific instruments. The quality of the "instrument bank" is (among other things) what distinguishes a bad MIDI player (which sounds like a child's toy) from a good one (which sounds realistic, in particular for pianos and violins; for wind instruments I have yet to hear a realistic one).

Going from MIDI to MP3 is easy: you just perform the file through a MIDI player and record the result. Going the other way is a different story altogether, and much more complex, and that is where DSP comes into play, as you said.

It's like boiling a fish tank. You get fish soup. But getting from the fish soup back to the fish tank is much harder.
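
The easy direction, from note events to sound, can be sketched in a few lines. The event tuples below are a made-up stand-in for real MIDI data (which is a binary format with far more detail), and the "instrument" is a plain sine wave, so this is the toy-sounding end of the MIDI player spectrum.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render(events, sample_rate=SAMPLE_RATE):
    """Turn (start_seconds, duration_seconds, midi_note) events into samples."""
    total_seconds = max(start + duration for start, duration, _ in events)
    samples = [0.0] * round(total_seconds * sample_rate)
    for start, duration, note in events:
        freq = midi_to_hz(note)
        first = round(start * sample_rate)
        for i in range(round(duration * sample_rate)):
            if first + i < len(samples):
                # Overlapping notes are simply added together (mixed).
                samples[first + i] += 0.3 * math.sin(
                    2 * math.pi * freq * i / sample_rate)
    return samples

# Hypothetical score: A4, then C5, then E5, half a second each.
events = [(0.0, 0.5, 69), (0.5, 0.5, 72), (1.0, 0.5, 76)]
audio = render(events)
print(len(audio))
```

Nothing in `audio` records which notes produced it; recovering `events` from those samples is the hard inverse problem described above.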

> Would an uncompressed format such as PCM do better than MP3?

PCM is a technique for converting an analog signal into a digital one. So your question contains a fundamental misunderstanding: there is no PCM file format as such (the RAW format comes close, containing basically nothing but the bare sample data). If you are asking whether an uncompressed WAV (which contains PCM data) is better than MP3, then yes, but the question is sometimes how much that improvement really matters to the human ear, and how much post-processing you have to perform on the data.
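
To show what "a WAV containing PCM data" means in practice, this sketch writes one second of 16-bit mono samples into an in-memory WAV file with Python's standard `wave` module and reads them back unchanged. The sample rate and the 440 Hz test tone are arbitrary choices for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # assumed sample rate in Hz

# PCM: the waveform is just a list of integer amplitudes, one per sample.
samples = [int(32767 * 0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
           for t in range(SAMPLE_RATE)]

# WAV: a small header followed by those integers, uncompressed.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)   # mono
    wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    frames = wav.readframes(wav.getnframes())
    decoded = list(struct.unpack(f"<{wav.getnframes()}h", frames))

print(decoded == samples)
```

The round trip is lossless, which is the property an MP3 encoder gives up in exchange for a smaller file.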

> know if there are any existing libraries which can facilitate this, or articles pertaining to this subject that are geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.

If you like Python, take a look at this page.

> Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.

Neither do I, but I toyed a bit with it.

二货你真萌 2024-08-16 20:50:43

> My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI which contains separate tracks (Maybe I misunderstood).

MIDI essentially stores instrument information and musical notes, plus other effects (volume, pitch bend, vibrato, attack rate, etc.).

It is not really digital signal processing.

> Would an uncompressed format such as PCM do better than MP3?

Maybe somewhat; it depends on the application. MP3 reduces the precision of frequencies that humans are not sensitive to. If you want to do visualisations then MP3 is probably fine.

But if you want to, say, determine what sort of instrument is playing in a recording, then there could be useful information hidden in the frequencies that humans are not sensitive to.

I think The Scientist and Engineer's Guide to Digital Signal Processing is a great reference for programmers. Chapter 8 explains the discrete Fourier transform (used in MP3 processing and a lot of other places to separate out the component frequencies of a wave).

I used it to help make a graphical program that let you draw a wave with the mouse, then applied the DFT, and let you select how many frequencies to include. It was a great exercise.
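
The core of an exercise like that can be sketched without any graphics: take a waveform, transform it, keep only the strongest frequency bins, and transform back. Everything here is an assumption for illustration, including the square wave standing in for a mouse-drawn shape and the helper names; it is not the program described above.

```python
import cmath
import math

def dft(x):
    """Naive forward DFT over all N bins."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def keep_strongest(spectrum, count):
    """Zero every bin except the `count` bins with the largest magnitude."""
    ranked = sorted(range(len(spectrum)),
                    key=lambda k: abs(spectrum[k]), reverse=True)
    keep = set(ranked[:count])
    return [c if k in keep else 0 for k, c in enumerate(spectrum)]

# Stand-in for a hand-drawn wave: one period of a square wave.
drawn = [1.0 if t < 32 else -1.0 for t in range(64)]
spectrum = dft(drawn)

def rms_error(count):
    """How far the `count`-bin reconstruction is from the drawn wave."""
    approx = idft(keep_strongest(spectrum, count))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(approx, drawn))
                     / len(drawn))

# Real signals have mirrored bin pairs, so even counts keep pairs together.
for count in (2, 6, 64):
    print(count, round(rms_error(count), 3))
```

With two bins the reconstruction is a single sinusoid; adding bins sharpens the corners, and with all 64 the original wave comes back up to rounding error.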

感性不性感 2024-08-16 20:50:43

> I remember a while ago I saw a preview of an application (Sorry, forgot the name) which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played.

You might also be thinking of Melodyne: http://www.celemony.com/cms/

Though VariAudio in newer versions of Cubase is pretty similar. :)

扭转时空 2024-08-16 20:50:43

I think you need to define exactly what you are looking for and what you are trying to do.

If you want to learn about DSP, MIDI or PCM then there is plenty of information on Wikipedia and references.

There is a myriad of applications available for audio manipulation. What you've described in your question is what takes place in every digital recording studio (which these days accounts for almost all studios) every single day.

If you intend to perform some DSP on, say, a guitar sound, then you would ideally have a recording of the guitar by itself (rather than a mixed-down track that also contains drums or vocals). It should be quite obvious that you will get better results analysing a discrete signal without additional noise than analysing a signal containing significant levels of 'noise'. So yes, a multitrack recording would be preferable to 'an MP3'.

A typical MP3 contains left and right channels (tracks), so technically it is multitrack. When music is recorded (professionally, at least), different signals are recorded onto different tracks, precisely so that they can be edited and processed separately at a later time.

What, then, do you want to do with the sounds?

As other answers have pointed out, this does not relate to MIDI at all.
