Voice communication over TCP/IP

Posted 2024-08-15 21:23:25

I'm currently developing an application that uses DirectSound for voice communication on an intranet. I had a working solution using UDP, but then my boss told me he wants to use TCP/IP for some reason. I've tried to implement it in pretty much the same way as with UDP, but with very little success. What I get is basically just noise: 20% of it is the recorded sound and the rest is just weird noise.

My guess is that with TCP I have to read the incoming data several times before I have the complete sound I can play.

Now two questions:

  • Am I on the right track? Is it even a good idea to use TCP/IP for this kind of application (voice conferencing of sorts)?
  • I'm doing it in C# but I don't think this is language specific.

Comments (14)

满天都是小星星 2024-08-22 21:23:25

No, using TCP is a terrible idea. UDP will perform much better in this case, and dropped or out-of-sync packets won't matter!

If your boss can't understand the technical details, tell him or her that virtually all existing VoIP systems use UDP, and there must be a reason for that: Skype, Ventrilo, TeamSpeak, World of Warcraft's voice chat, etc.

后eg是否自 2024-08-22 21:23:25

To answer this question correctly I feel that some of the key concepts of VoIP need to be explained.

First, UDP is the most popular and widely used transport for VoIP. Remember that an IP network is packet-switched: it is ideal for non-real-time data communication and was not designed for real-time VoIP.

UDP is used to overcome this problem. UDP is an unreliable, connectionless protocol. Although UDP will lose packets, the speech audio can still be understood, because the brain effectively compensates for the errors. That's why you can still talk to someone on a phone with only 3 bars of signal.

Packet Loss and Burst Lengths

Packet loss often occurs due to congestion, so the amount of packet loss will depend on how well equipped the network is. Packet loss in VoIP using UDP most often occurs in bursts. The burst length is the number of packets lost in succession during transmission, so a burst length of 3 means 3 packets in a row were lost.

Packet Loss Compensation

Where packet loss occurs, simple packet loss compensation techniques will suffice and the Quality of Service will not be seriously affected; speech can still be understood even when 20-30% of packets are lost. Methods include:

  1. Repeat the last successfully received packet.

  2. Fill in - Play silence in the gap.

  3. Splicing - Effectively, this can be thought of as removing the gap caused by the burst by pushing the start and end of the gap together.

  4. Interpolation - Use knowledge of the speech before and after the gap to interpolate the lost packets, e.g. take the mean of the packets successfully received before and after the burst.

A good method of reducing burst lengths, and thus increasing QoS, is interleaving. A block interleave function takes the speech and splits it into a set of packets. These packets are loaded into a buffer shaped like a matrix (e.g. 4 by 4), and a function is used to rotate or transpose the buffer so the packets are no longer in order. On the receiver side, the inverse of this function is used to re-order the packets, so a burst of consecutive losses on the wire turns into isolated single losses in the re-ordered stream. This method is simple and effective; see the figure below:

[Figure: block interleaving (image: http://img688.imageshack.us/img688/3962/capturevnk.png)]
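The block interleaver described above amounts to writing packets into a rows × cols matrix row by row and reading them out column by column. This is a minimal sketch under the assumption that the packet count is an exact multiple of the block size; the function names are hypothetical. It shows the point of the technique: with a 4 by 4 block, a burst of four consecutive losses on the wire becomes four isolated single losses after de-interleaving, which simple compensation handles much better.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave(packets: Sequence[T], rows: int = 4, cols: int = 4) -> List[T]:
    """Block interleaver: write each block of rows*cols packets row by row,
    then read it out column by column (i.e. transpose the matrix)."""
    size = rows * cols
    if len(packets) % size:
        raise ValueError("packet count must be a multiple of rows * cols")
    out: List[T] = []
    for start in range(0, len(packets), size):
        block = packets[start:start + size]
        for c in range(cols):
            for r in range(rows):
                out.append(block[r * cols + c])
    return out


def deinterleave(packets: Sequence[T], rows: int = 4, cols: int = 4) -> List[T]:
    """Inverse of interleave(): transposing back means swapping rows and cols."""
    return interleave(packets, rows=cols, cols=rows)


sent = interleave(list(range(16)))  # [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
# A burst of 4 consecutive losses on the wire (positions 4..7)...
received = [None if 4 <= i < 8 else p for i, p in enumerate(sent)]
restored = deinterleave(received)
# ...becomes 4 isolated single-packet gaps after de-interleaving.
print([i for i, p in enumerate(restored) if p is None])  # [1, 5, 9, 13]
```

The cost is latency: the receiver cannot re-order a block until all of it has arrived, so a 16-packet block of 20 ms packets would add roughly 320 ms, which is why small blocks are used for conversational audio.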

I recently created a small VoIP app using UDP over a wireless LAN. I am not sure of the exact requirements of your application, but generally a VoIP application (between two hosts) can be implemented as follows:

[Figure: VoIP application between two hosts (image: http://img338.imageshack.us/img338/6566/captureec.png)]

In the diagram, the application defines its own packet design. The header could just be the packet number (1 byte) and the payload the audio data (n bytes, the payload size). Defining this allows better packet loss compensation techniques and gives the program a logical flow.
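A minimal sketch of that packet design: a 1-byte packet number followed by the raw audio payload. The helper names are hypothetical. Because a single byte wraps around after 255, the receiver compares packet numbers modulo 256 to tell a gap (lost packets) from a late or duplicate packet.

```python
import struct
from typing import Tuple

HEADER = struct.Struct("!B")  # 1-byte packet number


def build_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the audio payload with a 1-byte packet number (wraps at 256)."""
    return HEADER.pack(seq % 256) + payload


def parse_packet(datagram: bytes) -> Tuple[int, bytes]:
    """Split a received datagram into (packet number, audio payload)."""
    (seq,) = HEADER.unpack_from(datagram, 0)
    return seq, datagram[HEADER.size:]


def seq_distance(new: int, last: int) -> int:
    """Signed distance from last to new, modulo 256, in the range -128..127.

    1 means the next packet in order, > 1 means packets were lost in between,
    <= 0 means a late or duplicate packet.
    """
    return ((new - last + 128) % 256) - 128
```

On the receiving side, something like seq, audio = parse_packet(sock.recvfrom(2048)[0]) followed by seq_distance(seq, last_seq) would tell the program when to apply one of the compensation techniques listed earlier.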

TCP is a bad choice for VoIP for several reasons. A quick Google search for 'TCP VoIP' shows why, with the first result backing this view.

TCP is a reliable, connection-oriented protocol, which means that packets lost in transmission will at some point be resent by the other host. This retransmission is impractical for real-time services and will increase jitter and latency, and in some cases possibly packet loss as well.
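The effect of retransmission on real-time audio can be illustrated with a toy model. This is a sketch under loudly stated assumptions, not a measurement: one 20 ms frame per packet, a constant 30 ms one-way delay, a lost TCP segment resent after a 200 ms retransmission timeout, and a frame counted as useless if it reaches the application more than 150 ms after it was sent. Because TCP hands data to the application strictly in order, every frame queued behind the lost one waits too (head-of-line blocking), whereas UDP loses only that one frame.

```python
from typing import List, Optional

FRAME_MS = 20      # assumed: one 20 ms audio frame per packet
DELAY_MS = 30      # assumed: constant one-way network delay
RTO_MS = 200       # assumed: TCP retransmission timeout
DEADLINE_MS = 150  # assumed: a frame is useless if it arrives later than this after sending


def unusable_frames(n: int, lost: int, tcp: bool) -> List[int]:
    """Indices of frames that miss the playout deadline when frame `lost` is dropped once."""
    bad = []
    delivered_at: Optional[int] = None  # when the previous frame reached the app (TCP)
    for i in range(n):
        sent = i * FRAME_MS
        if i == lost and not tcp:
            bad.append(i)  # UDP: the frame is simply gone
            continue
        arrival = sent + DELAY_MS + (RTO_MS if i == lost else 0)
        if tcp and delivered_at is not None:
            arrival = max(arrival, delivered_at)  # in-order delivery: wait for earlier bytes
        delivered_at = arrival
        if arrival - sent > DEADLINE_MS:
            bad.append(i)
    return bad


print("UDP:", unusable_frames(10, lost=2, tcp=False))  # UDP: [2]
print("TCP:", unusable_frames(10, lost=2, tcp=True))   # TCP: [2, 3, 4, 5]
```

In this model one lost segment costs UDP a single 20 ms frame but costs TCP four frames, and the frames after them arrive all at once, which is the jitter the answer describes.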

Answers to Your Questions

What I get is basically just noise. 20% of it is the recorded sound and the rest is just weird noise.

TCP should not introduce noise; it should introduce jitter and latency.
Sockets tend to have a default time-out. Do you set the time-out yourself? If not, what happens when you do not receive the correct packet in time for playback?

Am I on the right track? Is it even a good idea to use TCP/IP for this kind of application (voice conferencing of sorts)?

No, do NOT use TCP/IP; it is not a good idea. It appears that your manager has incorrectly assumed that any packet loss is a terrible thing.

Summary

Some general key concepts have been covered here to help with this specific problem as much as possible; however, this should not be considered exhaustive. Make sure the VoIP system also applies some underlying principles of speech coding and signal processing.

The key points to remember are:

  • Use UDP for VoIP.

  • Implement packet loss compensation techniques.

  • A block interleaver is a simple and effective method to increase QoS.

I hope this helps.

年华零落成诗 2024-08-22 21:23:25

When people are talking about the TCP/IP stack they often mean "the whole Internet protocol stack" which includes UDP. Maybe that makes your manager happy ;-)

柠檬 2024-08-22 21:23:25

TCP/IP would work; it will deliver the data. It might not be quite as efficient as UDP if you were not worrying about packet loss, but you should be able to transmit the data just fine.

风尘浪孓 2024-08-22 21:23:25

TCP/IP over modern routers and networks is very fast. It is more than capable of handling voice over IP communication. (I've done it myself)

My guess is that your implementation has some bugs in it related to buffer sizes.
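One common bug of exactly this kind when porting from UDP to TCP: UDP preserves message boundaries, so one recvfrom() returns one whole datagram, but TCP is a byte stream, so a single recv() may return only part of a frame, or the end of one frame and the start of the next. If the code assumes every recv() yields exactly one audio buffer, partly filled buffers get played. A minimal sketch of reading exact-size frames from a TCP socket, assuming sender and receiver agree on a fixed frame size (the 512-byte value and the helper names are hypothetical):

```python
import socket

FRAME_BYTES = 512  # hypothetical fixed frame size agreed by sender and receiver


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a TCP socket, looping over partial reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # peer closed the connection
            raise ConnectionError(f"connection closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def read_frames(sock: socket.socket):
    """Yield complete audio frames only; a trailing partial frame is discarded."""
    while True:
        try:
            yield recv_exact(sock, FRAME_BYTES)
        except ConnectionError:
            return


# Demo: two frames' worth of bytes written in chunks that do not match frame boundaries.
sender, receiver = socket.socketpair()
sender.sendall(bytes(300))
sender.sendall(bytes(724))
sender.close()
print([len(f) for f in read_frames(receiver)])  # [512, 512]
receiver.close()
```

If frames vary in size (for example with a variable-rate codec), the usual approach is to prefix each frame with its length and call recv_exact twice: once for the length field, once for the body.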

趁年轻赶紧闹 2024-08-22 21:23:25

There is no reason why you should be getting noise over TCP, so it looks like a bug in your code. In fact, most streaming media we receive (think YouTube) is delivered over TCP.

The problem with TCP is jitter. Delivery of your data stream will be delayed until all of the packets have been received and reordered. For multimedia, late delivery is as good as no delivery at all, so this is normally a poorer choice than simply interpolating the missing frame. As mentioned above, if packet loss is minimal and your network is fast, it should make no difference.

RTP/RTCP over UDP is normally used to deliver the media stream. RTP includes things like sequence numbers in the packet header that allow late packets to be inserted into their correct position, where possible. RTCP has a reporting function that allows the codec to adapt when packet loss starts to increase. RTP/RTCP therefore provides some, but not all, TCP functionality.
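For reference, the fixed RTP header defined in RFC 3550 (section 5.1) is 12 bytes: version, padding, extension and CSRC-count bits; a marker bit and a 7-bit payload type; a 16-bit sequence number; a 32-bit timestamp; and a 32-bit SSRC. A minimal parsing sketch in Python; it ignores CSRC lists and header extensions, and the names RtpHeader and parse_rtp_header are hypothetical.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


_FIXED = struct.Struct("!BBHII")  # 1 + 1 + 2 + 4 + 4 = 12 bytes, network byte order


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (CSRC list and extensions not handled)."""
    if len(packet) < _FIXED.size:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = _FIXED.unpack_from(packet)
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
```

The sequence number is what lets a receiver put late packets back in position or detect gaps; the timestamp counts sampling-clock ticks (for 8 kHz audio in 20 ms packets it advances by 160 per packet) and drives playout timing.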

For streaming media over TCP, this can be solved easily with a large jitter buffer. This adds latency, but for one-way streaming that is not a problem. Latency, however, is a major problem in two-way conversational streaming.
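A jitter buffer trades latency for smoothness: packets are held for a fixed playout delay, re-ordered by sequence number, and anything arriving after its slot has been played is discarded. A minimal sketch, assuming a hypothetical 60 ms depth, that the first packet to arrive is the first one sent, and that the caller calls pop() once per frame period; it is not a production design (no adaptive depth, no sequence-number wraparound). The class name is hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Fixed-depth jitter buffer: re-orders by sequence number, drops late packets."""

    def __init__(self, depth_ms: int = 60) -> None:
        self.depth_ms = depth_ms              # hypothetical playout delay
        self.start_ms: Optional[int] = None   # arrival time of the first packet
        self.next_seq = 0                     # sequence number of the next playout slot
        self.heap: List[Tuple[int, bytes]] = []

    def push(self, seq: int, payload: bytes, now_ms: int) -> None:
        """Store an arriving packet; drop it if its playout slot has already passed."""
        if self.start_ms is None:
            self.start_ms = now_ms
            self.next_seq = seq
        if seq < self.next_seq:
            return  # arrived too late to be played: as good as lost
        heapq.heappush(self.heap, (seq, payload))

    def ready(self, now_ms: int) -> bool:
        """Playout starts depth_ms after the first packet arrived."""
        return self.start_ms is not None and now_ms >= self.start_ms + self.depth_ms

    def pop(self) -> Optional[bytes]:
        """Return the payload for the next slot, or None if it is missing
        (the caller should then conceal the gap)."""
        while self.heap and self.heap[0][0] < self.next_seq:
            heapq.heappop(self.heap)  # discard duplicates of already-played slots
        payload = None
        if self.heap and self.heap[0][0] == self.next_seq:
            payload = heapq.heappop(self.heap)[1]
        self.next_seq += 1
        return payload


jb = JitterBuffer(depth_ms=60)
jb.push(0, b"a", now_ms=0)
jb.push(2, b"c", now_ms=25)  # overtook seq 1 on the network
jb.push(1, b"b", now_ms=50)  # delayed, but still within the 60 ms buffer
print(jb.ready(now_ms=60))   # True
# In real use pop() runs once per frame tick from t = 60 ms; here four ticks in a row.
print([jb.pop() for _ in range(4)])  # [b'a', b'b', b'c', None]
jb.push(3, b"d", now_ms=130)         # slot 3 has already been played, so this is dropped
```

A deeper buffer absorbs more jitter (and, over TCP, more retransmission delay) but adds the same amount of mouth-to-ear latency, which is the trade-off the answer describes between one-way and conversational streaming.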

One main advantage of TCP, though, is that it traverses firewalls more easily than UDP. Once a TCP session is established, the firewall is open both to send and receive data. This is more complicated for UDP, especially when one is expecting an incoming stream of data. There are ways around this, but they can be complicated and may involve understanding the session control protocol (such as SIP or RTSP).

小清晰的声音 2024-08-22 21:23:25

I have developed a voice-over-IP solution for duplex communication using the wave API, for the remote control of an amateur radio transceiver. It works very well with UDP and also with TCP/IP! I use a 512-byte buffer every 64 ms, with 8 kHz mono wave data. Over the last month I have used it between the USA and Europe over TCP/IP, and it worked very well! Now my question: the wave API does not work correctly on Win7, so I think DirectSound is the better way. Right now I have trouble with the implementation under Managed DirectX 9; my application is VB.Net 2008. I am looking for links to documentation on streaming output with DirectSound (Managed DirectX 9) for VB.Net.
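The buffer figures quoted above are self-consistent if each sample is one byte (8-bit mono); that sample width is an inference from the numbers, not something the poster states. A quick check:

```python
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1   # inferred: 8-bit mono (not stated in the post)
BUFFER_BYTES = 512

samples_per_buffer = BUFFER_BYTES // BYTES_PER_SAMPLE
buffer_ms = 1000 * samples_per_buffer / SAMPLE_RATE_HZ
bitrate_kbps = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 8 / 1000

print(buffer_ms)     # 64.0
print(bitrate_kbps)  # 64.0
```

With 16-bit samples the same 512 bytes would hold only 256 samples, i.e. 32 ms of audio, so the buffer size and the timer interval have to be changed together.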

可可 2024-08-22 21:23:25

There are a few main reasons why live streaming data uses UDP. The biggest of which is receiving late data is as good as not receiving it at all, and delaying the stream for retransmission is certainly not a good idea. For VoIP, you have a latency tolerance of somewhere around 150ms. Any voice packet that's delayed longer than that becomes noticeable for users.

As for why you are getting noise, how are you handling late arriving packets due to retransmits?

浮光之海 2024-08-22 21:23:25

It depends on the kind of underlying network. If you have Ethernet with 99.9% reliability, my guess is TCP would do just fine. However, if you are doing it over, say, 802.11, then TCP would not be such a good idea.

You can ask your boss for the specific reason to use TCP and then implement that particular service over UDP, for example basic reliability or an error-correction service. You might also like to look into RTP (http://en.wikipedia.org/wiki/Real-time_Transport_Protocol).

乱了心跳 2024-08-22 21:23:25

TCP should not introduce any noise. Jitter and lag, yes (especially if your links are lossy); but no noise at all. Something is fishy with your programming.

BTW, I concur that UDP is far more appropriate than TCP in this case.

°如果伤别离去 2024-08-22 21:23:25

Most voice applications are built using the RTP protocol, which is streamed over UDP. Most of them also have codec support to ensure the media is compressed before it is streamed from one end to the other.
Discuss the bandwidth requirements with your boss.
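As a rough starting point for that discussion, per-stream bandwidth is the codec bitrate plus the per-packet header overhead. A sketch assuming 20 ms packets and IPv4 + UDP + RTP headers (20 + 8 + 12 bytes), ignoring link-layer framing; the function name is hypothetical, and the codec rates used are the nominal G.711 (64 kbit/s) and G.729 (8 kbit/s) rates.

```python
def call_bandwidth_kbps(codec_kbps: float, packet_ms: float,
                        header_bytes: int = 20 + 8 + 12) -> float:
    """One-direction IP-level bandwidth of a single voice stream.

    header_bytes defaults to IPv4 (20) + UDP (8) + RTP (12); link-layer overhead is ignored.
    """
    packets_per_s = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 / packets_per_s
    return (payload_bytes + header_bytes) * 8 * packets_per_s / 1000


print(call_bandwidth_kbps(64, 20))  # G.711: 80.0
print(call_bandwidth_kbps(8, 20))   # G.729: 24.0
```

With a low-rate codec the headers dominate (40 of every 60 bytes for G.729 at 20 ms), so longer packets save bandwidth at the cost of latency and of losing more audio per lost packet.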

忆沫 2024-08-22 21:23:25

I'm pretty sure most streaming audio/video uses UDP...you might lose a few packets, but you would never notice.

傲鸠 2024-08-22 21:23:25

If you're getting noise, you're probably overrunning the part of your buffer that has been successfully filled with packets, and playing empty or uninitialized buffer space.
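A minimal sketch of guarding against that, independent of DirectSound: the network side appends what it actually received, and when the playback side asks for more bytes than are available, the shortfall is filled with explicit silence instead of stale or uninitialized memory. The class name is hypothetical, it is not thread-safe as written (a real receive thread and audio callback would need a lock), and the silence value is a parameter because 16-bit signed PCM is silent at 0 while 8-bit unsigned PCM is silent at 0x80.

```python
class PlayoutBuffer:
    """Byte FIFO between the network receiver and the audio output (not thread-safe)."""

    def __init__(self, silence_byte: int = 0x00) -> None:
        self.silence_byte = silence_byte  # 0x00 for 16-bit signed PCM, 0x80 for 8-bit unsigned
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append audio bytes as they actually arrive from the network."""
        self.data.extend(chunk)

    def read(self, n: int) -> bytes:
        """Return exactly n bytes: buffered audio first, then silence for any shortfall."""
        take = min(n, len(self.data))
        out = bytes(self.data[:take])
        del self.data[:take]
        return out + bytes([self.silence_byte]) * (n - take)


pb = PlayoutBuffer()
pb.write(b"\x01\x02\x03")
print(pb.read(5))  # b'\x01\x02\x03\x00\x00'
```

Counting underruns (calls where take < n) is also a cheap way to confirm whether this is what causes the noise.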

兮颜 2024-08-22 21:23:25

How much slower is TCP than UDP? With TCP you get a retransmission delay whenever packets arrive out of order or corrupted. That said, there are ways to optimize TCP so there is less delay. Both Linux and Winsock have a TCP_NODELAY option you can use. A compact codec such as G.729 will also help keep the payload size down. Since delivery depends on packets being received in order (TCP), one should focus on making the packet size small enough to reduce retransmission delay but large enough to maintain a quality stream. A good TCP VoIP program would be able to vary the encoding quality and packet size on the fly, with the sender signalling the change to the receiver. But really, the only advantage of using TCP for real-time audio is that it is less likely to be blocked by firewalls.
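Setting TCP_NODELAY disables Nagle's algorithm, so small audio packets are sent immediately instead of being held back and coalesced while earlier data is still unacknowledged. It does not remove retransmission delay. A minimal Python sketch; the function name is hypothetical and the host and port are placeholders.

```python
import socket


def open_voice_stream(host: str, port: int) -> socket.socket:
    """Connect a TCP socket tuned for small, latency-sensitive writes."""
    # Note: the 5 s timeout also stays in effect for later send()/recv() calls.
    sock = socket.create_connection((host, port), timeout=5)
    # Disable Nagle's algorithm so each small audio frame is sent right away.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

For example, sock = open_voice_stream("192.0.2.10", 5000) would connect to a placeholder address; the same option is available in Winsock and .NET (Socket.NoDelay).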
