How can I understand network protocols?
I work in web development, but I don't have a great understanding of network protocols. I recall hearing an analogy that TCP, HTTP, and SSL can be thought of as a series of nested envelopes around the actual request content.
I also have a fuzzy idea that TCP consists of packets, which are verified on the other end. But I'm sort of picturing the HTTP request being chopped into packets, too...
So basically, I don't understand this stuff well at all. Can anybody give a good overview of this? Also, is there a beginner-friendly book or other resource that you'd recommend?
For a thorough description of TCP/IP networking (without the physical layer, e.g., Ethernet), pick TCP/IP Illustrated by Stevens. If you are going to do some low-level network programming, Unix Network Programming by the same author is the best.
There's a reason you'll often hear of TCP/IP implementations called a "stack". Part of the concept is that you have a low-level protocol (Ethernet, PPP, what-have-you), slightly higher-level protocols built on top of it (IP), and so on. It's quite similar to the OSI model, and can be described in terms of that model, though TCP/IP breaks up the layers just a bit differently. Anyway, programs generally send data using one of the upper-level protocols, and let the TCP/IP stack handle the details of getting the data from point A to point B.
TCP sits on top of IP and lets you think of the data flowing in and out as a pair of streams (one in, one out) rather than getting raw IP packets and having to figure out what to do with them. (Big BIG benefit: it simplifies multiplexing. Without TCP or UDP or the like, IP would be near useless -- only one program could normally communicate with the network at a given time.)
SSL sits on top of TCP, and lets you send data over the stream that TCP provides without having to get involved in the ugly details of encrypting and decrypting data, verifying certificates, etc.
HTTP sits on top of TCP (or SSL, in the case of HTTPS), and provides a way for a client and server to pass entire requests and responses, along with metadata describing them.
Network protocols are formal standards and policies comprised of rules, procedures and formats that define communication between two or more devices over a network. Network protocols govern the end-to-end processes of timely, secure and managed data or network communication.
There are several broad types of networking protocols, including:
• Network communication protocols: Basic data communication protocols, such as TCP/IP and HTTP.
• Network security protocols: Implement security over network communications and include HTTPS, SSL and SFTP.
• Network management protocols: Provide network governance and maintenance and include SNMP and ICMP.
The different layers of the Open Systems Interconnection (OSI) reference model are:
Application layer: This is the uppermost layer in the OSI reference model. The application layer provides the means by which application processes can access network services, and is therefore associated with services that include direct support for applications.
Presentation layer: This layer in the OSI reference model deals with specifying the format which should be utilized to enable network data to be communicated between computers in the network. The presentation layer adds formatting, encryption, and data compression to the packet.
Session layer: This layer enables applications that reside on different computers to create and close network sessions. It also manages open network connections, or sessions that are open.
Transport layer: The transport layer is responsible for ensuring that data is delivered in sequence, error-free, and efficiently over the network. The transport layer also identifies duplicated packets, and drops them. Transport layer protocols include Transmission Control Protocol (TCP) and Sequenced Packet Exchange (SPX). These protocols open packets at the receiving computer, and reassemble the original messages as well.
Network layer: This layer of the OSI reference model provides addressing for messages for all networks. It translates your logical addresses and names to physical addresses, and then identifies the preferred route from the source computer to the destination computer.
Data Link layer: The Data Link layer prepares data for the physical connection by defining the means by which software drivers can access the physical medium. The Data Link layer transmits frames from the Network layer to the Physical layer.
Physical layer: This layer places the data on the physical medium which is carrying the data. It is responsible for the actual physical connection between two computers on the network that are exchanging data.
The function of protocols at the sending computer is summarized below:
• Segment data into smaller more manageable chunks or packets.
• Append addressing to the packets.
• Ensure that data is ready for sending via the network interface card (NIC) to the network cable.
The function of protocols at the receiving computer is summarized below:
• Remove packets from the network cable, and move the packets through the NIC to the computer.
• Remove all information that relates to the sending of the packet. This is information added to the packet by the sending computer.
• Move the packets to the buffer for the reassembly process.
• Convey the data to the particular application.
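As a rough illustration of the sending and receiving steps listed above, here is a minimal Python sketch. The `Packet` class, the 8-byte chunk size, and the addresses are assumptions made up for this example; a real stack does this inside the operating system with binary headers rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # sender address (made up for the example)
    dst: str        # receiver address (made up for the example)
    seq: int        # position of this chunk in the original message
    payload: bytes  # the chunk of data itself


def send_side(message: bytes, src: str, dst: str, chunk_size: int = 8) -> list:
    """Segment the message into chunks and attach addressing to each one."""
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]
    return [Packet(src, dst, seq, chunk) for seq, chunk in enumerate(chunks)]


def receive_side(packets: list) -> bytes:
    """Strip the addressing, buffer the chunks by sequence number, reassemble."""
    buffer = {p.seq: p.payload for p in packets}
    return b"".join(buffer[seq] for seq in sorted(buffer))


message = b"Dear server, please send me your home page."
packets = send_side(message, "10.0.0.2", "10.0.0.9")

# Deliver the packets in reverse order to show that reassembly relies on
# the sequence numbers, not on arrival order.
reassembled = receive_side(list(reversed(packets)))
print(len(packets), reassembled == message)
```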
Internet protocol suite:
The Internet protocol suite is the set of communication protocols that implement the protocol stack on which the Internet runs. It is often called the TCP/IP suite, after two of its most important protocols, the Transmission Control Protocol (TCP) and the Internet Protocol (IP). The suite can be described by analogy with the OSI model, but the two differ, and not all of the layers correspond well.
Protocol Stack:
A protocol stack is the complete set of protocol layers that work together to provide networking capabilities.
Transmission Control Protocol (TCP):
The Transmission Control Protocol is the core protocol of the internet protocol suite. It originated in the network implementation in which it complemented the Internet Protocol. Therefore the entire suite is commonly referred to as TCP/IP. TCP provides reliable delivery of a stream of octets over an IP network. Ordering and error-checking are main characteristics of the TCP. All major Internet applications such as World Wide Web, email and file transfer rely on TCP.
Internet Protocol(IP):
The Internet Protocol is the principal protocol in the Internet protocol suite for relaying data across networks. Its routing function essentially establishes the internet. Historically, it was the connectionless datagram service in the original Transmission Control Program; the other component was the connection-oriented service that became TCP. The Internet protocol suite is therefore referred to as TCP/IP.
Data Communications and Networking by Behrouz Forouzan:
This contains introductory material and the explanation is beginner friendly. At the same time, it is not dumbed down and the material gets a bit more challenging as you go on. There are very good diagrams explaining concepts too. The typesetting is awesome and you'll have lots of interesting tips surrounding the content. The chapters are ordered according to the OSI stack as mentioned in other answers here. But a lot of the math and derivations for formulas for protocol efficiencies aren't explained.
Computer Networks by Andrew S. Tanenbaum
Everything found in Behrouz Forouzan + lots of equations.
My recommendation is to read the first book first and if you are particularly curious about the math, go to the second one.
We had computer networking at school and had to buy this book; it really helps. It explains every layer of the OSI model (from the internet cable and routers, up through the TCP and UDP protocol layers, to the application layer). If you want more basic knowledge of how it all works, this is a must-read.
Since I asked this question, I've learned more about this topic, so I'll take a crack at answering it myself.
The easiest way to picture the protocol stack is as a letter, wrapped in a series of envelopes. Each envelope has a different role in getting the letter to its recipient, and envelopes are added and removed as needed along the journey.
The Application Layer
The letter itself is an application-layer request. For example, you've typed "StackOverflow.com" in your browser and pressed enter. Your browser needs to ask the StackOverflow server for its home page. So it writes a letter saying, "Dear StackOverflow, would you please send me your home page?"
If the writer of the letter is your browser, the recipient of the letter is the web server program running on StackOverflow. The browser wants the web server to "write back" with a response in the form of a web page. Both the browser and server are applications - programs running on specific computers.
Because browsers speak HTTP, that's what it uses to make the request: the letter says something like "GET http://stackoverflow.com". The browser also writes down any cookie information it got from StackOverflow last time ("remember me? You told me my login ID was X") and adds some miscellaneous labeled information called "headers" (things like "I'm Firefox" and "I can accept HTML or text" and "it's OK with me if you compress the content with gzip"). All that information will help the server know how to personalize or customize its response.
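To make the "letter" concrete, here is a small Python sketch that builds the text of such a request by hand. The `build_http_request` helper, the `ExampleBrowser/1.0` user agent, and the cookie value are assumptions invented for illustration. Note that in HTTP/1.1 the request line normally carries just the path, with the host name in a separate `Host` header.

```python
from typing import Dict, Optional


def build_http_request(host: str, path: str = "/",
                       extra_headers: Optional[Dict[str, str]] = None) -> bytes:
    """Assemble a plain-text HTTP/1.1 GET request (hypothetical helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF, and a blank line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_http_request(
    "stackoverflow.com",
    extra_headers={
        "User-Agent": "ExampleBrowser/1.0",   # "I'm Firefox" (made-up name here)
        "Accept": "text/html, text/plain",    # "I can accept HTML or text"
        "Accept-Encoding": "gzip",            # "compressing with gzip is OK"
        "Cookie": "login_id=X",               # "remember me?"
    },
)
print(request.decode("ascii"))
```

A real browser sends more headers than this, but the shape is the same: one request line, then labeled headers, then a blank line.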
At that point, the browser is basically done. It hands this letter to the operating system and says, "would you please send this for me?" The OS says, "Sure." It then does some work to connect to StackOverflow (more on that in a minute), then tells the browser, "I'm working on it. By the way, here's a little mail bin I made for you, called a socket. When I hear back from StackOverflow, I'll put its letter in there and you can read it just like a file." The browser then happily awaits the response.
The IP layer
To send the request from the browser to StackOverflow, the operating system has to do several things.
First, it has to look up the address for StackOverflow.com - specifically, the IP address. It does this using DNS (which I won't go into here). Once it knows the IP address, it will know how to wrap the request in one of the "envelopes" called the IP layer.
Why do we need the IP layer? Well, once upon a time, we didn't.
Why we need IP
Have you ever seen an old movie where someone makes a phone call by asking the operator to connect them? The operator would physically connect the wire from Person #1's house to the wire for Person #2's house. Before the protocol stack was invented, connecting computers was a lot like that phone call: you needed a dedicated wire from point to point.
So, for example, if the computer scientists at Stanford wanted to exchange data with the ones at Harvard, they'd pay a bunch of money to rent a dedicated wire between the two places (a "leased line"). Any data that went into one end came out reliably at the other end. However, this was very expensive: imagine paying for a separate line for every place you want to connect to!
People realized that this wouldn't scale up. We needed a way to have a network that was shared by all users, like a giant spiderweb of wires spread out all over the map. That way, each user would only need one connection to the network and could reach any other user through it.
But that presented a problem. If everyone's communications went on the same lines, how would the data get to the right place? Imagine a bunch of letters dumped on a conveyor belt. Obviously, every letter needs to be addressed to someone, or else they can't be delivered.
That's the basic idea of IP: every machine needs to have an IP address that uniquely identifies it. Messages are placed in IP packets, which are like envelopes with addresses and return addresses.
So, once the OS has looked up the IP address for StackOverflow.com, it puts the HTTP request in an IP envelope. If it's a "long letter", too big for one envelope, the OS cuts it into pieces and puts it in several IP envelopes. Each envelope says something like "FROM: (your IP address); TO: (the server's IP address)." Like the HTTP request, the IP packet has some other miscellaneous header information, which we won't go into here, but the basic idea is just "to" and "from."
So, at this point, the letter is ready to go, right?
The messiness of IP
Not quite. This letter could easily get lost! See, with IP, we no longer have a dedicated line from place to place. If we did, we'd be sure that our letters were getting delivered: as long as the line wasn't broken, everything would go through.
But with IP, everyone's packets get dumped onto conveyor belts and carried along. The belts lead to little sorting stations, called "routers". If you imagine the routers like physical mail centers, you could picture one in, say, New York City.
"Here's a letter headed for Mexico City. I don't know exactly how to get there, but the station in Houston should be able to get it closer, so I'll send it there. Ah, here's a letter that's going to Atlanta. I'll send it to Charlotte; they should be able to forward it a step closer."
Generally this system works OK, but it's not as reliable as having your own dedicated line. Nearly anything could happen en route: a conveyor belt could break or catch fire, and everything on it could be lost. Or one could get bogged down for a while, so that its packets are delivered very late.
Besides that, because these conveyor belts and stations are used by everyone, nobody's letters get treated specially. So what happens if a router gets more letters than it can possibly handle? For a while, it can stack them in a corner (maybe in RAM), but eventually, it runs out of space.
What it does then may seem shocking: it starts throwing them away.
Yep. That's it. You might think that it would at least be kind enough to send back a note to you, saying, "sorry, we couldn't deliver your letter." But it doesn't. If you think about it, if the router is overwhelmed, it's probably because there's too much traffic on the lines already. Adding apology notes would only make the problem worse. So it throws away your packet and doesn't bother telling anyone.
Obviously, this is a problem for our HTTP request. We need it to get there, and we need the response to get back reliably, too.
To make sure it gets there, we want some kind of "delivery confirmation" service. For that, we'll wrap another envelope around our HTTP request before putting it into IP packets. That layer is called TCP.
TCP
TCP stands for "Transmission Control Protocol." It exists to control what would otherwise be a messy, error-prone delivery process.
As implied before, TCP lets us add some "delivery confirmation" to this messy delivery system. Before we wrap our HTTP request in IP packets, we first put it into TCP packets. Each one gets a number: packet 1 of 5, 2 of 5, etc. (The numbering scheme is actually more complicated and counts bytes rather than packets, but let's ignore that for now.)
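The basic idea of TCP is this:
• The sender numbers each packet and sends it.
• The receiver sends back a confirmation for each packet that arrives.
• If no confirmation comes back within a certain time, the sender sends that packet again.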
(Getting confirmations when things arrive at the other end is better than getting error reports when a router drops things along the way for a couple of reasons. One is that confirmations go back over a working connection, whereas errors would further clog a non-working connection. Another is that we don't have to trust the intermediary routers to do the right thing; the client and server are the ones who care most about this particular conversation, so they're the ones who take charge of being sure that it works.)
Besides making sure that all the data gets to the other end, TCP also makes sure that the received data gets put back into the right order before handing it up the stack, in case earlier packets got resent and arrived later, or packets in the middle took a longer route, or whatever.
That's basically it - having this kind of delivery confirmation makes the unreliable IP network reliable.
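Here is a toy Python simulation of that delivery-confirmation idea. It is a simplified sketch, not how TCP is implemented: it sends one chunk at a time and waits for each confirmation, whereas real TCP keeps many packets in flight and numbers bytes rather than packets. The `lossy_link` and `send_reliably` names, the 30% loss rate, and the fixed random seed are all assumptions for the example.

```python
import random


def lossy_link(item, rng, loss_rate):
    """Return the item, or None if a 'router' silently threw it away."""
    return None if rng.random() < loss_rate else item


def send_reliably(data, chunk_size=4, loss_rate=0.3, seed=1):
    rng = random.Random(seed)   # seeded so the run is repeatable
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    received = {}               # receiver's buffer: sequence number -> chunk
    transmissions = 0

    for seq, chunk in enumerate(chunks):
        confirmed = False
        while not confirmed:
            transmissions += 1
            arrived = lossy_link((seq, chunk), rng, loss_rate)
            if arrived is None:
                continue        # nothing arrived, so no confirmation: resend
            got_seq, got_chunk = arrived
            received[got_seq] = got_chunk   # a duplicate just overwrites itself
            # The confirmation travels back over the same lossy network.
            confirmed = lossy_link("ACK", rng, loss_rate) is not None

    # Put the chunks back in order before handing the data up the stack.
    message = b"".join(received[seq] for seq in sorted(received))
    return message, transmissions


original = b"GET / HTTP/1.1"
message, transmissions = send_reliably(original)
print(message == original, transmissions)
```

The data always arrives intact; the cost of the unreliable network shows up only as extra transmissions.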
Why wasn't it built straight into IP?
UDP
Well, confirmation has a drawback: it makes things slower. If something is missed, it must be repeated. In some cases, that would be a waste of time, because what you really want is a real-time connection. For example, if you're having a phone conversation over IP, or you're playing a real-time game over the internet, you want to know what's happening right now, even if it means you miss a bit of what happened a second ago. If you stop to repeat things, you'll fall out of sync with everyone else. In cases like that, you can use a cousin of TCP called UDP, which doesn't re-send lost packets. UDP stands for "user datagram protocol", but many people think of it as "unreliable data protocol". That's not an insult; sometimes reliability is less important than staying current.
Since both of these are valid use cases, it makes sense that the IP protocol stayed neutral on the issue of reliability; those who use it can choose whether to add reliability or not.
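For comparison, sending over UDP looks roughly like this with Python's standard `socket` module. The sketch stays on the loopback address so it needs no outside network, and the message text is made up. There is no connection setup and no confirmation: the sender fires off a datagram and moves on.

```python
import socket

# Receiver: a UDP socket bound to the loopback address.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0 asks the OS to pick a free port
receiver.settimeout(2.0)          # give up rather than wait forever
address = receiver.getsockname()

# Sender: no handshake, just address the datagram and send it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"player position: x=10, y=4", address)

# If the datagram had been dropped, recvfrom would time out;
# nothing in UDP itself would resend it.
data, source = receiver.recvfrom(1024)
print(data, "from", source)

sender.close()
receiver.close()
```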
Both TCP and UDP add one other important piece of information to the request: a port number.
Port numbers
Remember, our original request comes from a browser and is going to a web server program. But the IP protocol only has addresses that specify computers, not the applications running on them. The machine with StackOverflow's web server may also have other server programs that are listening for requests: a database server, an FTP server, etc. When that machine gets the request, how will it know which program should handle it?
It will know because the TCP request has a port number on it. This is just a number, nothing fancy, but by convention, certain numbers are interpreted to mean certain things. For example, using a port number of 80 is a conventional way of saying "this is a request for a web server." Then the server machine's operating system will know to hand that request to the web server program and not, say, the FTP server program.
When the TCP packets start streaming back to your computer, they will also have a port number, to let your machine know which program to give the response to. That number will vary based on the socket that your machine created initially.
Wait, what's a socket?
Sockets
Remember earlier when the browser asked the OS to send the request? The OS said it would set up a "mail bin" for any response it got back. That bin is called a socket.
You can think of a socket sort of like a file. A file is an interface that the OS provides. It says, "you can read and write data here, and I will take care of figuring out how to actually store it on the hard drive or USB key or whatever." The thing that uniquely identifies a file is the combination of path and filename. In other words, you can only have one file located in the same folder with the same name.
Similarly, a socket is an interface the OS provides. It says, "you can write requests here and read responses." The thing that uniquely identifies a socket is the combination of four things:
• Source IP address
• Source port
• Destination IP address
• Destination port
So, you can only have one socket on a system with the same combination of all of those. Notice that you could easily have several sockets open to the same destination IP and port - say, StackOverflow's web server - as long as they all have different source ports. The OS will guarantee that they do by choosing an arbitrary source port for each request, which is why you can have several tabs or several browsers all requesting the same web site simultaneously without anything getting confused; the packets coming back all say which port on your computer they're headed for, which lets the OS know "ah, this packet is for tab 3 in Firefox" or whatever.
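You can watch the OS hand out different source ports with a short Python experiment. This sketch runs a tiny TCP server on the loopback address (standing in for the web server) and opens two client connections to it (standing in for two browser tabs). The `tab1`/`tab2` names are just labels for the example.

```python
import socket

# A stand-in "web server" listening on a port chosen by the OS.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen()
server_addr = server.getsockname()

# Two "tabs" connect to the same destination IP and port.
tab1 = socket.create_connection(server_addr)
tab2 = socket.create_connection(server_addr)
conn1, peer1 = server.accept()
conn2, peer2 = server.accept()


def four_tuple(sock):
    """(source IP, source port, destination IP, destination port)."""
    return sock.getsockname() + sock.getpeername()


t1 = four_tuple(tab1)
t2 = four_tuple(tab2)
print(t1)
print(t2)   # same destination, different source port

# The server tells the two conversations apart by the client's address and port.
by_peer = {peer1: conn1, peer2: conn2}
tab2.sendall(b"hello from tab 2")
received = by_peer[tab2.getsockname()].recv(1024)
print(received)

for s in (tab1, tab2, conn1, conn2, server):
    s.close()
```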
Summing up so far
We've been thinking of the protocols as a series of envelopes wrapped around the letter. In our example, the letter was an HTTP request, which got wrapped in TCP, then in IP. The IP packets get sent to the right destination computer. That computer removes the IP "envelope" and finds a TCP packet inside. The TCP packet has a port number, which lets the operating system know which socket its contents belong in. It replies saying that it got that packet, and it puts its contents (the HTTP request) into the correct socket for the appropriate program to read from. When that program writes a response to the socket, the OS sends it back to the requester.
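The nesting can be sketched with a couple of Python dataclasses. This is only a model of the idea: the class names, the port 50123, and the IP addresses (taken from ranges reserved for documentation) are assumptions, and real headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class TcpSegment:
    src_port: int
    dst_port: int
    payload: bytes        # the application's data, e.g. an HTTP request


@dataclass
class IpPacket:
    src_ip: str
    dst_ip: str
    payload: TcpSegment   # the TCP "envelope" sits inside the IP one


# Sending side: wrap the letter in TCP, then in IP.
http_request = b"GET / HTTP/1.1\r\nHost: stackoverflow.com\r\n\r\n"
segment = TcpSegment(src_port=50123, dst_port=80, payload=http_request)
packet = IpPacket(src_ip="192.0.2.10", dst_ip="198.51.100.7", payload=segment)


def deliver(packet: IpPacket, sockets: dict) -> None:
    """Receiving side: open the IP envelope, then use the TCP port number
    to decide which program's socket gets the contents."""
    segment = packet.payload
    sockets[segment.dst_port].append(segment.payload)


# Two programs listening on the server: a web server (80) and an FTP server (21).
listening = {80: [], 21: []}
deliver(packet, listening)
print(listening[80])
```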
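So our "stack" is:
• An HTTP request (the letter)
• wrapped in TCP packets (delivery confirmation and port numbers)
• wrapped in IP packets (addressing from one machine to another)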
It's important to understand that this stack is totally customizable. All of these "protocols" are just standard ways of doing things. You can put anything you want inside of an IP packet if you think the receiving computer will know what to do with it, and you can put anything you want inside a TCP or UDP packet if you think the receiving application will know what to do with it.
You could even put something else inside your HTTP request. You could say that some JSON data in there is the "phone number exchange protocol," and as long as both ends know what to do with it, that's fine, and you've just added a higher-level protocol.
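As a hypothetical illustration, here is what such a home-made protocol might look like in Python. Everything specific here is invented for the example: the `/exchange` path, the `example.com` host, the field names, and the phone number.

```python
import json

# Our made-up "phone number exchange protocol": just an agreed JSON shape.
payload = {"protocol": "phone-number-exchange", "version": 1, "phone": "+1-555-0100"}
body = json.dumps(payload).encode("utf-8")

# The JSON rides inside an ordinary HTTP request body.
head = (
    "POST /exchange HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: application/json\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
)
request = head.encode("ascii") + body

# The receiving end splits the headers from the body at the blank line,
# then interprets the body according to the agreed protocol.
header_block, _, received_body = request.partition(b"\r\n\r\n")
received = json.loads(received_body)
print(received["phone"])
```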
Of course, there's a limit to how "high" you can go in the stack - that is, you can put a smaller envelope inside HTTP, and a smaller one inside that, etc, but eventually you won't have any room to go smaller; you won't have any bits for actual content.
But you can easily go "lower" in the stack; you can wrap more "envelopes" around the existing ones.
Other protocol layers
One common "envelope" to wrap around IP is Ethernet. For example, when your computer decides to send IP packets to Google, it wraps them up as we've described so far, but to send them, it gives them to your network card. The network card may then wrap the IP packets in Ethernet packets (or token ring packets, if you've got an antique setup), addressing them to your router and sending them there. Your router removes those Ethernet "envelopes", checks the IP address, decides who the next closest router is, wraps another Ethernet envelope addressed to that router, and sends the packet along.
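That unwrap-and-rewrap step at each router can be modeled in a few lines of Python. This is a simplified sketch: the MAC addresses and the `forward` function are made up, and a real router also does things this model leaves out, such as looking up the next hop in a routing table and updating fields in the IP header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpPacket:
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class EthernetFrame:
    src_mac: str
    dst_mac: str
    payload: IpPacket     # the IP packet travels inside the frame


def forward(frame: EthernetFrame, my_mac: str, next_hop_mac: str) -> EthernetFrame:
    """Router step: remove the Ethernet envelope, keep the IP packet,
    and wrap it in a new envelope addressed to the next hop."""
    ip_packet = frame.payload
    return EthernetFrame(src_mac=my_mac, dst_mac=next_hop_mac, payload=ip_packet)


packet = IpPacket("192.0.2.10", "198.51.100.7", b"hello")

# Hop 1: your computer addresses the frame to your router.
first = EthernetFrame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02", packet)
# Hop 2: the router re-wraps the same IP packet for the next router.
second = forward(first, "aa:aa:aa:aa:aa:03", "aa:aa:aa:aa:aa:04")

print(second.payload == first.payload, second.dst_mac)
```

The outer envelope changes at every hop, while the IP addresses inside stay the same from end to end.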
Other protocols could be wrapped as well. Maybe two devices are only connected wirelessly, so they wrap their Ethernet packets in a Wi-Fi or Bluetooth or 4G protocol. Maybe your packets need to cross a village with no electricity, so someone physically prints the packets on paper with numbered pages, rides them across town on a bicycle, and scans them into another computer in the order of the page numbers. Voila! A print-to-OCR protocol. Or maybe, I don't know, TCP over carrier pigeon would be better.
Conclusion
The protocol stack is a beautiful invention, and it works so well that we generally take it for granted.
It is a great example of abstracting functionality: each layer has its own work to do and can rely on others to deal with the rest.
(Although these layer terms are borrowed from OSI, OSI was actually a competing standard to TCP/IP, and included things like the "session layer" and "presentation layer" that TCP/IP doesn't use. OSI was intended to be a more sane and standardized alternative to the scrappy hacked-together TCP/IP stack, but while it was still being discussed, TCP/IP was already working and was widely adopted.)
Because the layers can be mixed and matched as needed, the stack is flexible enough to accommodate nearly any use we can think of, so it's probably going to be around for a long time. And hopefully now you can appreciate it a bit more.