Questions about web browsers and file download/upload
I am wondering how a web browser works. I have just finished reviewing my networking textbook. Below are my wild guesses and questions.
A web browser usually works with the HTTP protocol. So the first question is:
- Who is responsible for supporting the HTTP protocol?
I think the HTTP protocol should be implemented in the web browser, while TCP, UDP, IP, etc. should be implemented in the OS. And this is why HTTP is called an application-level protocol. (Correct me if I am wrong.)
Back to the HTTP communication scenario.
On the client side:
When an address is typed into the web browser, the browser chooses the proper HTTP method and builds a complete HTTP request. This HTTP request is nothing but plain ASCII text. Then the browser chooses a private port number and uses TCP to send the text as a bit stream to the server. During this process, a DNS query is made if the host in the URL is not an IP address.
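To make the "plain ASCII text" part concrete, here is a minimal sketch of what such a request might look like when built by hand. The host `example.com`, the path, and the helper name `build_get_request` are made up for illustration, and HTTP/1.1 is assumed; a real browser sends many more headers.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # Host is the one header HTTP/1.1 requires
        "Connection: close",      # ask the server to close after replying
        "",                       # the two empty entries produce the blank
        "",                       # line that ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


# Hypothetical host and path, used only for illustration.
request = build_get_request("example.com", "/index.html")
print(request.decode("ascii"))
```

These bytes are what would be handed to a TCP socket; nothing is sent over the network in this sketch.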
On the server side:
When a TCP packet arrives at the server, it is examined to see whether it is addressed to port 80. If so, it is delivered to the server process; at that point TCP has finished its job and it's time for the server program to come into action. The server program needs to implement the HTTP protocol so that it can parse the client browser's HTTP request extracted from the TCP packet. It then returns the necessary HTML file to the client. These HTML files could be static or dynamically generated with a technology like ASP.NET.
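As a rough illustration of the parsing step described above, here is a small sketch that splits a raw request into its parts. The function name `parse_request` and the sample request are assumptions for the example; a real server handles many more cases (malformed input, repeated headers, bodies, and so on).

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into method, target, version, headers, body."""
    # The header section ends at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line, e.g. "GET /index.html HTTP/1.1".
    method, target, version = lines[0].split(" ", 2)

    # Remaining lines are "Name: value" header fields; names are
    # case-insensitive, so store them lowercased.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return method, target, version, headers, body


# Hypothetical request bytes, as the server process might receive them.
raw = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
method, target, version, headers, body = parse_request(raw)
print(method, target, version, headers)
```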
For uploading and downloading files, I think there can be two different approaches.
No matter what format the files are in, we could encode them as Base64 strings and embed them as part of the web page. I am wondering if it is possible to encode a JPEG file as a Base64 string and embed it in a web page. (Correct me if I am wrong.)
The other approach is not to embed the file content in the web page, but to use TCP connections directly to transmit it. This approach doesn't require Base64 encoding and should have better performance. (Correct me if I am wrong.)
I may have chosen a bad title for the above questions and statements.
I hope the moderators won't consider this question off topic.
Many thanks.
Comments (3)
I saw some things I believe are not 100% correct...
About the client choosing a private port number: it seems you're saying that the browser chooses a local port and assigns it to the TCP stream being opened. That is not correct. The OS keeps track of used and unused ports, and it assigns a port to the stream when a new connection is being established.
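A quick way to see the OS doing the port assignment is the sketch below. It assumes a loopback interface is available and uses a throwaway local listener instead of a real web server; the variable names are just for illustration.

```python
import socket

# A throwaway listener on the loopback interface. Binding to port 0 asks
# the OS to pick any free port for us.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_port = listener.getsockname()[1]

# The client never names a local port; connect() makes the OS choose one.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", server_port))
client_port = client.getsockname()[1]

print("server listens on port", server_port)
print("OS gave the client local port", client_port)

client.close()
listener.close()
```

The application only says where it wants to connect; the local port number shows up afterwards in `getsockname()`.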
About the server-side description, I see two details. First, from the text it seems a single TCP packet contains the whole HTTP message, which may not be true: the message may be split across many TCP packets for a number of reasons, mainly because packets have a maximum length and the message may be longer than that. Second, it is assumed that the web server runs on port 80. That is normally the case, but not always: for example, I believe the default port for the Apache Tomcat web server is 8080, and in most cases the port the web server listens on can be configured (I don't know of any web server that is 100% fixed to a particular port).
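To illustrate the first point, here is a small sketch of a receiver that keeps reading until it has seen the end of the headers, rather than assuming one read returns the whole message. A local stream socket pair from `socket.socketpair()` stands in for a real TCP connection, and the tiny 10-byte sends only imitate a message arriving in pieces; the helper name is made up.

```python
import socket


def recv_until_headers_end(sock: socket.socket, bufsize: int = 16) -> bytes:
    """Read from a stream socket until the blank line ending the headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(bufsize)   # may return any number of bytes <= bufsize
        if not chunk:                # peer closed the connection early
            break
        data += chunk
    return data


sender, receiver = socket.socketpair()
request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Send the request in small pieces to imitate a split message.
for i in range(0, len(request), 10):
    sender.sendall(request[i:i + 10])
sender.close()

received = recv_until_headers_end(receiver)
receiver.close()
print(received)
```

A stream socket delivers bytes, not messages, so the receiver has to find the message boundaries itself.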
Finally, when you say the browser will choose the proper HTTP method: I think that in this particular case (the user typed an address and pressed Enter) the browser will always choose the GET method, as POST is used when submitting information to the server.
HTTP is quite simple at first, but has many details that add complexity to it. I am far from being an expert on the subject, but maybe this can help further :) http://www.faqs.org/rfcs/rfc2616.html
TCP is a transport protocol. It describes how to get a stream of arbitrary data from one network point to another. This is something almost every networking app needs to do, so it makes sense to have it built into the OS (it need not be, mind, but it is standard on every popular OS now).
Having data exchange between two machines/applications isn't quite enough though - they need to agree on how the data will be formatted (a protocol). There are many different ways data can be formatted, and the best way depends on the kind of data and the kind of application.
HTTP is designed specifically as a request/response protocol on top of TCP, but it could run over any transport protocol. You don't need to base64-encode data in HTTP: the recipient does not have to inspect the payload to find where it ends, so the payload can contain anything. HTTP uses a length header (Content-Length) to tell the recipient how much data the message contains.
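A minimal sketch of that length-based framing follows. The function names and the payload are invented for the example, and only the Content-Length case is shown (HTTP also has other ways to delimit a body, which are left out here).

```python
def frame_response(body: bytes) -> bytes:
    """Wrap raw bytes in a minimal HTTP response with a Content-Length."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def extract_body(message: bytes) -> bytes:
    """Recover the body by counting bytes, without scanning the payload."""
    head, _, rest = message.partition(b"\r\n\r\n")
    length = None
    for line in head.split(b"\r\n")[1:]:      # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header found")
    return rest[:length]


# Binary payload containing every byte value, plus a sequence that looks
# like an HTTP delimiter, to show the payload does not need escaping.
payload = bytes(range(256)) + b"\r\n\r\n"
message = frame_response(payload) + b"bytes of whatever comes next"
recovered = extract_body(message)
print(len(recovered), recovered == payload)
```

Because the receiver counts bytes instead of searching for a terminator, the body can hold arbitrary binary data as-is.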
Other applications might choose other protocols to sit on top of TCP. For example, HTTP has no facility that lets a server send a message to the client except in response to a request. If that is needed, another protocol would be better suited; this is more or less what WebSockets are trying to achieve.
Solely in response to your questions regarding file uploads/downloads:
If restricted to HTTP, file downloads normally occur by a browser following a link to a downloadable file, upon which a GET request is sent for that content. The data is sent over the HTTP connection in the same way a web page is sent.
For a file upload, the most common case is a form submission: the user chooses a file as part of a form. When the submit button is clicked, the browser sends a POST to the server. As part of the POST, a delimiter string (the multipart boundary) is chosen, and the file's bytes are sent between occurrences of that delimiter so the server can pick them out.
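Roughly, the body of such a POST could be put together as in the sketch below. The field name `upload`, the file name `photo.jpg`, the placeholder bytes, and the helper `build_multipart` are all assumptions for illustration; browsers generate their own boundary strings and usually send a more specific content type for the file part.

```python
import uuid


def build_multipart(field: str, filename: str, content: bytes, boundary: str):
    """Return (Content-Type header value, body) for a one-file form upload."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    # The closing delimiter has two extra dashes at the end.
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    body = head + content + tail          # file bytes go in unmodified
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body


# A random boundary makes a clash with the file's bytes very unlikely.
boundary = uuid.uuid4().hex
file_bytes = b"\xff\xd8\xff\xe0 raw bytes"   # placeholder, not a real image
content_type, body = build_multipart("upload", "photo.jpg", file_bytes, boundary)

print(content_type)
print(body)
```

Note that the file's bytes are sent raw between the delimiters; no Base64 step is involved.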
There are other options, but these are most common.
HTTP itself has no built-in way to load a file as a base64 string. Files included in web pages, like pictures, are normally requested separately: each resource is retrieved with its own GET request. It is possible, however, to embed base64 data in the page, either through a data: URI or by decoding an included base64 string with JavaScript and reassembling it into a resource. This is rarely done for anything but small files, because base64 makes the data about a third larger and adds processing, but it is possible.
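For what it's worth, here is a small sketch of embedding base64 data in a page through a data: URI, which browsers accept in places like an img tag's src attribute. The "JPEG" below is just placeholder bytes starting with the JPEG marker, and the helper name is made up.

```python
import base64


def to_data_uri(data: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw bytes as a data: URI that can be placed in a web page."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# 300 placeholder bytes that start like a JPEG; not a real image.
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(296)
uri = to_data_uri(fake_jpeg)
img_tag = f'<img src="{uri}" alt="embedded">'

# Base64 turns every 3 input bytes into 4 output characters.
encoded_len = len(uri) - len("data:image/jpeg;base64,")
print(len(fake_jpeg), "bytes become", encoded_len, "characters")
print(img_tag[:60] + "...")
```

The size growth is the main cost of this approach compared with fetching the file in its own request.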
As for your analysis of the network communication, it seems pretty much correct to me.