Receiving mixed media through Java sockets. Is yours better?
I'm about to give a programming exercise in Java, and I'd like my students to discover the inner workings of HTTP themselves rather than having URLConnection do all the work for them. To estimate the complexity, I came up with the following snippet, which parses the reply (imho, one of the hardest parts of the job). It returns the status line, e.g. "HTTP/1.1 200 OK", pushes header lines such as "Server: makato" and "content-length: 1337" into the headers vector, and leaves the InputStream at the first byte of the content, so that a DataInputStream or an InputStreamReader can later be built on top of it safely.
I'm curious to know if someone with more experience of the Java classes could suggest more elegant alternatives. One thing I'm not pleased with is that each individual is.read() will inevitably generate an additional system call (assuming that Socket.getInputStream() is used to feed the is argument).
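    import java.io.InputStream;
    import java.util.Vector;

    // Method of some enclosing class.
    // Reads the status line and the header lines of an HTTP reply, one byte at a time.
    // Returns the status line; every later line, including the terminating empty
    // line, is appended to headers. The stream is left at the first content byte.
    public static String recvHttpHeaders(InputStream is, Vector<String> headers)
            throws Exception {
        byte[] line = new byte[512];
        String pending = null;   // start of a line that overflowed the 512-byte buffer
        String status = null;
        boolean complete = false, CR = false;
        int n = 0;               // bytes of the current line stored in line[]
        while (!complete) {
            int x = is.read();
            switch (x) {
            case -1:             // stream ended before the empty line
                throw new Exception("something went wrong");
            case '\r':
                if (CR) throw new Exception("encoding mismatch CRCR");
                CR = true;
                break;
            case '\n':           // bare LF is accepted silently
                String ln = new String(line, 0, n, "ASCII");
                if (pending != null) ln = pending + ln;
                if (status == null) status = ln;
                else headers.add(ln);
                complete = ln.length() == 0;   // the empty line ends the headers
                pending = null;
                n = 0; CR = false;
                break;
            default:
                // a CR must be followed directly by LF
                if (CR) throw new Exception("encoding mismatch ?CR");
                if (n >= 512) {
                    // buffer full: move its content to pending and start over
                    String part = new String(line, "ASCII");
                    if (pending != null) pending += part;
                    else pending = part;
                    n = 0;
                }
                line[n++] = (byte) x;
                break;
            }
        }
        return status;
    }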
Edit: admittedly, one would love to use xxx.readLine() here to avoid messing with line reconstruction. BufferedReader (or any other *Reader, actually) converts bytes into chars according to one charset, which means that if I used it for the header parsing, I would no longer be free to choose the charset for the content. I haven't found any byte-level class that has a readLine ability built in.
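To make the byte-level idea concrete, here is a minimal sketch of a readLine that works directly on an InputStream, so no Reader ever touches the content bytes. The class name ByteLineReader and its readLine method are hypothetical and not part of the original post. Unlike the snippet above, it does not reject a stray CR inside a line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption, not from the post.
class ByteLineReader {
    /**
     * Reads bytes up to and including the next LF and returns the line
     * without its CRLF (or bare LF) terminator, decoded as ASCII.
     * Returns null if the stream ends before any byte is read; a last
     * line that is not terminated by LF is returned as-is.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') break;      // end of line; LF is not stored
            buf.write(b);
        }
        if (!sawAny) return null;
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') len--;   // drop the CR of CRLF
        return new String(bytes, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] reply = "HTTP/1.1 200 OK\r\nServer: makato\r\n\r\nBODY"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new ByteArrayInputStream(reply);
        String line;
        // Stop at the empty line that separates headers from content.
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            System.out.println(line);
        }
        // Only header bytes were consumed, so the next byte read from
        // the stream is the first byte of the content.
        System.out.println((char) in.read());
    }
}
```

Because the helper pulls one byte at a time, it has the same per-byte read cost as the original snippet unless the stream passed in is buffered.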
Performance solution: thanks for pointing out BufferedInputStream. I ran a few additional tests, and indeed, invoking it as

    BufferedInputStream bis = new BufferedInputStream(socket.getInputStream());
    String status = recvHttpHeaders(bis, headers);
    rawCopy(bis, output);

reduces the number of system calls performed and still lets me receive binary content unmodified.
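The post does not show rawCopy, so the following is only a guess at what such a helper might look like: a plain byte-for-byte copy loop with no charset involved. The signature, the 8192-byte chunk size, and the returned byte count are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the rawCopy(bis, output) call in the post.
class RawCopy {
    /**
     * Copies every remaining byte of in to out, unmodified, until
     * end of stream. Returns the number of bytes copied.
     */
    static long rawCopy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];   // chunk size is an arbitrary choice
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Bytes that are not valid ASCII must survive the copy untouched.
        byte[] content = {0x00, (byte) 0xFF, 0x0D, 0x0A, (byte) 0x80};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = rawCopy(new ByteArrayInputStream(content), out);
        System.out.println("copied " + copied + " bytes");
    }
}
```

Reading until end of stream only works when the server closes the connection after the reply. On a persistent connection, a client would instead stop after the number of bytes announced in content-length.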
Comments (2)
You should rather use BufferedReader to read text. Wrap your input stream:

    BufferedReader br = new BufferedReader(new InputStreamReader(is));

Then use readLine() to read the content line by line.
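As a rough sketch of what this answer suggests, the loop below reads header lines with readLine() until the empty line. The class and method names are hypothetical, and the explicit ISO-8859-1 charset is an assumption (the one-line example above relies on the platform default).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, for illustration only.
class ReaderHeaders {
    /** Reads lines up to, but not including, the first empty line. */
    static List<String> readHeaderLines(InputStream is) throws IOException {
        // Assumption: ISO-8859-1, which maps every byte to exactly one char.
        BufferedReader br = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.ISO_8859_1));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] reply = "HTTP/1.1 200 OK\r\ncontent-length: 4\r\n\r\nBODY"
                .getBytes(StandardCharsets.ISO_8859_1);
        InputStream is = new ByteArrayInputStream(reply);
        for (String l : readHeaderLines(is)) {
            System.out.println(l);
        }
        // The readers may have read ahead past the empty line, so content
        // bytes may already be gone from the raw stream at this point.
        System.out.println("bytes left in raw stream: " + is.available());
    }
}
```

Both InputStreamReader and BufferedReader are allowed to read ahead, so after this loop the raw stream is not guaranteed to sit at the first content byte. That is the limitation the question's edit raises for binary content.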
Following the comments of Sripathi Krishnan and Adam Paynter, the way to improve it is to use a BufferedInputStream, so that performance remains acceptable and no charset conversion happens.
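One way to see the effect without a real socket is to count how many read calls reach the underlying stream, with and without the buffer. This is only a sketch: CountingInputStream and ReadCallDemo are hypothetical names, and the call count is a stand-in for system calls, not a measurement of them.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts the read calls that reach the wrapped stream.
class CountingInputStream extends FilterInputStream {
    int calls = 0;

    CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        calls++;
        return super.read();
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        calls++;
        return super.read(b, off, len);
    }
}

class ReadCallDemo {
    /** Drains the stream one byte at a time, the way recvHttpHeaders reads. */
    static void drain(InputStream in) throws IOException {
        while (in.read() != -1) { /* discard */ }
    }

    /** Underlying read calls when reading byte by byte with no buffer. */
    static int directCalls(byte[] data) throws IOException {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        drain(raw);
        return raw.calls;
    }

    /** Underlying read calls when a BufferedInputStream sits in between. */
    static int bufferedCalls(byte[] data) throws IOException {
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(raw));
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        System.out.println("direct:   " + directCalls(data) + " underlying reads");
        System.out.println("buffered: " + bufferedCalls(data) + " underlying reads");
    }
}
```

The buffered variant fetches a whole block from the underlying stream at a time and serves single-byte reads from memory, which is why the byte-by-byte parser stays affordable on top of it.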