Maximum line length for BufferedReader.readLine() in Java?
I use BufferedReader's readLine() method to read lines of text from a socket.
There is no obvious way to limit the length of the line read.
I am worried that the source of the data can (maliciously or by mistake) write a lot of data without any line feed character, and this will cause BufferedReader to allocate an unbounded amount of memory.
Is there a way to avoid that? Or do I have to implement a bounded version of readLine() myself?
Comments (6)
The simplest way to do this will be to implement your own bounded line reader.
Or even simpler, reuse the code from this BoundedBufferedReader class.
Actually, coding a readLine() that works the same as the standard method is not trivial. Dealing with the 3 kinds of line terminator (LF, CR, and CR followed by LF) correctly requires some pretty careful coding. It is interesting to compare the different approaches of the above link with the Sun version and Apache Harmony version of BufferedReader.
Note: I'm not entirely convinced that either the bounded version or the Apache version is 100% correct. The bounded version assumes that the underlying stream supports mark and reset, which is certainly not always true. The Apache version appears to read ahead one character if it sees a CR as the last character in the buffer. This would break on MacOS when reading input typed by the user. The Sun version handles this by setting a flag that causes a possible LF after the CR to be skipped on the next read... operation; i.e. no spurious read-ahead.
Another option is Apache Commons' BoundedInputStream:
The limit for a String is about 2 billion chars (Integer.MAX_VALUE, i.e. 2^31 - 1). If you want the limit to be smaller, you need to read the data yourself. You can read one char at a time from the buffered stream until the limit or a newline char is reached.
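As an illustration of reading one char at a time, here is a minimal sketch of a bounded line reader. The class name BoundedLineReader and the choice to throw an IOException on an overlong line are assumptions made for this example, not code from any of the linked classes. It borrows the flag idea described for the Sun implementation, so an LF that follows a CR is skipped on the next call rather than read ahead.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads lines like BufferedReader.readLine(),
// but fails once a line grows past maxLineLength chars.
final class BoundedLineReader {
    private final Reader in;
    private final int maxLineLength;
    private boolean skipLf; // true when the previous line ended with CR

    BoundedLineReader(Reader in, int maxLineLength) {
        this.in = in;
        this.maxLineLength = maxLineLength;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        while (true) {
            int c = in.read();
            if (skipLf) {
                // Swallow the LF of a CR LF pair left over from the last call.
                skipLf = false;
                if (c == '\n') {
                    continue;
                }
            }
            if (c == -1) {
                return line.length() == 0 ? null : line.toString();
            }
            if (c == '\n') {
                return line.toString();
            }
            if (c == '\r') {
                skipLf = true; // decide about a following LF on the next read
                return line.toString();
            }
            if (line.length() >= maxLineLength) {
                throw new IOException("Line exceeds " + maxLineLength + " chars");
            }
            line.append((char) c);
        }
    }
}
```

Because this calls read() once per char, it is probably best wrapped around a BufferedReader, for example new BoundedLineReader(new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8)), 8192), where the 8192 limit is an arbitrary value chosen for illustration.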
Perhaps the easiest solution is to take a slightly different approach. Instead of attempting to prevent a DoS by limiting one particular read, limit the entire amount of raw data read. In this way you don't need to worry about using special code for every single read and loop, so long as the memory allocated is proportionate to incoming data.
You can either meter the Reader, or probably more appropriately, the undecoded Stream or equivalent.
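One way to meter the undecoded stream could look like the sketch below. MeteredInputStream and its maxBytes parameter are hypothetical names invented for this example. It uses only the JDK's FilterInputStream and throws once more than maxBytes bytes have been read in total.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: allows at most maxBytes bytes to be read in total.
// skip() and mark/reset are not accounted for in this sketch.
final class MeteredInputStream extends FilterInputStream {
    private long remaining;

    MeteredInputStream(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        // Ask for at most one byte past the budget, enough to detect overflow.
        int n = super.read(buf, off, (int) Math.min(len, remaining + 1));
        if (n > 0) {
            count(n);
        }
        return n;
    }

    private void count(int n) throws IOException {
        remaining -= n;
        if (remaining < 0) {
            throw new IOException("Input exceeds byte limit");
        }
    }
}
```

With this in place, an ordinary new BufferedReader(new InputStreamReader(new MeteredInputStream(socket.getInputStream(), limit), charset)) can keep using readLine(), since no single line can be larger than the total budget. Note that the limit here applies to the whole connection, so a long-lived connection would need some way to reset or raise the budget per message.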
There are a few ways round this:
In BufferedReader, instead of using String readLine(), use int read(char[] cbuf, int off, int len); you can then use boolean ready() to check whether more input is available without blocking (i.e. whether you got it all), and convert the chars read into a string using the constructor String(char[] value, int offset, int count).
If you don't care about the whitespace and you just want to have a maximum number of characters per line, then the proposal Stephen suggested is really simple.
Edit: this is intended to read lines from a user terminal. It blocks until the next line, and returns a bufferSize-bounded String; any further input on the line is discarded.
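The answer's own code did not survive on this page, so the following is only a sketch of the behaviour it describes: block until the end of the line, return at most bufferSize chars, and discard the rest. The class name TruncatingLineReader is hypothetical, and as a simplifying assumption this version treats only LF as the line terminator and drops every CR, so bare-CR line endings are not recognized.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: returns at most bufferSize chars per line and
// silently discards the remainder of an overlong line.
final class TruncatingLineReader {
    private TruncatingLineReader() {
    }

    /** Returns the next (possibly truncated) line, or null at end of stream. */
    static String readLine(Reader in, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        int length = 0;
        boolean sawInput = false;
        int c;
        while ((c = in.read()) != -1) {
            sawInput = true;
            if (c == '\n') {
                break; // end of line
            }
            if (c == '\r') {
                continue; // assumption: CR is dropped, never a terminator
            }
            if (length < bufferSize) {
                buffer[length++] = (char) c;
            }
            // Otherwise the char is over the limit: consume and discard it.
        }
        if (!sawInput) {
            return null;
        }
        return new String(buffer, 0, length);
    }
}
```

Memory use is fixed at bufferSize chars per call, but the call still keeps reading until a newline arrives, so for an untrusted socket it would probably need to be paired with a read timeout or an overall byte limit.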