Maximum line length for BufferedReader.readLine() in Java?

Posted 2024-11-06 05:19:09

I use BufferedReader's readLine() method to read lines of text from a socket.

There is no obvious way to limit the length of the line read.

I am worried that the source of the data can (maliciously or by mistake) write a lot of data without any line feed character, and this will cause BufferedReader to allocate an unbounded amount of memory.

Is there a way to avoid that? Or do I have to implement a bounded version of readLine() myself?

Comments (6)

我ぃ本無心為│何有愛 2024-11-13 05:19:09

The simplest way to do this will be to implement your own bounded line reader.

Or even simpler, reuse the code from this BoundedBufferedReader class.

Actually, coding a readLine() that works the same as the standard method is not trivial. Dealing with the 3 kinds of line terminator CORRECTLY requires some pretty careful coding. It is interesting to compare the different approaches of the above link with the Sun version and Apache Harmony version of BufferedReader.

Note: I'm not entirely convinced that either the bounded version or the Apache version is 100% correct. The bounded version assumes that the underlying stream supports mark and reset, which is certainly not always true. The Apache version appears to read-ahead one character if it sees a CR as the last character in the buffer. This would break on MacOS when reading input typed by the user. The Sun version handles this by setting a flag to cause the possible LF after the CR to be skipped on the next read... operation; i.e. no spurious read-ahead.
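
To make the flag idea concrete, here is a minimal sketch of a bounded line reader that accepts LF, CR, and CRLF without reading ahead. The class name BoundedLineReader, the maxLength parameter, and the choice to throw an IOException on an over-long line are all assumptions for illustration; this is not the code of the linked BoundedBufferedReader, nor of the Sun or Apache Harmony classes.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper (not a JDK class): reads lines of at most maxLength chars. */
final class BoundedLineReader {
    private final Reader in;      // pass a buffered Reader; this class reads char by char
    private final int maxLength;  // assumed policy: longer lines are rejected
    private boolean skipLf;       // true if the previous line ended with CR

    BoundedLineReader(Reader in, int maxLength) {
        this.in = in;
        this.maxLength = maxLength;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (skipLf) {
                skipLf = false;
                if (c == '\n') continue;   // second half of a CRLF pair
            }
            if (c == '\n') return line.toString();
            if (c == '\r') {
                skipLf = true;             // defer the LF check to the next read
                return line.toString();
            }
            if (line.length() >= maxLength) {
                throw new IOException("Line longer than " + maxLength + " chars");
            }
            line.append((char) c);
        }
        // End of stream: return a final unterminated line, if any.
        return line.length() > 0 ? line.toString() : null;
    }
}
```

Because the LF check is deferred to the next call, a line ending in a bare CR is returned immediately, which matters for interactive input. Memory use per line is bounded by maxLength regardless of what the peer sends.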

薄荷→糖丶微凉 2024-11-13 05:19:09

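Another option is Apache Commons IO's BoundedInputStream, which caps the total number of bytes read from the stream (not the length of each individual line):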

InputStream bounded = new BoundedInputStream(is, MAX_BYTE_COUNT);
BufferedReader reader = new BufferedReader(new InputStreamReader(bounded));
String line = reader.readLine();

天涯离梦残月幽梦 2024-11-13 05:19:09

A String can hold at most about 2 billion chars (Integer.MAX_VALUE), so readLine() itself imposes no practical limit. If you want the limit to be smaller, you need to read the data yourself: read one char at a time from the buffered stream until the limit or a newline char is reached.

倾听心声的旋律 2024-11-13 05:19:09

Perhaps the easiest solution is to take a slightly different approach. Instead of attempting to prevent a DoS by limiting one particular read, limit the entire amount of raw data read. In this way you don't need to worry about using special code for every single read and loop, so long as the memory allocated is proportionate to incoming data.

You can either meter the Reader, or probably more appropriately, the undecoded Stream or equivalent.
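
As a rough illustration of metering the undecoded stream, the sketch below wraps an InputStream and fails once a total byte budget is exceeded. The class name MeteredInputStream and the maxBytes parameter are hypothetical, and throwing an IOException is only one possible policy.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: fails once more than maxBytes have been read in total. */
final class MeteredInputStream extends FilterInputStream {
    private final long maxBytes;
    private long count;

    MeteredInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) account(1);
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) account(n);
        return n;
    }

    /** Adds n to the running total and rejects the stream once over budget. */
    private void account(long n) throws IOException {
        count += n;
        if (count > maxBytes) {
            throw new IOException("Input exceeds " + maxBytes + " bytes");
        }
    }
}
```

It would sit between the socket and the decoder, for example new BufferedReader(new InputStreamReader(new MeteredInputStream(socket.getInputStream(), 1 << 20), StandardCharsets.UTF_8)). An ordinary readLine() then cannot accumulate much more than the budget, since the overshoot is limited to one chunk read by the decoder.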

在梵高的星空下 2024-11-13 05:19:09

There are a few ways round this:

  • if the amount of data overall is very small, load the data from the socket into a buffer (a byte array or a ByteBuffer, whichever you prefer), then wrap the BufferedReader around the data in memory (via a ByteArrayInputStream etc);
  • just catch the OutOfMemoryError, if it occurs; catching this error is generally not reliable, but in the specific case of catching array allocation failures, it is basically safe (but does not solve the issue of any knock-on effect that one thread allocating large amounts from the heap could have on other threads running in your application, for example);
  • implement a wrapper InputStream that will only read so many bytes, then insert this between the socket and BufferedReader;
  • ditch BufferedReader and split your lines via the regular expressions framework (implement a CharSequence whose chars are pulled from the stream, and then define a regular expression that limits the length of lines); in principle, a CharSequence is supposed to be random access, but for a simple "line splitting" regex, in practice you will probably find that successive chars are always requested, so that you can "cheat" in your implementation.
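
The first option in the list above might look like the sketch below, assuming Java 11 or later (for InputStream.readNBytes), UTF-8 text, and a small maxBytes. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: buffers a small payload in memory before parsing lines. */
final class CappedPayload {
    /** Reads the whole stream into memory, rejecting payloads larger than maxBytes. */
    static BufferedReader open(InputStream socketIn, int maxBytes) throws IOException {
        // Ask for one byte more than allowed so an oversized payload is detectable.
        // Assumes maxBytes is well below Integer.MAX_VALUE.
        byte[] data = socketIn.readNBytes(maxBytes + 1);
        if (data.length > maxBytes) {
            throw new IOException("Payload exceeds " + maxBytes + " bytes");
        }
        // UTF-8 is an assumption; use whatever charset the protocol specifies.
        return new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Note that readNBytes blocks until the limit is reached or the peer closes the stream, so this only suits protocols where the sender finishes the message by closing its side of the connection.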

断舍离 2024-11-13 05:19:09

In BufferedReader, instead of using String readLine(), use int read(char[] cbuf, int off, int len); you can then use boolean ready() to see if you got it all and convert it into a string using the constructor String(char[] value, int offset, int count).

If you don't care about the whitespace and you just want to have a maximum number of characters per line, then the proposal Stephen suggested is really simple:

import java.io.BufferedReader;
import java.io.IOException;

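/**
 * readLine() returns at most bufferSize chars; if the buffer filled up,
 * further input that is immediately available is discarded.
 * Intended for line-buffered input such as a terminal: on other sources a
 * single read may span several lines.
 */
public class BoundedReader extends BufferedReader {

    private final int    bufferSize;
    private final char[] buffer;

    public BoundedReader(final BufferedReader in, final int bufferSize) {
        super(in);
        this.bufferSize = bufferSize;
        this.buffer     = new char[bufferSize];
    }

    @Override
    public String readLine() throws IOException {
        /* read up to bufferSize chars; blocks until some input is available */
        int no = read(buffer, 0, bufferSize);
        if (no == -1) return null; /* end of stream */

        /* trim() strips the line terminator along with any surrounding whitespace */
        final String input = new String(buffer, 0, no).trim();

        /* skip the rest: drain whatever can be read without blocking */
        while (no >= bufferSize && ready()) {
            if ((no = read(buffer, 0, bufferSize)) == -1) break;
        }

        return input;
    }

}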

Edit: this is intended to read lines from a user terminal. It blocks until the next line, and returns a bufferSize-bounded String; any further input on the line is discarded.
