Faster way than Scanner or BufferedReader to read multi-line data from STDIN?

Posted 2024-10-20 03:42:12

Note: I am currently coding in Java. I want to read input data into a string, one line at a time (or more), and I expect a large total number of lines.

Right now I have implemented

Scanner in = new Scanner(System.in);
while (in.hasNextLine()) {
    String[] separated = in.nextLine().split(" ");
    ...
}

because within the line my inputs are space delimited.

Unfortunately, with millions of lines this process is very slow, and the Scanner takes more time than my data processing. I looked into the java.io library and found a bunch of possibilities, but I'm not sure which one to use (ByteArrayInputStream, FileInputStream, BufferedInputStream, PipedInputStream). Which one should I use?

To be specific, my data is piped in from a text file. Every line has either 4 or 6 words and ends with a newline character, and I need to analyze one line at a time, placing the (4 or 6) words in an array that I can manage temporarily.
Data format:

392903840 a c b 293 32.90
382049804 a c 390
329084203 d e r 489 384.90
...

Is there a way for Scanner to read 1000 or so lines at a time and become efficient, or which of these classes should I use (to minimize run time)?

Sidenote: while experimenting I have tried:

java.io.BufferedReader stdin = new java.io.BufferedReader(new java.io.InputStreamReader(System.in));
while (stdin.ready()) {
    String[] separated = stdin.readLine().split(" ");
    ...
}

That worked well. I'm just wondering which one works best, and whether there is any way to, say, read 100 lines at once and then process everything. There are too many options, and I'm looking for the optimal solution.


Comments (1)

鸢与 2024-10-27 03:42:12

You should wrap your System.in with a BufferedInputStream, like this:

BufferedInputStream bis = new BufferedInputStream(System.in);
Scanner in = new Scanner(bis);

because the BufferedInputStream minimises the number of reads from System.in, which improves efficiency.

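Also, if you're only reading lines, you don't really need a Scanner but a BufferedReader. Its readLine() method returns the next line, or null once the input is exhausted. Its ready() method only reports whether a read can proceed without blocking, so it is not a reliable end-of-input test for piped data.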

You would use it as such (see example at java6 : InputStreamReader):

(I added a buffer size argument of 32*1024*1024 to BufferedReader; note that the size is counted in chars, not bytes)

BufferedReader br = new BufferedReader(new InputStreamReader(System.in), 32*1024*1024);
String line;
// readLine() returns null at end of input; ready() may return false early on a slow pipe
while ((line = br.readLine()) != null) {
    // process line
}

From the BufferedReader doc page:

Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
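
To tie this back to the data format in the question, here is a minimal sketch of a streaming read loop that splits each line into its 4 or 6 space-separated fields and hands them to a callback. The class name FastLineReader, the method forEachRecord, and the bufferChars parameter are hypothetical names chosen for illustration, and the 64K buffer in main is an arbitrary starting point rather than a measured optimum.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.function.Consumer;

public class FastLineReader {

    // Reads the source line by line through a BufferedReader and passes the
    // space-separated fields of each non-empty line to the handler.
    // Returns the number of lines handed to the handler.
    static long forEachRecord(Reader source, int bufferChars,
                              Consumer<String[]> handler) throws IOException {
        BufferedReader br = new BufferedReader(source, bufferChars);
        long count = 0;
        String line;
        // readLine() returns null at end of input, so no ready() check is needed.
        while ((line = br.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines (an assumption about the input)
            }
            handler.accept(line.split(" "));
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical processing step: count how many lines have 6 fields.
        long[] sixFieldLines = {0};
        long total = forEachRecord(new InputStreamReader(System.in), 1 << 16,
                fields -> { if (fields.length == 6) sixFieldLines[0]++; });
        System.out.println(total + " lines, " + sixFieldLines[0] + " with 6 fields");
    }
}
```

Run it with the data piped in, for example `java FastLineReader < data.txt`. Because each line is processed as soon as it is read, nothing needs to be batched by hand: the BufferedReader already pulls large chunks from the underlying stream.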
