Does reading a very long string from the console with Java Scanner take a long time?

Posted on 2024-12-17 13:49:22

I'm writing a console program that reads one very long line of input with Java's Scanner.

The sample data looks like this: 50000 integers on one line, separated by whitespace,

"11 23 34 103 999 381 ....." and so on, up to 50000 integers.

This data is entered by the user via the console, not read from a file.

Here's my code:

        System.out.print("Input of integers : ");
        Scanner sc = new Scanner(System.in);
        long start = System.currentTimeMillis();

        String Z = sc.nextLine();

        long end = System.currentTimeMillis();
        System.out.println("String Z created in "+(end-start)+"ms, Z character length is "+Z.length()+" characters");

When I run it, I get this result:

String Z created within 49747ms, Z character length is 194539 characters

My question is: why does it take so long?
Is there a faster way to read a very long string?

I have tried BufferedReader, but the result is not much different:

String Z created within 41881ms, Z character length is 194539 characters

Comments (2)

≈。彩虹 2024-12-24 13:49:22

It looks like Scanner uses a regular expression to match the end of the line. This is likely the cause of the inefficiency, especially since the regex is being matched against a string of roughly 200k characters.

The pattern used is, effectively, .*(\r\n|[\n\r\u2028\u2029\u0085])|.+$
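To see what that pattern does to a long line, here is a minimal sketch that applies it directly with java.util.regex. It assumes the quoted pattern approximates what Scanner matches internally; `LinePatternDemo` and `firstLine` are hypothetical names introduced only for illustration, and the generated input merely imitates the 50000-integer line from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinePatternDemo {
    // The pattern quoted in the answer, written as a Java regex string.
    // Assumption: it approximates what Scanner.nextLine() matches against.
    static final Pattern LINE =
            Pattern.compile(".*(\\r\\n|[\\n\\r\\u2028\\u2029\\u0085])|.+$");

    // Returns the first line of input without its terminator,
    // or null when the input is empty.
    static String firstLine(String input) {
        Matcher m = LINE.matcher(input);
        if (!m.lookingAt()) {
            return null;
        }
        String terminator = m.group(1); // null when the ".+$" branch matched
        String match = m.group();
        return terminator == null
                ? match
                : match.substring(0, match.length() - terminator.length());
    }

    public static void main(String[] args) {
        // Build one line of 50000 space-separated integers, ending in '\n'.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50_000; i++) {
            sb.append(i).append(' ');
        }
        sb.append('\n');

        long start = System.nanoTime();
        String line = firstLine(sb.toString());
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("Matched " + line.length() + " characters in " + micros + " us");
    }
}
```

Timing this match on an in-memory string separates the cost of the regex from the cost of getting the characters out of the console, which may help decide whether the pattern is really the bottleneck.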

云胡 2024-12-24 13:49:22

My guess is memory allocation. As Scanner reads the line, it fills a char buffer. The buffer keeps growing, and each time it grows, all the text read so far has to be copied again. Each resize makes the internal buffer N times larger, so it is not atrociously slow, but for your huge line it is still slow.

The regex processing itself does not help either. But my guess is that reallocation and copying are the source of the slowdown.

It may also need to run GC to free memory for the new buffer, which would be another slowdown.

You can test my hypothesis by making a copy of Scanner and changing BUFFER_SIZE to equal your line length (or larger, to be sure).
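A lighter way to probe the same hypothesis, without copying Scanner's source, might be to read the line into a buffer whose capacity is fixed up front and that uses no regex at all. This is a stand-in for the test described above, not the same experiment: `PresizedLineReader`, `readLine` and `parseInts` are hypothetical names, and the 200_000 capacity is only a guess based on the 194539-character line in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class PresizedLineReader {
    // Reads characters up to the first '\n' (or end of input) into a
    // StringBuilder allocated with the expected capacity, so the buffer
    // does not need to grow if the estimate is large enough.
    static String readLine(Reader in, int expectedLength) {
        StringBuilder sb = new StringBuilder(expectedLength);
        try {
            int c;
            while ((c = in.read()) != -1 && c != '\n') {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Drop a trailing '\r' so that "\r\n" line endings are handled too.
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    // Splits the line on whitespace and parses each token as an int.
    static int[] parseInts(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        int[] values = new int[tokens.countTokens()];
        for (int i = 0; i < values.length; i++) {
            values[i] = Integer.parseInt(tokens.nextToken());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.print("Input of integers : ");
        // BufferedReader keeps the per-character read() calls cheap.
        Reader in = new BufferedReader(new InputStreamReader(System.in));
        long start = System.currentTimeMillis();
        String line = readLine(in, 200_000); // assumed upper bound on line length
        long end = System.currentTimeMillis();
        int[] values = parseInts(line);
        System.out.println("Read " + line.length() + " characters ("
                + values.length + " integers) in " + (end - start) + "ms");
    }
}
```

If the elapsed time stays in the tens of seconds even with a presized buffer and no regex, that would suggest the slowdown lies somewhere other than reallocation or pattern matching.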
