Fastest way to write huge data to a text file in Java
I have to write huge data to a text [csv] file. I used a BufferedWriter to write the data, and it took around 40 seconds to write 174 MB. Is this the fastest speed Java can offer?

bufferedWriter = new BufferedWriter(new FileWriter("fileName.csv"));

Note: these 40 seconds include the time for iterating over and fetching the records from the resultset as well. :) The 174 MB corresponds to 400,000 rows in the resultset.
Answers (7)
You might try removing the BufferedWriter and just using the FileWriter directly. On a modern system there's a good chance you're just writing to the drive's cache memory anyway.
It takes me in the range of 4-5 seconds to write 175MB (4 million strings) -- this is on a dual-core 2.4GHz Dell running Windows XP with an 80GB, 7200-RPM Hitachi disk.
Can you isolate how much of the time is record retrieval and how much is file writing?
Try memory-mapped files (it takes about 300 ms to write 174 MB on my machine: Core 2 Duo, 2.5 GB RAM):
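The answer's code is not shown above; the memory-mapped approach can be sketched as follows. The file name and payload here are illustrative, not from the original answer.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedWrite {
    public static void write(String path, byte[] data) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            // Map a region of the file into memory and copy the bytes in one shot.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            buf.put(data);
            // force() flushes the mapped pages to disk; omit it if you are happy
            // to let the OS write back lazily (that is where much of the speed comes from).
            buf.force();
        }
    }

    public static void main(String[] args) throws Exception {
        write("out.csv", "a,b,c\n1,2,3\n".getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that the quoted 300 ms likely measures writing into the page cache, not a guaranteed flush to the platter.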
Only for the sake of statistics:

The machine is an old Dell with a new SSD

CPU: Intel Pentium D 2.8 GHz

SSD: Patriot Inferno 120GB SSD

As we can see, the raw method is slower than the buffered one.
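The benchmark code behind these numbers is not shown; a minimal sketch of such a raw-vs-buffered comparison might look like this (row count and row content are made up for illustration):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.Writer;

public class WriterBenchmark {
    // Time how long it takes to write `rows` CSV lines through the given Writer.
    static long timeMs(Writer w, int rows) throws Exception {
        long start = System.nanoTime();
        try (Writer out = w) {
            for (int i = 0; i < rows; i++) {
                out.write(i + ",some,csv,row,data\n"); // one small write per row
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        int rows = 100_000;
        // Raw: every write() is a separate call down to the stream.
        long raw = timeMs(new FileWriter("raw.csv"), rows);
        // Buffered: small writes are coalesced into larger chunks.
        long buffered = timeMs(new BufferedWriter(new FileWriter("buffered.csv")), rows);
        System.out.println("raw: " + raw + " ms, buffered: " + buffered + " ms");
    }
}
```

With many small writes per row, the buffered variant usually wins because it turns them into a few large I/O operations.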
Your transfer speed is likely not limited by Java. Instead I would suspect (in no particular order):

If you read the complete dataset and then write it out to disk, that will take longer, since the JVM has to allocate the memory and the db read/disk write happen sequentially. Instead, I would write out to the buffered writer for every read that you make from the db, so the operation is closer to a concurrent one (I don't know whether you're doing that or not).
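The suggested pattern, write each row as it is fetched rather than collecting everything first, can be sketched as follows. Since the original code is not shown, a plain iterable stands in for the JDBC ResultSet loop; in real code the loop body would sit inside `while (rs.next()) { ... }`.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.List;

public class StreamingDump {
    // rowSource stands in for the ResultSet; each String[] is one fetched row.
    public static void dump(Iterable<String[]> rowSource, String path) throws Exception {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path))) {
            for (String[] row : rowSource) {
                out.write(String.join(",", row)); // write the row immediately after fetching it
                out.newLine();
            }
        } // close() flushes whatever is still buffered
    }

    public static void main(String[] args) throws Exception {
        dump(List.of(new String[]{"id", "name"}, new String[]{"1", "foo"}), "dump.csv");
    }
}
```

This keeps memory flat (no giant in-memory list) and overlaps database fetching with disk writing instead of doing them back to back.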
For these bulky reads from the DB you may want to tune your Statement's fetch size. It might save a lot of round trips to the DB.
http://download.oracle.com/javase/1.5.0/docs/api/java/sql/Statement.html#setFetchSize%28int%29
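The call itself is a one-liner on the JDBC `Statement`. The sketch below runs without a database by backing the `Statement` interface with a dynamic proxy; in real code `stmt` would come from `connection.createStatement()`, and the value 1000 is an illustrative choice, not a recommendation from the answer.

```java
import java.lang.reflect.Proxy;
import java.sql.Statement;

public class FetchSizeDemo {
    public static int demo() throws Exception {
        int[] recorded = {0};
        // Stand-in Statement so the example is runnable without a JDBC driver.
        Statement stmt = (Statement) Proxy.newProxyInstance(
                Statement.class.getClassLoader(),
                new Class<?>[]{Statement.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("setFetchSize")) recorded[0] = (int) args[0];
                    if (method.getName().equals("getFetchSize")) return recorded[0];
                    return null;
                });
        // Hint to the driver: pull 1000 rows per round trip instead of the default.
        stmt.setFetchSize(1000);
        return stmt.getFetchSize();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("fetch size: " + demo());
    }
}
```

With drivers that default to a small fetch size (Oracle's classic default is 10), raising it can cut 400,000 rows from tens of thousands of round trips to a few hundred.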
For those who want to improve the time for retrieving records and dumping them into the file (i.e., no processing on the records): instead of putting them into an ArrayList, append the records to a StringBuffer. Apply the toString() function to get a single String and write it to the file at once.

For me, the retrieval time reduced from 22 seconds to 17 seconds.
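A minimal sketch of this suggestion, using `StringBuilder` (the unsynchronized sibling of the `StringBuffer` the answer names, which is the idiomatic choice when no threads share it); the sample rows are illustrative:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.util.List;

public class SingleShotDump {
    // Accumulate all rows into one String instead of an ArrayList of rows.
    public static String buildCsv(Iterable<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n'); // append, don't list.add
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String csv = buildCsv(List.of(new String[]{"1", "a"}, new String[]{"2", "b"}));
        try (BufferedWriter out = new BufferedWriter(new FileWriter("all.csv"))) {
            out.write(csv); // a single write of the full payload
        }
    }
}
```

Note the trade-off against the streaming approach suggested earlier: this holds the entire 174 MB in memory at once, so it only pays off when the heap can comfortably accommodate it.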