Java: Javolution: How do I use UTF8StreamReader correctly? Error: java.lang.ArrayIndexOutOfBoundsException: 2048
Here's the code:
public static void mergeAllFilesJavolution() throws FileNotFoundException, IOException {
    String fileDir = "C:\\TestData\\w12";
    File dirSrc = new File(fileDir);
    File[] list = dirSrc.listFiles();
    long start = System.currentTimeMillis();
    for (int j = 0; j < list.length; j++) {
        int chr;
        String srcFile = list[j].getPath();
        String outFile = fileDir + "\\..\\merged.txt";
        UTF8StreamReader inFile = new UTF8StreamReader().setInput(new FileInputStream(srcFile));
        UTF8StreamWriter outPut = new UTF8StreamWriter().setOutput(new FileOutputStream(outFile, true));
        while ((chr = inFile.read()) != -1) {
            outPut.write(chr);
        }
        outPut.close();
        inFile.close();
    }
    System.out.println(System.currentTimeMillis() - start);
}
The UTF-8 test files are around 200 MB each, but they may well reach 800 MB or more.
Here's the UTF8StreamReader.read() source code.
/**
 * Holds the bytes buffer.
 */
private final byte[] _bytes;

/**
 * Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).
 */
public UTF8StreamReader() {
    _bytes = new byte[2048];
}

/**
 * Reads a single character. This method will block until a character is
 * available, an I/O error occurs or the end of the stream is reached.
 *
 * @return the 31-bits Unicode of the character read, or -1 if the end of
 *         the stream has been reached.
 * @throws IOException if an I/O error occurs.
 */
public int read() throws IOException {
    byte b = _bytes[_start];
    return ((b >= 0) && (_start++ < _end)) ? b : read2();
}
The error occurs at _bytes[_start], since _bytes = new byte[2048] (so index 2048 is one past the end of the array).
Here's another UTF8StreamReader constructor:
/**
 * Creates a UTF-8 reader having a byte buffer of specified capacity.
 *
 * @param capacity the capacity of the byte buffer.
 */
public UTF8StreamReader(int capacity) {
    _bytes = new byte[capacity];
}
Problem: How can I specify the correct capacity for _bytes when creating the UTF8StreamReader?
I tried File.length(), but it returns a long (which seems right, since I expect huge files), while the constructor only accepts an int.
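For example, the long-to-int mismatch could be side-stepped by clamping the length before passing it to the constructor. This is only a rough sketch; the 64 MB cap and the helper name are arbitrary assumptions of mine, not anything from Javolution:

// Hypothetical helper (not part of Javolution): turn a possibly huge long file
// length into an int capacity for the UTF8StreamReader(int) constructor.
static int capacityFor(File file) {
    long length = file.length();                        // may exceed Integer.MAX_VALUE
    long capped = Math.min(length, 64L * 1024 * 1024);  // arbitrary 64 MB cap
    return (int) Math.max(capped, 2048);                // never below the default 2048
}

// Usage sketch:
// UTF8StreamReader inFile = new UTF8StreamReader(capacityFor(list[j]))
//         .setInput(new FileInputStream(srcFile));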
Any guidance on the right direction is appreciated.
It seems nobody has run into the same situation yet.
Anyway, I tried another solution: instead of the class above (UTF8StreamReader) I used the ByteBuffer-based UTF8ByteBufferReader, and it is incredibly faster than the stream reader.
Faster Merging Files by using ByteBuffer
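A rough sketch of such a ByteBuffer-based merge is below. It assumes javolution.io.UTF8ByteBufferReader exposes setInput(ByteBuffer) analogous to UTF8StreamReader.setInput(InputStream), and it reuses the same placeholder paths as the question:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import javolution.io.UTF8ByteBufferReader;
import javolution.io.UTF8StreamWriter;

public static void mergeAllFilesByteBuffer() throws IOException {
    String fileDir = "C:\\TestData\\w12";
    File[] list = new File(fileDir).listFiles();
    // Single writer for the merged output, opened in append mode as before.
    UTF8StreamWriter outPut = new UTF8StreamWriter()
            .setOutput(new FileOutputStream(fileDir + "\\..\\merged.txt", true));
    long start = System.currentTimeMillis();
    for (int j = 0; j < list.length; j++) {
        FileInputStream fis = new FileInputStream(list[j]);
        FileChannel channel = fis.getChannel();
        // Map the whole source file into memory and decode characters straight from the buffer.
        MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        UTF8ByteBufferReader inFile = new UTF8ByteBufferReader();
        inFile.setInput(buffer);
        int chr;
        while ((chr = inFile.read()) != -1) {
            outPut.write(chr);
        }
        inFile.close();
        channel.close();
        fis.close();
    }
    outPut.close();
    System.out.println(System.currentTimeMillis() - start);
}

Since the reader decodes directly from the mapped buffer, there is presumably no fixed 2048-byte intermediate array involved, which would also help explain the speed difference.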