Java: Javolution: How do I use UTF8StreamReader correctly? Error: java.lang.ArrayIndexOutOfBoundsException: 2048

Posted on 2024-11-16 13:36:06

Here's the code:

public static void mergeAllFilesJavolution()throws FileNotFoundException, IOException {
    String fileDir = "C:\\TestData\\w12";
    File dirSrc = new File(fileDir);
    File[] list = dirSrc.listFiles();
    long start = System.currentTimeMillis();
    for(int j=0; j<list.length; j++){
        int chr;
        String srcFile = list[j].getPath();
        String outFile = fileDir + "\\..\\merged.txt";
        UTF8StreamReader inFile=new UTF8StreamReader().setInput(new FileInputStream(srcFile));
        UTF8StreamWriter outPut=new UTF8StreamWriter().setOutput(new FileOutputStream(outFile, true)); 
        while((chr=inFile.read()) != -1) {
            outPut.write(chr);
        }
        outPut.close();
        inFile.close();
    }
    System.out.println(System.currentTimeMillis()-start);
}

The UTF-8 test file is 200 MB, but files of 800 MB or more are very likely.

Here's the UTF8StreamReader.read() source code.

/**
 * Holds the bytes buffer.
 */
private final byte[] _bytes;

/**
 * Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).
 */
public UTF8StreamReader() {
    _bytes = new byte[2048];
}

/**
 * Reads a single character.  This method will block until a character is
 * available, an I/O error occurs or the end of the stream is reached.
 *
 * @return the 31-bits Unicode of the character read, or -1 if the end of
 *         the stream has been reached.
 * @throws IOException if an I/O error occurs.
 */
public int read() throws IOException {
    byte b = _bytes[_start];
    return ((b >= 0) && (_start++ < _end)) ? b : read2();
}

The error occurs at _bytes[_start], because _bytes is allocated as new byte[2048]: index 2048 is one past the last valid index (2047).

Here's another UTF8StreamReader constructor:

/**
 * Creates a UTF-8 reader having a byte buffer of specified capacity.
 * 
 * @param capacity the capacity of the byte buffer.
 */
public UTF8StreamReader(int capacity) {
    _bytes = new byte[capacity];
}

Problem: How can I specify the correct capacity of _bytes when creating the UTF8StreamReader?

I tried File.length(), but it returns a long (which I think is right, since I expect huge files), whereas the constructor only accepts an int.

Any guidance in the right direction is appreciated.
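
On the capacity question: the quoted constructor only sizes an internal byte buffer, and a streaming reader normally refills such a buffer as it goes, so the capacity presumably does not have to match File.length(). For comparison, here is a minimal sketch that uses only standard JDK classes (not Javolution) to append one UTF-8 file to another with a small, reusable buffer. The names FixedBufferCopy, appendUtf8 and bufferChars are hypothetical, chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Javolution.
class FixedBufferCopy {

    /**
     * Appends the UTF-8 text in src to dst, reusing one char buffer of
     * bufferChars elements. Returns the number of chars (UTF-16 code units)
     * copied. The buffer size is independent of the file size.
     */
    static long appendUtf8(Path src, Path dst, int bufferChars) throws IOException {
        long total = 0;
        try (BufferedReader in = Files.newBufferedReader(src, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(dst, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            char[] buf = new char[bufferChars];
            int n;
            // read() fills at most buf.length chars per call and returns -1 at end of stream.
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Calling FixedBufferCopy.appendUtf8(src, dst, 8192) once per source file would reproduce the append-to-merged.txt loop from the question. Whether this is fast enough for 800 MB inputs is something to measure, not assume.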

Comments (1)

仅一夜美梦 2024-11-23 13:36:06

It seems nobody else has run into this situation yet.

Anyway, I tried another solution: instead of the class above (UTF8StreamReader), I used a ByteBuffer-based reader (UTF8ByteBufferReader). It is dramatically faster than the stream reader.

Faster Merging Files by using ByteBuffer
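
The linked code is not reproduced here. As a rough illustration of the ByteBuffer idea, the following sketch merges files with standard java.nio channels only. It does not use Javolution's UTF8ByteBufferReader, and it assumes the inputs are already valid UTF-8 that can be concatenated byte for byte without decoding. The names ByteBufferMerge and merge, and the 64 KiB buffer size, are hypothetical choices.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper, not part of Javolution.
class ByteBufferMerge {

    /**
     * Concatenates the raw bytes of each source file into target, in list order.
     * The target is created or truncated first. Returns the number of bytes written.
     */
    static long merge(List<Path> sources, Path target) throws IOException {
        // One fixed-size buffer is reused for every file, whatever its length.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
        long total = 0;
        try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path src : sources) {
                try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ)) {
                    while (in.read(buf) != -1) {
                        buf.flip();                 // switch from filling to draining
                        while (buf.hasRemaining()) {
                            total += out.write(buf);
                        }
                        buf.clear();                // ready for the next read
                    }
                }
            }
        }
        return total;
    }
}
```

Unlike the loop in the question, this sketch truncates the target instead of appending to it, so rerunning it does not duplicate content. The target should live outside the source directory, as merged.txt does in the question.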
