Java: Reading a PDF file from a URL into a byte array/ByteBuffer in an applet

Posted 2024-07-14 11:31:31


I'm trying to figure out why this particular snippet of code isn't working for me. I've got an applet which is supposed to read a .pdf and display it with a pdf-renderer library, but for some reason the .pdf files that sit on my server end up corrupt when I read them in. I've tested this by writing the files back out again.

I've tried viewing the applet in both IE and Firefox, and the corrupted files occur in both. Funny thing is, when I try viewing the applet in Safari (for Windows), the file is actually fine! I understand the JVM might be different, but I am still lost. I've compiled with Java 1.5; the JVMs are 1.6. The snippet which reads the file is below.

public static ByteBuffer getAsByteArray(URL url) throws IOException {
        ByteArrayOutputStream tmpOut = new ByteArrayOutputStream();

        URLConnection connection = url.openConnection();
        int contentLength = connection.getContentLength();
        InputStream in = url.openStream();
        byte[] buf = new byte[512];
        int len;
        while (true) {
            len = in.read(buf);
            if (len == -1) {
                break;
            }
            tmpOut.write(buf, 0, len);
        }
        tmpOut.close();
        ByteBuffer bb = ByteBuffer.wrap(tmpOut.toByteArray(), 0,
                                        tmpOut.size());
        //Lines below used to test if file is corrupt
        //FileOutputStream fos = new FileOutputStream("C:\\abc.pdf");
        //fos.write(tmpOut.toByteArray());
        return bb;
}

I must be missing something, and I've been banging my head trying to figure it out. Any help is greatly appreciated. Thanks.


Edit:
To further clarify my situation: the difference between the files before I read them with the snippet and after is that the ones I output after reading are significantly smaller than they originally were. When opening them, they are not recognized as .pdf files. There are no exceptions being thrown that I might be ignoring, and I have tried flushing, to no avail.

This snippet works in Safari, meaning the files are read in their entirety, with no difference in size, and can be opened with any .pdf reader. In IE and Firefox, the files always end up corrupted, consistently at the same smaller size.

I monitored the len variable (when reading a 59 KB file), hoping to see how many bytes get read on each pass of the loop. In IE and Firefox, at 18 KB, in.read(buf) returns -1 as if the file had ended. Safari does not do this.

I'll keep at it, and I appreciate all the suggestions so far.
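One way to narrow this down further is to compare the Content-Length the server advertises with the number of bytes that actually arrive. The following is only a diagnostic sketch (the class name and command-line argument are illustrative, not part of the applet):

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Diagnostic sketch: download the URL once and report how many bytes arrived
// versus the advertised Content-Length, so a short read shows up immediately.
public class DownloadCheck {
    public static void main(String[] args) throws Exception {
        URL url = new URL(args[0]);                         // e.g. the .pdf URL the applet fetches
        URLConnection connection = url.openConnection();
        int contentLength = connection.getContentLength();  // -1 if the server does not report it
        InputStream in = connection.getInputStream();
        byte[] buf = new byte[512];
        int total = 0;
        int len;
        while ((len = in.read(buf)) != -1) {
            total += len;
        }
        in.close();
        System.out.println("Content-Length: " + contentLength + ", bytes read: " + total);
    }
}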


Comments (3)

夜未央樱花落 2024-07-21 11:31:32


Just in case these small changes make a difference, try this:

public static ByteBuffer getAsByteArray(URL url) throws IOException {
    URLConnection connection = url.openConnection();
    // Since you get a URLConnection, use it to get the InputStream
    InputStream in = connection.getInputStream();
    // Now that the InputStream is open, get the content length
    int contentLength = connection.getContentLength();

    // To avoid having to resize the array over and over and over as
    // bytes are written to the array, provide an accurate estimate of
    // the ultimate size of the byte array
    ByteArrayOutputStream tmpOut;
    if (contentLength != -1) {
        tmpOut = new ByteArrayOutputStream(contentLength);
    } else {
        tmpOut = new ByteArrayOutputStream(16384); // Pick some appropriate size
    }

    byte[] buf = new byte[512];
    while (true) {
        int len = in.read(buf);
        if (len == -1) {
            break;
        }
        tmpOut.write(buf, 0, len);
    }
    in.close();
    tmpOut.close(); // No effect, but good to do anyway to keep the metaphor alive

    byte[] array = tmpOut.toByteArray();

    //Lines below used to test if file is corrupt
    //FileOutputStream fos = new FileOutputStream("C:\\abc.pdf");
    //fos.write(array);
    //fos.close();

    return ByteBuffer.wrap(array);
}

You forgot to close fos, which may result in that file being shorter if your application is still running or is abruptly terminated. Also, I added creating the ByteArrayOutputStream with an appropriate initial size. (Otherwise Java will have to repeatedly allocate a new array and copy, allocate a new array and copy, which is expensive.) Replace the value 16384 with a more appropriate value; 16k is probably small for a PDF, but I don't know what the "average" size is that you expect to download.

Since you use toByteArray() twice (even though one is in diagnostic code), I assigned that to a variable. Finally, although it shouldn't make any difference, when you are wrapping the entire array in a ByteBuffer, you only need to supply the byte array itself. Supplying the offset 0 and the length is redundant.

Note that if you are downloading large PDF files this way, then ensure that your JVM is running with a large enough heap that you have enough room for several times the largest file size you expect to read. The method you're using keeps the whole file in memory, which is OK as long as you can afford that memory. :)
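On the point about the unclosed fos: a minimal sketch of how the diagnostic write could be made reliable with try/finally (try-with-resources is not available on the Java 1.5/1.6 toolchain mentioned above; the class name, method name and path parameter are illustrative):

import java.io.FileOutputStream;
import java.io.IOException;

// Sketch: write the downloaded bytes to disk for inspection and close the
// stream even if the write fails, so the test file is never left truncated.
public class DebugDump {
    static void dump(byte[] array, String path) throws IOException {
        FileOutputStream fos = new FileOutputStream(path);
        try {
            fos.write(array);
            fos.flush();
        } finally {
            fos.close();
        }
    }
}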

夕色琉璃 2024-07-21 11:31:32


I thought I had the same problem as you, but it turned out my problem was that I assumed you always get a full buffer until you get nothing. But you do not make that assumption.
The examples on the net (e.g. java2s/tutorial) use a BufferedInputStream. But that does not make any difference for me.

You could check whether you actually get the full file in your loop. If you do, then the problem would be in the ByteArrayOutputStream.
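A sketch of what that in-loop check could look like, reusing the variable names from the snippet in the question (in, buf, tmpOut) plus the contentLength already obtained from the URLConnection:

// Sketch: count the bytes the loop actually consumes. If the total already
// falls short of contentLength here, the download itself is being cut off
// and the ByteArrayOutputStream is not to blame.
int totalRead = 0;
int len;
while ((len = in.read(buf)) != -1) {
    tmpOut.write(buf, 0, len);
    totalRead += len;
}
System.out.println("Read " + totalRead + " bytes, Content-Length was " + contentLength);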

凑诗 2024-07-21 11:31:32


Have you tried a flush() before you close the tmpOut stream, to ensure all bytes are written out?
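For completeness, a minimal sketch of that suggestion. Note that for a ByteArrayOutputStream both flush() and close() are documented as having no effect, so if this changes nothing the truncation must happen earlier, during the read:

// Sketch: flush before closing. Both calls are no-ops on a
// ByteArrayOutputStream, so this mostly serves to rule the idea out.
tmpOut.flush();
tmpOut.close();
byte[] array = tmpOut.toByteArray();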
