Writing a float array to a file in Java

Posted 2024-12-04 12:33:55

I'm reading in a NetCDF file and I want to read each array as a float array and then write the float array to a new file. I can make it work if I read in the float array and then iterate over each element in the array (using a DataOutputStream), but this is very, very slow; my NetCDF files are over 1 GB.

I tried using an ObjectOutputStream, but this writes extra bytes of information.

So, to recap.
1. Open NetCDF file
2. Read float array x from NetCDF file
3. Write float array x to raw data file in a single step
4. Repeat step 2 with x+1
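
For reference, steps 2-4 above can be sketched with java.nio, which can write a whole float array in one call. This is only a sketch: the NetCDF read is stubbed out with sample data, and the file name is made up.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BulkFloatWrite {
    public static void main(String[] args) throws IOException {
        Path outPath = Files.createTempFile("raw", ".dat");
        try (FileChannel ch = FileChannel.open(outPath, StandardOpenOption.WRITE)) {
            // Stand-in for step 2: "read float array x from NetCDF file".
            float[] x = {1f, 2f, 3f};
            // Step 3: pack the whole array and write it in a single call.
            ByteBuffer buf = ByteBuffer.allocate(x.length * Float.BYTES);
            buf.asFloatBuffer().put(x);
            ch.write(buf);
        }
        System.out.println(Files.size(outPath) == 12); // prints true (3 floats * 4 bytes)
    }
}
```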

Comments (5)

女中豪杰 2024-12-11 12:33:55


OK, you have 1 GB to read and 1 GB to write. Depending on your hard drive, you might get about 100 MB/s read and 60 MB/s write speed. This means it will take about 27 seconds to read and write.

What is the speed of your drive and how much slower than this are you seeing?

If you want to test the speed of your disk without any processing, time how long it takes to copy a file which you haven't accessed recently (i.e. one that is not in the disk cache). This will give you an idea of the minimum delay you can expect to read and then write most of the data in the file (i.e. with no processing or Java involved).


For the benefit of anyone who would like to know how to do a loop-less copy of data, i.e. one that doesn't just call a method which loops for you:

FloatBuffer src = ...;  // view of a readable memory-mapped file
FloatBuffer dest = ...; // view of a writable memory-mapped file
src.position(start);
src.limit(end);
dest.put(src);
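
For anyone who wants to see the snippet above as a complete program, here is a runnable sketch. The file names and sample data are made up; it maps both files with FileChannel.map and does the bulk put with no per-element Java loop.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedFloatCopy {
    public static void main(String[] args) throws IOException {
        Path srcPath = Files.createTempFile("src", ".dat");
        Path destPath = Files.createTempFile("dest", ".dat");

        // Write a few sample floats to the source file.
        float[] data = {1.0f, 2.5f, -3.25f, 4.0f};
        ByteBuffer bb = ByteBuffer.allocate(data.length * Float.BYTES);
        bb.asFloatBuffer().put(data);
        Files.write(srcPath, bb.array());

        long bytes = Files.size(srcPath);
        try (FileChannel in = FileChannel.open(srcPath, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(destPath,
                     StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer srcMap = in.map(FileChannel.MapMode.READ_ONLY, 0, bytes);
            MappedByteBuffer destMap = out.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            destMap.asFloatBuffer().put(srcMap.asFloatBuffer()); // bulk copy, no Java loop
            destMap.force(); // flush the mapped pages to the file
        }
        System.out.println(java.util.Arrays.equals(
                Files.readAllBytes(srcPath), Files.readAllBytes(destPath)));
    }
}
```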

If you have mixed types of data you can use a ByteBuffer, which notionally copies a byte at a time but in reality can use a long or wider type to copy 8 or more bytes at a time, i.e. whatever the CPU can do.

For small blocks this will use a loop, but for large blocks it can use page-mapping tricks in the OS. In any case, how it does this is not defined in Java, but it's likely to be the fastest way to copy data.

Most of these tricks only make a difference if you are copying a file already in memory to a cached file. As soon as you read a file from disk, or the file is too large to cache, the IO bandwidth of your physical disk is the only thing which really matters.

This is because a CPU can copy data at around 6 GB/s to main memory but only at 60-100 MB/s to a hard drive. Even if the copy in CPU/memory is 2x, 10x or 50x slower than it could be, it will still be waiting for the disk. Note: with no buffering at all this is entirely possible, and worse, but provided you have any simple buffering the CPU will be faster than the disk.

不回头走下去 2024-12-11 12:33:55


I ran into the same problem and will dump my solution here just for future reference.

It is very slow to iterate over an array of floats and call DataOutputStream.writeFloat for each of them. Instead, transform the floats yourself into a byte array and write that array all at once.

Slow:

DataOutputStream out = ...;
for (int i=0; i<floatarray.length; ++i)
    out.writeFloat(floatarray[i]);

Much faster:

DataOutputStream out = ...;
byte[] buf = new byte[4 * floatarray.length];
for (int i = 0; i < floatarray.length; ++i)
{
    int val = Float.floatToRawIntBits(floatarray[i]);
    buf[4 * i]     = (byte) (val >> 24);
    buf[4 * i + 1] = (byte) (val >> 16);
    buf[4 * i + 2] = (byte) (val >> 8);
    buf[4 * i + 3] = (byte) val;
}

out.write(buf);

If your array is very large (>100k elements), break it up into chunks to avoid running out of heap space for the buffer array.
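
As a cross-check, the manual bit-shifting above should be byte-for-byte equivalent to what DataOutputStream.writeFloat produces (both are big-endian), and a ByteBuffer can do the same packing in two lines. A small runnable sketch (class and method names are mine):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class FloatPack {
    // Pack a float array into big-endian bytes via ByteBuffer,
    // matching DataOutputStream.writeFloat's byte order.
    static byte[] pack(float[] floats) {
        ByteBuffer bb = ByteBuffer.allocate(floats.length * Float.BYTES); // big-endian by default
        bb.asFloatBuffer().put(floats);
        return bb.array();
    }

    public static void main(String[] args) throws IOException {
        float[] data = {1.5f, -2.0f, 3.25f};
        // Reference: the slow per-element writeFloat path.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (float f : data) out.writeFloat(f);
        System.out.println(Arrays.equals(pack(data), bos.toByteArray())); // prints true
    }
}
```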


1) when writing, use BufferedOutputStream, you will get a factor of 100 speedup.

2) when reading, read at least 10K per read, probably 100K is better.

3) post your code.
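
A minimal runnable sketch of point 1; the 64 KB buffer size and file name are arbitrary choices, not anything the answer specifies.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedWriteSketch {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("floats", ".dat");
        float[] data = new float[100_000];
        // BufferedOutputStream batches the 4-byte writeFloat calls into
        // large writes to the underlying file instead of one syscall each.
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f), 1 << 16))) {
            for (float v : data) out.writeFloat(v);
        }
        System.out.println(f.length() == 4L * data.length); // prints true
    }
}
```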

撩起发的微风 2024-12-11 12:33:55


If you are using the Unidata NetCDF library your problem may not be the writing, but rather the NetCDF libraries caching mechanism.

     NetcdfFile file = NetcdfFile.open(filename);
     Variable variable = file.findVariable("variableName");
     for (...) {
          // read data
          variable.invalidateCache();
     }
君勿笑 2024-12-11 12:33:55


Lateral solution:

If this is a one-off generation (or if you are willing to automate it in an Ant script) and you have access to some kind of Unix environment, you can use NCDUMP instead of doing it in Java. Something like:

ncdump -v your_variable your_file.nc | [awk] > float_array.txt

You can control the precision of the floats with the -p option if you desire. I just ran it on a 3GB NetCDF file and it worked fine. As much as I love Java, this is probably the quickest way to do what you want.
