Advice on storing values in a file in Java?

Posted 2024-11-17 09:08:32

I have a program that generates a huge matrix, and once it has been calculated I need to reuse it later. For that reason, I want to cache it on the local hard disk so that I can read it back later. At the moment I simply write the data to a file and read it back afterwards.

Is there anything special I should take into consideration when doing this in Java? For example, do I need to serialize it, or do something else special? Is there anything I should watch out for when storing important application data like this? Should it be plain ASCII, XML, or something else?
The data is not sensitive; however, the integrity of the data is important.

Comments (5)

歌入人心 2024-11-24 09:08:33

If your data is really huge, I'd recommend a binary format: it will be smaller and faster to read, and especially to parse (XML or JSON is many times slower than reading/writing binary data). Serialization also brings a lot of overhead, so you might want to look at DataInputStream and DataOutputStream. If you know you will be writing only numbers of a specific type, or you know what sequence the data will be in, these are among the fastest options.

Do not forget to wrap the file streams in buffered streams; this can make your operations an order of magnitude faster.

Something like this (8192 is an example buffer size; you can tailor it to your needs):

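    // Needs java.io.*; assumes a double[][] matrix is in scope.
    final File file = null; // get file somehow
    try (DataOutputStream dos = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(file), 8192))) {
        // Loop through the matrix (might be different if the matrix is stored sparsely).
        for (int x = 0; x < matrix.length; x++) {
            for (int y = 0; y < matrix[x].length; y++) {
                if (matrix[x][y] != 0.0) {
                    dos.writeInt(x);
                    dos.writeInt(y);
                    dos.writeDouble(matrix[x][y]);
                }
            }
        }
        // Mark the end (might be done differently). Written inside the try block,
        // not in finally, so it is not attempted after a failed write.
        dos.writeInt(-1);
    } // try-with-resources flushes and closes the stream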

And the input:

    final File file = null; // get file somehow
    try (DataInputStream dis = new DataInputStream(
            new BufferedInputStream(new FileInputStream(file), 8192))) {
        int x;
        // Read (x, y, value) triples until the -1 end marker.
        while ((x = dis.readInt()) != -1) {
            int y = dis.readInt();
            double value = dis.readDouble();
            // store x, y, value in matrix
        }
    } // try-with-resources closes the stream

As Ryan Amos correctly pointed out, if the matrix is not sparse, it could be faster to just write the values (but all of them):

Out:

    // writeInt, not write: write(int) emits only the low-order byte.
    dos.writeInt(xSize);
    dos.writeInt(ySize);
    for (int x = 0; x < xSize; x++) {
        for (int y = 0; y < ySize; y++) {
            dos.writeDouble(matrix[x][y]);
        }
    }

In:

    int xSize = dis.readInt();
    int ySize = dis.readInt();
    double[][] matrix = new double[xSize][ySize]; // sized from the header
    for (int x = 0; x < xSize; x++) {
        for (int y = 0; y < ySize; y++) {
            matrix[x][y] = dis.readDouble();
        }
    }

(Note that I have not compiled this, so you might need to correct a few things; it is off the top of my head.)
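For reference, the dense variant above can be assembled into a small self-contained program. This is only a sketch under a few assumptions: the class name `DenseMatrixIO` and the `save`/`load` helpers are made up for illustration, the matrix is a rectangular `double[][]`, and a temporary file stands in for the real cache location.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper class; names are illustrative, not from the answer.
public class DenseMatrixIO {

    /** Writes both dimensions, then every value row by row (assumes a rectangular matrix). */
    public static void save(double[][] matrix, File file) throws IOException {
        int xSize = matrix.length;
        int ySize = xSize == 0 ? 0 : matrix[0].length;
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 8192))) {
            dos.writeInt(xSize);
            dos.writeInt(ySize);
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    dos.writeDouble(matrix[x][y]);
                }
            }
        }
    }

    /** Reads the dimensions back, allocates the matrix, then fills it in the same order. */
    public static double[][] load(File file) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file), 8192))) {
            int xSize = dis.readInt();
            int ySize = dis.readInt();
            double[][] matrix = new double[xSize][ySize];
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    matrix[x][y] = dis.readDouble();
                }
            }
            return matrix;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("matrix", ".bin");
        file.deleteOnExit();
        double[][] original = {{1.0, 0.0, 2.5}, {0.0, -3.25, 4.0}};
        save(original, file);
        double[][] restored = load(file);
        // Reports whether the restored matrix is identical to the original.
        System.out.println("Round trip OK: " + Arrays.deepEquals(original, restored));
    }
}
```

Because `writeDouble` stores the raw 8-byte IEEE 754 representation, the values read back are bit-for-bit the ones that were written, which is not automatically true of a hand-formatted text file.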

Without buffers, you will read byte by byte, which will make it slow.

One more comment: with such a huge dataset, you should consider using a SparseMatrix and writing/reading only the elements that are non-zero (unless you really have that many significant elements).

As written in the comment above, if you really want to write/read every single element of a matrix of that size, then you are already talking about hours of writing rather than seconds.

孤凫 2024-11-24 09:08:33

You have a few options for storing your data. You can simply state the width in a header and then write everything out as a list with a separator (e.g. '\n', '\t', ' ', etc.). Otherwise, you can use ObjectOutputStream to store your data. Be wary: this will likely be less efficient than your solution. However, it will be easier to use.

Other than that, you're free to do as you choose. I usually use a FileWriter and just write all of my data in plain text. If you're after super-efficiency, FileOutputStream is what you need.
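As a rough illustration of the ObjectOutputStream route, the sketch below writes a whole `double[][]` in one call and reads it back. Java arrays are serializable, so no custom class is needed. The class name `SerializedMatrix` and its `save`/`load` methods are hypothetical names chosen for this example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical helper class; names are illustrative only.
public class SerializedMatrix {

    /** Serializes the whole array in a single writeObject call. */
    public static void save(double[][] matrix, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(matrix);
        }
    }

    /** Deserializes the array; the cast assumes the file was produced by save(). */
    public static double[][] load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (double[][]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("matrix", ".ser");
        file.deleteOnExit();
        double[][] original = {{1.0, 2.0}, {3.0, 4.0}};
        save(original, file);
        System.out.println("Round trip OK: " + Arrays.deepEquals(original, load(file)));
    }
}
```

The trade-off is the one described above: very little code to write, at the cost of some extra overhead in the file and during reading compared with writing the raw values yourself.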

谈下烟灰 2024-11-24 09:08:33

If your entries are numbers, then you could just save each row of your matrix as a line in your file, with the entries separated by some delimiter. You don't need special serialization then. :)
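A minimal sketch of that one-row-per-line idea might look like the following. The tab delimiter, the class name `TextMatrix`, and the `save`/`load` helpers are assumptions made for this example; it also assumes every row has at least one entry.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names and the tab delimiter are illustrative only.
public class TextMatrix {

    /** Writes one matrix row per line, entries separated by tabs. */
    public static void save(double[][] matrix, Path path) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            for (double[] row : matrix) {
                StringBuilder line = new StringBuilder();
                for (int y = 0; y < row.length; y++) {
                    if (y > 0) {
                        line.append('\t');
                    }
                    // Uses the Double.toString form, which Double.parseDouble reads back exactly.
                    line.append(row[y]);
                }
                writer.write(line.toString());
                writer.newLine();
            }
        }
    }

    /** Reads the file line by line and parses each tab-separated entry. */
    public static double[][] load(Path path) throws IOException {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\t");
                double[] row = new double[parts.length];
                for (int y = 0; y < parts.length; y++) {
                    row[y] = Double.parseDouble(parts[y]);
                }
                rows.add(row);
            }
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("matrix", ".tsv");
        double[][] original = {{1.5, 0.0, -2.25}, {1.0e-10, 3.0, 0.1}};
        save(original, path);
        System.out.println("Round trip OK: " + Arrays.deepEquals(original, load(path)));
        Files.delete(path);
    }
}
```

The resulting file can be opened in a text editor or spreadsheet, which is the main attraction of this format; for a huge matrix it will be larger and slower to parse than a binary file.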

大姐,你呐 2024-11-24 09:08:33

It all depends on how you'll output it later, and on whether you'll also be storing it in a database or somewhere else. If you're never outputting it or storing it anywhere else, then a text file would work.

热情消退 2024-11-24 09:08:33

If there's no need to persist the data (i.e. keep it after the Java program has terminated), it would be faster to keep it in memory in a Java variable. There are a lot of types that should meet your requirements (HashMap, ArrayList...).
If you need to keep the data for use in subsequent program executions, you can store it in a file using standard file read/write methods. Plain ASCII would be faster to read/write than XML. As for the integrity of the file, that depends on the OS, because in the end it is just a file on your local filesystem.
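Since the question stresses integrity, one application-level option (an addition for illustration, not something proposed in the answers above) is to append a checksum to the cache file and verify it when reading. The sketch below uses `CRC32` with `CheckedOutputStream`/`CheckedInputStream` from `java.util.zip`; the class name `ChecksummedMatrix`, the file layout, and the helper names are assumptions. A CRC detects accidental corruption only, not deliberate tampering.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical helper class; assumed layout: xSize, ySize, values, then a CRC32 trailer.
public class ChecksummedMatrix {

    public static void save(double[][] matrix, File file) throws IOException {
        int xSize = matrix.length;
        int ySize = xSize == 0 ? 0 : matrix[0].length;
        CRC32 crc = new CRC32();
        try (DataOutputStream dos = new DataOutputStream(new CheckedOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)), crc))) {
            dos.writeInt(xSize);
            dos.writeInt(ySize);
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    dos.writeDouble(matrix[x][y]);
                }
            }
            // getValue() is evaluated before the trailer is written,
            // so it covers the header and all matrix values.
            dos.writeLong(crc.getValue());
        }
    }

    public static double[][] load(File file) throws IOException {
        CRC32 crc = new CRC32();
        try (DataInputStream dis = new DataInputStream(new CheckedInputStream(
                new BufferedInputStream(new FileInputStream(file)), crc))) {
            int xSize = dis.readInt();
            int ySize = dis.readInt();
            double[][] matrix = new double[xSize][ySize];
            for (int x = 0; x < xSize; x++) {
                for (int y = 0; y < ySize; y++) {
                    matrix[x][y] = dis.readDouble();
                }
            }
            long computed = crc.getValue(); // checksum of everything read so far
            long stored = dis.readLong();   // trailer written by save()
            if (computed != stored) {
                throw new IOException("Checksum mismatch: cache file is corrupted");
            }
            return matrix;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("matrix", ".bin");
        file.deleteOnExit();
        double[][] original = {{1.0, 2.0}, {3.0, 4.0}};
        save(original, file);
        System.out.println("Round trip OK: " + Arrays.deepEquals(original, load(file)));
    }
}
```

If the check fails, the program can simply recompute the matrix and rewrite the cache. A truncated file surfaces as an `EOFException` (a subclass of `IOException`) from the read calls, so callers can treat both cases the same way. A corrupted header could still request an unreasonable array size before the checksum is reached, so a real implementation would probably want to validate the dimensions first.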
