Java: Multi-dimensional array - which dimension first?

Posted on 2024-12-14 11:24:50

Is there any performance difference between these two arrays in Java (this may be relevant for J2ME development...):

String[][][] a = //for getting an entry by a[group][person][field]
    {
        { // group A:
            {"John", "Doe", "teacher", "New York"},
            {"Donald", "Duck", "jinx", "Duckburg"},
            // 10 or more further entries
        },
        { // group B:
            {"Barack", "Obama", "president", "Washington"},
            // ...
        }
    };
String[][][] b = //for getting an entry by b[field][group][person]
    {
        { // prenames:
            {"John", "Donald", /* ... */},
            {"Barack", /* ... */}
        },
        { // surnames:
            {"Doe", "Duck", /* ... */},
            {"Obama", /* ... */}
        },
        { // job:
            {"teacher", "jinx", /* ... */},
            {"president", /* ... */}
        },
        { // city:
            {"New York", "Duckburg", /* ... */},
            {"Washington", /* ... */}
        }
    };

I would guess the second array performs better because it consists of fewer nested arrays in total, while the first array has one array for each person! Now scale this up to bigger arrays...

Thanks for your answers!

UPDATE:

A better (realistic) example is an array of, let's say, 1000 x/y-coordinates:

int[][] coordsA =
    {
        {0, 0},
        {2, 7},
        {8, 2},
        {4, 2},
        {-3, 15},
        {1, 32},
        // ...
    };
int[][] coordsB =
    {
        {0, 2, 8, 4, -3, 1, /* ... */}, // x values
        {0, 7, 2, 2, 15, 32, /* ... */} // y values
    };

Comments (2)

墨离汐 2024-12-21 11:24:50

Years ago, when I was using Fortran, we were told to arrange our loops on multi-dimensional arrays so that the first dimension was the one that iterated fastest because of the way that the memory was arranged, and it would cause fewer page faults.

However, since then, I've learnt that in Java (as in almost everything), if you have a performance problem, you should measure all of the proposed solutions until you find a solution in which you no longer have the performance issue. Premature optimisation is the root of all evil etc.

If you think that the number of arrays will be a problem, have you measured the amount of memory that each option takes? Have you measured the time taken for particular operations?
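To make the "measure it" advice concrete, here is a rough timing sketch for the two coordinate layouts from the question. Everything in it is an assumption for illustration: the class name `LayoutTiming`, the size of 1000 points, the number of rounds, and the choice of "sum everything" as the operation being timed. It uses `System.nanoTime()`, which is available on Java SE; on a J2ME device you would likely have to fall back to `System.currentTimeMillis()`. A hand-rolled loop like this only gives a coarse indication, so treat the numbers with caution.

```java
// Rough timing sketch, not a rigorous benchmark. All names are hypothetical.
final class LayoutTiming {
    // Assumed size, taken from the question's "1000 coordinates" example.
    static final int N = 1000;

    // Layout A: one small array per point, c[point][0 = x, 1 = y].
    static int[][] buildPerPoint() {
        int[][] c = new int[N][2];
        for (int i = 0; i < N; i++) {
            c[i][0] = i;
            c[i][1] = 2 * i;
        }
        return c;
    }

    // Layout B: one long array per axis, c[0 = x, 1 = y][point].
    static int[][] buildPerAxis() {
        int[][] c = new int[2][N];
        for (int i = 0; i < N; i++) {
            c[0][i] = i;
            c[1][i] = 2 * i;
        }
        return c;
    }

    // Sums x + y over all points stored in layout A.
    static long sumPerPoint(int[][] c) {
        long sum = 0;
        for (int i = 0; i < c.length; i++) {
            sum += c[i][0] + c[i][1];
        }
        return sum;
    }

    // Sums x + y over all points stored in layout B.
    static long sumPerAxis(int[][] c) {
        long sum = 0;
        for (int i = 0; i < c[0].length; i++) {
            sum += c[0][i] + c[1][i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] a = buildPerPoint();
        int[][] b = buildPerAxis();
        final int rounds = 10000;
        long sink = 0; // accumulated and printed so the loops are not optimised away

        // Warm-up so the JIT has a chance to compile both methods first.
        for (int r = 0; r < rounds; r++) {
            sink += sumPerPoint(a) + sumPerAxis(b);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += sumPerPoint(a);
        }
        long t1 = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += sumPerAxis(b);
        }
        long t2 = System.nanoTime();

        System.out.println("per-point layout: " + (t1 - t0) / rounds + " ns per pass");
        System.out.println("per-axis layout:  " + (t2 - t1) / rounds + " ns per pass");
        System.out.println("checksum: " + sink);
    }
}
```

Run it several times on the actual target device or VM; results from a desktop JVM will not necessarily carry over to a phone.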

If you don't have a performance problem, then use the representation that fits your requirements the best.

宛菡 2024-12-21 11:24:50

<insert usual Knuth quote, etc. here - but I assume anyone here knows that already anyhow>

The problem with Java is that each array level is an object in its own right, i.e. we don't get one contiguous memory region for the whole array (well, the GC could help a bit there).

This has several effects:

  • worse cache locality
  • additional bounds checks for each array level
  • additional memory accesses necessary to access one element

That's not very nice: in the worst case, instead of one memory access per element, we may get 2 * (number of index levels) memory accesses. So the usual solution, if you need efficient arrays, is to create only one large array and do the indexing yourself. That avoids all these problems at the cost of needing some helper methods (which are pretty simple, especially if all sub-arrays have the same size).
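A minimal sketch of that "one large array plus your own indexing" idea, assuming a rectangular layout where every row has the same number of columns. The class name `FlatGrid` and its methods are made up for illustration; they are not from any library.

```java
// Hypothetical helper: a rows x cols grid stored in a single int[].
final class FlatGrid {
    private final int cols;
    private final int[] data; // the one and only backing array

    FlatGrid(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.cols = cols;
        this.data = new int[rows * cols];
    }

    // Row-major mapping: all elements of one row sit next to each other.
    // Note: a column index outside [0, cols) is not detected here and would
    // silently address a neighbouring row; only the bounds check of the
    // backing array itself remains.
    int index(int row, int col) {
        return row * cols + col;
    }

    int get(int row, int col) {
        return data[index(row, col)];
    }

    void set(int row, int col, int value) {
        data[index(row, col)] = value;
    }
}
```

For the coordinate example from the question, this could be used as `new FlatGrid(1000, 2)` with `get(i, 0)` as the x value and `get(i, 1)` as the y value of point `i`. Each access then touches one array object instead of two.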

But the performance gains also largely depend on your access pattern. If you want to compare the two variants given in your post, it probably won't matter that much (i.e. nowhere near the gains you get in C and similar languages if you order your arrays correctly). Accessing the elements sequentially within each sub-array should still give better cache locality, though.
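To illustrate what "sequentially within each sub-array" means, here is a small sketch with two traversal orders over the same `int[][]`. The class and method names are hypothetical, and the second method assumes a rectangular array. Both compute the same sum; they differ only in the order in which memory is visited.

```java
// Hypothetical example: same result, different access order.
final class TraversalOrder {
    // Walks each sub-array from start to end before moving on to the next,
    // so consecutive reads hit neighbouring elements of one array object.
    static long sumRowByRow(int[][] m) {
        long sum = 0;
        for (int i = 0; i < m.length; i++) {
            int[] row = m[i]; // fetch the sub-array reference once per row
            for (int j = 0; j < row.length; j++) {
                sum += row[j];
            }
        }
        return sum;
    }

    // Switches to a different sub-array on every single read.
    // Assumes all sub-arrays have the same length as m[0].
    static long sumColumnByColumn(int[][] m) {
        long sum = 0;
        int cols = (m.length == 0) ? 0 : m[0].length;
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < m.length; i++) {
                sum += m[i][j];
            }
        }
        return sum;
    }
}
```

Whether the difference is noticeable depends on the array sizes and the VM, so it is worth timing both on the target platform before restructuring anything.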
