API design for idiot-proof iteration without generics
When you're designing the API for a code library, you want it to be easy to use well, and hard to use badly. Ideally you want it to be idiot-proof.
You might also want to make it compatible with older systems that can't handle generics, like .Net 1.1 and Java 1.4. But you don't want it to be a pain to use from newer code.
I'm wondering about the best way to make things easily iterable in a type-safe way... Remembering that you can't use generics, so Java's Iterable<T> is out, as is .Net's IEnumerable<T>.
You want people to be able to use the enhanced for loop in Java (for (Item i : items)) and the foreach / For Each loop in .Net, and you don't want them to have to do any casting. Basically you want your API to be now-friendly as well as backwards compatible.
The best type-safe option that I can think of is arrays. They're fully backwards compatible and they're easy to iterate in a type-safe way. But arrays aren't ideal because you can't make them immutable. So, when you have an immutable object containing an array that you want people to be able to iterate over, you have to provide a defensive copy each and every time they access it in order to maintain immutability.
In Java, doing (MyObject[]) myInternalArray.clone(); is super-fast. I'm sure that the equivalent in .Net is super-fast too. If you have something like:
    class Schedule {
        private Appointment[] internalArray;

        public Appointment[] appointments() {
            return (Appointment[]) internalArray.clone();
        }
    }
people can do something like:
    for (Appointment a : schedule.appointments()) {
        a.doSomething();
    }
and it will be simple, clear, type-safe, and fast.
But they could do something like:
    for (int i = 0; i < schedule.appointments().length; i++) {
        Appointment a = schedule.appointments()[i];
    }
And then it would be horribly inefficient because the entire array of appointments would get cloned twice for every iteration (once for the length test, and once to get the object at the index). Not such a problem if the array is small, but pretty horrible if the array has thousands of items in it. Yuk.
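To make the cost concrete, here is a minimal sketch that counts the defensive copies made by each loop style. The `cloneCount` field, the `title()` accessor, and the `HoistedLoopDemo` class are hypothetical additions for illustration only; they are not part of the API described above.

```java
// Hypothetical stand-in for the question's Appointment type.
class Appointment {
    private final String title;

    Appointment(String title) { this.title = title; }

    String title() { return title; }
}

class Schedule {
    private final Appointment[] internalArray;

    // Counts defensive copies handed out. Illustration only, not part of the real API.
    int cloneCount = 0;

    Schedule(Appointment[] appointments) {
        this.internalArray = (Appointment[]) appointments.clone();
    }

    public Appointment[] appointments() {
        cloneCount++;
        return (Appointment[]) internalArray.clone();
    }
}

class HoistedLoopDemo {
    // Calls appointments() once, then indexes into the local copy.
    static int countWithHoistedCopy(Schedule schedule) {
        Appointment[] copy = schedule.appointments(); // the only clone
        int seen = 0;
        for (int i = 0; i < copy.length; i++) {
            if (copy[i] != null) seen++;
        }
        return seen;
    }

    // The wasteful form: clones in every length check and again in the loop body.
    static int countWithRepeatedCalls(Schedule schedule) {
        int seen = 0;
        for (int i = 0; i < schedule.appointments().length; i++) {
            Appointment a = schedule.appointments()[i];
            if (a != null) seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        Appointment[] data = {
            new Appointment("Dentist"), new Appointment("Standup"), new Appointment("Review")
        };
        Schedule hoisted = new Schedule(data);
        countWithHoistedCopy(hoisted);
        Schedule repeated = new Schedule(data);
        countWithRepeatedCalls(repeated);
        System.out.println("hoisted clones:  " + hoisted.cloneCount);
        System.out.println("repeated clones: " + repeated.cloneCount);
    }
}
```

For n appointments the hoisted loop makes one copy, while the repeated-call loop makes 2n + 1 copies (n + 1 length checks plus n indexed reads), and each copy is itself O(n). That is where the quadratic blow-up comes from.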
Would anyone actually do that? I'm not sure... I guess that's largely my question here.
You could call the method toAppointmentArray() instead of appointments(), and that would probably make it less likely that anyone would use it the wrong way. But it would also make it harder for people to find when they just want to iterate over the appointments.
You would, of course, document appointments() clearly, to say that it returns a defensive copy. But a lot of people won't read that particular bit of documentation.
Although I'd welcome suggestions, it seems to me that there's no perfect way to make it simple, clear, type-safe, and idiot-proof. Have I failed if a minority of people are unwittingly cloning arrays thousands of times, or is that an acceptable price to pay for simple, type-safe iteration for the majority?
NB: I happen to be designing this library for both Java and .Net, which is why I've tried to make this question applicable to both. And I tagged it language-agnostic because it's an issue that could arise for other languages too. The code samples are in Java, but C# would be similar (albeit with the option of making the Appointments accessor a property).
UPDATE: I did a few quick performance tests to see how much difference this made in Java. I tested:
- cloning the array once, and iterating over it using the enhanced for loop
- iterating over an ArrayList using the enhanced for loop
- iterating over an unmodifiable ArrayList (from Collections.unmodifiableList) using the enhanced for loop
- iterating over the array the bad way (cloning it repeatedly in the length check and when getting each indexed item).
For 10 objects, the relative times (doing multiple repeats and taking the median; lower is faster, listed in the same order as the tests above) were like:
- 1,000
- 1,300
- 1,300
- 5,000
For 100 objects:
- 1,300
- 4,900
- 6,300
- 85,500
For 1000 objects:
- 6,400
- 51,700
- 56,200
- 7,000,300
For 10000 objects:
- 68,000
- 445,000
- 651,000
- 655,180,000
Rough figures for sure, but enough to convince me of two things:
- Cloning, then iterating is definitely not a performance issue. In fact it's consistently faster than using a List. (This is why Java's enum.values() method returns a defensive copy of an array instead of an immutable list.)
- If you repeatedly call the method, repeatedly cloning the array unnecessarily, performance becomes more and more of an issue the larger the arrays in question. It's pretty horrible. No surprises there.
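The harness used for those figures isn't shown, so the following is only a rough sketch of how the four cases could be set up. The class and method names are hypothetical, it uses Java 5+ features like the tests themselves, and a single `System.nanoTime()` pass with no warm-up is nowhere near a rigorous benchmark. Treat it as a starting point, not a reproduction of the numbers above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CloneBenchmarkSketch {
    // Case 1: clone once, then use the enhanced for loop over the copy.
    static long cloneOnce(Object[] source) {
        long visited = 0;
        for (Object o : source.clone()) {
            if (o != null) visited++;
        }
        return visited;
    }

    // Cases 2 and 3: enhanced for loop over a List (plain or unmodifiable).
    static long overList(List<Object> list) {
        long visited = 0;
        for (Object o : list) {
            if (o != null) visited++;
        }
        return visited;
    }

    // Case 4: clone in the length check and again for every indexed read.
    static long cloneRepeatedly(Object[] source) {
        long visited = 0;
        for (int i = 0; i < source.clone().length; i++) {
            Object o = source.clone()[i];
            if (o != null) visited++;
        }
        return visited;
    }

    static Object[] makeItems(int size) {
        Object[] items = new Object[size];
        for (int i = 0; i < size; i++) {
            items[i] = Integer.valueOf(i);
        }
        return items;
    }

    public static void main(String[] args) {
        Object[] array = makeItems(1000);
        List<Object> list = new ArrayList<Object>(Arrays.asList(array));
        List<Object> readOnly = Collections.unmodifiableList(list);

        // Single timed pass per case; a real measurement needs warm-up and many repeats.
        long t0 = System.nanoTime();
        cloneOnce(array);
        long t1 = System.nanoTime();
        overList(list);
        long t2 = System.nanoTime();
        overList(readOnly);
        long t3 = System.nanoTime();
        cloneRepeatedly(array);
        long t4 = System.nanoTime();

        System.out.println("clone once:        " + (t1 - t0) + " ns");
        System.out.println("ArrayList:         " + (t2 - t1) + " ns");
        System.out.println("unmodifiable list: " + (t3 - t2) + " ns");
        System.out.println("clone repeatedly:  " + (t4 - t3) + " ns");
    }
}
```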
Comments (3)
clone() is fast, but not what I would describe as super-fast.
If you don't trust people to write loops efficiently, I would not let them write a loop (which also avoids the need for a clone())
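One way to read that suggestion is internal iteration: the library owns the loop and callers pass in a callback. The sketch below assumes a hypothetical `AppointmentVisitor` interface and `forEachAppointment` method, neither of which appears in the question. It uses no generics, so it should compile on Java 1.4.

```java
// Hypothetical stand-in for the question's Appointment type.
class Appointment {
    private final String title;

    Appointment(String title) { this.title = title; }

    String title() { return title; }
}

// Hypothetical callback interface: typed, but without generics.
interface AppointmentVisitor {
    void visit(Appointment appointment);
}

class Schedule {
    private final Appointment[] internalArray;

    Schedule(Appointment[] appointments) {
        this.internalArray = (Appointment[]) appointments.clone();
    }

    // The library runs the loop; the internal array never escapes, so no clone is needed.
    public void forEachAppointment(AppointmentVisitor visitor) {
        for (int i = 0; i < internalArray.length; i++) {
            visitor.visit(internalArray[i]);
        }
    }
}

class VisitorDemo {
    // Joins all appointment titles with ", " using an anonymous visitor.
    static String joinTitles(Schedule schedule) {
        final StringBuffer out = new StringBuffer();
        schedule.forEachAppointment(new AppointmentVisitor() {
            public void visit(Appointment a) {
                if (out.length() > 0) out.append(", ");
                out.append(a.title());
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Schedule schedule = new Schedule(new Appointment[] {
            new Appointment("Dentist"), new Appointment("Standup")
        });
        System.out.println(joinTitles(schedule)); // prints "Dentist, Standup"
    }
}
```

The trade-off is that callers give up the enhanced for loop the question asks for, and stopping early would need extra API (for example a boolean return value from `visit`).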
Since you can't really have it both ways, I would suggest that you create a pre-generics and a generics version of your API. Ideally, the underlying implementation can be mostly the same, but the fact is, if you want it to be easy to use for anyone using Java 1.5 or later, they will expect the usage of Generics and Iterable and all the newer language features.
I think the usage of arrays should be non-existent. It does not make for an easy-to-use API in either case.
NOTE: I have never used C#, but I would expect the same holds true.
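A minimal sketch of what such a two-layer split might look like, assuming a pre-generics core with typed index accessors and a thin Java 5+ adapter shipped separately. All names here (`appointmentCount`, `appointmentAt`, `IterableSchedule`, `TwoLayerDemo`) are hypothetical and only illustrate the idea.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the question's Appointment type.
class Appointment {
    private final String title;

    Appointment(String title) { this.title = title; }

    String title() { return title; }
}

// Pre-generics core: typed accessors only, nothing here requires Java 5.
class Schedule {
    private final Appointment[] internalArray;

    Schedule(Appointment[] appointments) {
        this.internalArray = (Appointment[]) appointments.clone();
    }

    public int appointmentCount() { return internalArray.length; }

    public Appointment appointmentAt(int index) { return internalArray[index]; }
}

// Java 5+ layer: adapts the core to Iterable so the enhanced for loop works.
class IterableSchedule implements Iterable<Appointment> {
    private final Schedule schedule;

    IterableSchedule(Schedule schedule) { this.schedule = schedule; }

    public Iterator<Appointment> iterator() {
        return new Iterator<Appointment>() {
            private int index = 0;

            public boolean hasNext() { return index < schedule.appointmentCount(); }

            public Appointment next() {
                if (!hasNext()) throw new NoSuchElementException();
                return schedule.appointmentAt(index++);
            }

            // Read-only view: removal would break the schedule's immutability.
            public void remove() { throw new UnsupportedOperationException("read-only view"); }
        };
    }
}

class TwoLayerDemo {
    // Counts appointments using the enhanced for loop, with no casts and no copies.
    static int countTitles(Iterable<Appointment> appointments) {
        int n = 0;
        for (Appointment a : appointments) {
            if (a.title() != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Schedule core = new Schedule(new Appointment[] {
            new Appointment("Dentist"), new Appointment("Standup")
        });
        System.out.println(countTitles(new IterableSchedule(core)));
    }
}
```

With this shape, no array is ever handed out, so there is nothing to clone; Java 1.4 callers use the index accessors and newer callers use the adapter.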
As far as failing a minority of the users, those that would call the same method to get the same object on each iteration of the loop would be asking for inefficiency regardless of API design. I think as long as that's well documented, it's not too much to ask that the users obey some semblance of common sense.