Java: Get a unique property of an object (like hashcode, but collision-proof)
I have a task for which it is necessary to generate a unique value for every object in a set. Using the hashcode would be perfect, if collisions weren't allowed in the hashcode contract.
One idea: Record every object's hashcode into a multiset. Then, use hashcodes as the unique identifier, but if that hashcode is in the set more than once, use a different value that is also not in the set. But this feels bulky and awkward.
Better ideas?
Here's what I have already:
    public static <V> void toGraphViz(final Graph<V, DefaultWeightedEdge> g, String filename) {
        // to avoid hashcode collisions
        final Set<Integer> hashcodes = new HashSet<Integer>(g.vertexSet().size());
        DOTExporter<V, DefaultWeightedEdge> dot = new DOTExporter<V, DefaultWeightedEdge>(new VertexNameProvider<V>() {
            // vertex name must be unique
            @Override
            public String getVertexName(V arg0) {
                int hash = arg0.hashCode();
                while (hashcodes.contains(hash)) {
                    hash += 1;
                }
                return "" + hash;
            }
        });
        // ...
    }
EDIT: I guess this wasn't originally clear, but the id number does somehow need to be a function of the object, because getVertexName(V) will get called several times, and it expects that for the same values of V, it will get the same results.
Also, the Vertex type is generic. So I can't make any modifications to a specific class to fix this.
Comments (7)
What is the lifetime of this unique number? Just the lifetime of the program? In which case why not just a simple static counter in the class, accessed with suitable synchronisation? Increment it for each new object. No need to keep a list of the values you have used, just the highest value you have used.
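A minimal sketch of such a counter, assuming you control the class in question. The `Tagged` class and its `id()` accessor are hypothetical names used only for illustration; `AtomicLong` stands in for the "suitable synchronisation":

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class that you control; each instance takes the next serial number.
final class Tagged {
    // Shared counter: only the highest value handed out so far is remembered.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    // Assigned once at construction, so it never changes for a given object.
    private final long id = NEXT_ID.getAndIncrement();

    long id() {
        return id;
    }
}
```

The id is unique only within one run of the program, which matches the "lifetime of the program" case.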
If it must be unique across many executions (and maybe many simultaneous instances), then perhaps you can just use a database which generates unique record ids.
EDITED in response to clarification
The piece I missed before was that we can't modify the class for which we want to generate the unique "hash".
I think that working from the hash code of the class, which will have collisions, is making life hard. Assuming that we can rely upon the Vertex classes in question having correctly implemented equals(), we can use the object itself as a key to the set of hashcodes we have used.
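One way this could look, as a sketch rather than a definitive implementation: keep a map from vertex to an assigned number, so the same vertex always gets the same name back. `VertexIds` and `idOf` are hypothetical names, not part of JGraphT, and the sketch assumes the vertex type implements both equals() and hashCode() consistently:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: hands out one stable id per distinct (by equals) vertex.
final class VertexIds<V> {
    private final Map<V, Integer> ids = new HashMap<>();

    // Returns the id already assigned to an equal vertex, or assigns the next free one.
    String idOf(V vertex) {
        Integer id = ids.get(vertex);
        if (id == null) {
            id = ids.size();      // ids are 0, 1, 2, ... in order of first use
            ids.put(vertex, id);
        }
        return String.valueOf(id);
    }
}
```

Inside getVertexName(V arg0) this would reduce to returning `ids.idOf(arg0)`, with `ids` declared as a final local next to the exporter. Hash collisions between unequal vertices are then harmless, because the map resolves them through equals().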
You could consider using a UUID, depending on what you are trying to accomplish...
To find a unique value for an object, you have to know a combination of properties that make the object unique.
To run ".contains()", you need to have a method of determining ".equals()", which means you should already know how to uniquely identify a Vertex, so maybe you can come up with an expression of the unique properties?
e.g., "(x, y, z, rgb)"
Unless I'm misunderstanding the question, I wouldn't recommend mucking with an object's hashCode for this purpose.
Why not just use a serial number?
Or a combination/hybrid, say a long of (((long) hash << 32) | getNextSerial()), with the serial kept within 32 bits.
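A detail worth spelling out for the hybrid: in Java, shifting an int by 32 leaves it unchanged, because only the low five bits of the shift distance are used for int operands. The hash therefore has to be widened to long before shifting. A small sketch, where `HybridId.pack` is a hypothetical helper name:

```java
final class HybridId {
    // Packs the 32-bit hash into the upper half and a 32-bit serial into the lower half.
    static long pack(int hash, int serial) {
        // Widen the hash first; mask the serial so a negative value cannot
        // sign-extend into the upper 32 bits.
        return ((long) hash << 32) | (serial & 0xFFFFFFFFL);
    }
}
```

Since the serial alone is already unique, the hash half only adds information about the object, not uniqueness.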
To address the EDIT clarification
When you construct the object, allocate the serial number to a private member variable and return it for hashCode(). You should then override equals with a call to super.equals() (since a generated serial number is consistent with the default equals() implementation) because seeing a hashCode() override without a corresponding equals() override will red-flag the code to tools (and other programmers).
I think you misunderstood hashcode.
Based on the contract, the hashcode must be the same when equals(..) is true; ideally the reverse also holds, although the contract does not strictly require it. So in your case only vertices with the same properties should have the same hashcode; otherwise your self-written hashcode calculation method should be fixed. As far as I have understood your question, each vertex is unique in itself, so you shouldn't have a problem, right?
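For reference, unequal objects can share a hash code even with a well-behaved hashCode(). A small sketch using java.lang.String, whose hash function is specified; `CollisionDemo` is a hypothetical helper name:

```java
final class CollisionDemo {
    // True when two objects are not equal yet report the same hash code.
    static boolean collide(Object a, Object b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }
}
```

`CollisionDemo.collide("Aa", "BB")` is true: both strings hash to 2112 (65*31 + 97 and 66*31 + 66).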
I probably don't understand what you are doing, but consider creating a reference to each object. Since the reference contains the address of the object it will be unique for each object.
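Java does not expose object addresses, and System.identityHashCode() is not guaranteed to be unique, so a reference-based id needs some bookkeeping. One possible sketch uses java.util.IdentityHashMap, which compares keys with == instead of equals(). `IdentityIds` is a hypothetical name, and this only helps if distinct vertex instances should get distinct names even when they are equal:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper: one id per object instance, regardless of equals().
final class IdentityIds {
    private final Map<Object, Integer> ids = new IdentityHashMap<>();

    // The same reference always yields the same id; a different instance gets a new one.
    int idOf(Object o) {
        Integer id = ids.get(o);
        if (id == null) {
            id = ids.size();
            ids.put(o, id);
        }
        return id;
    }
}
```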
It's not that difficult, is it? Just use a different hash algorithm, if the one in Java doesn't guarantee no collisions. Send the object to the hash algorithm, e.g. SHA-256, and use that as the key. If you need to keep different copies of the exact same object with different hash values, use a seed when you perform the hash, and store the seed alongside the hash for that object.
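A rough sketch of this idea with java.security.MessageDigest. It assumes the object's toString() captures everything that makes it distinct, which a generic vertex type may not guarantee; `Sha256Names.nameOf` is a hypothetical helper. Collisions are then extremely unlikely rather than impossible:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256Names {
    // Hex-encoded SHA-256 of the object's string form (assumption: toString() is a stable, complete description).
    static String nameOf(Object o) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(o.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```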