假设我们有一堆 Car 对象。
每辆汽车都有一些独特的属性,例如制造商、型号、年份等(这些可用于创建不同的哈希代码)。
每辆车都有一个 PurchaseOffer 对象列表(PurchaseOffer 对象包含定价\零售商信息)。
我们从多个不同来源收到汽车列表,每辆车都有一个购买报价。
事实是,这些列表可能会重叠 - 一辆汽车可以出现在多个列表中。
我们希望将列表聚合到单个汽车集合中,其中每辆车都包含其遇到的所有购买报价。
我的问题是在这个聚合过程中选择要使用的集合:
使用 java.util.HashSet 来保存我们的汽车感觉很自然,这样当检查不同的汽车列表时,我们可以检查是否集合中已经存在一辆汽车,分摊时间为 O(1),
但是 - 您无法从 Set 中检索元素(在我们的例子中 - 当我们遇到 Set 中已存在的汽车时 - 我们希望根据其标识 hashCode 从 Set 中检索该汽车并向其添加 PurchaseOffers ) 。
我可以使用 HashMap,其中每个 Car 的 hashCode 映射到实际的 Car 对象,但它可能不是教科书解决方案,因为它不安全 - 我必须确保自己每个 hashCode 都映射到具有该 hashCode 的 Car -可能会出现不一致的情况。
当然,可以制定一个指定的数据结构来保证这种一致性——难道不应该已经存在吗?
任何人都可以建议我所追求的数据结构,或者指出设计错误吗?
谢谢。
Let's say we have a bunch of Car objects.
Each Car has some distinguishing properties e.g. manufacturer, model, year, etc. (these can be used to create distinct hashCodes).
Each car has a List of PurchaseOffer objects (a PurchaseOffer object contains pricing\retailer info).
We receive Lists of Cars from several different sources, each Car with a single PurchaseOffer.
Thing is, these lists may overlap - a Car can appear in more than one list.
We wish to aggregate the lists into a single collection of Cars where each Car holds all encountered PurchaseOffers for it.
My Problem is choosing what to collection to use in this aggregation process:
Feels natural to use java.util.HashSet for holding our cars, that way when going over the different lists of Cars, we can check if a car already exists in the Set in amortized O(1),
however - you cannot retrieve an element from a Set (in our case - when we go encounter a Car that already exists in the Set - we would have liked to retrieve that Car from the Set based on its identifying hashCode and add PurchaseOffers to it).
I can use a HashMap where each Car's hashCode maps to the actual Car object, but it probably isn't the school-book solution since it is unsafe - I would have to make sure myself that every hashCode maps to a Car with that hashCode - there could be inconsistency.
Of course, can make a designated data structure that guarantees this consistency - Shouldn't one already exist ?
Can anyone suggest the data-structure I am after, or point out a design mistake ?
Thanks.
发布评论
评论(9)
由于这是多对多关系,因此您需要双向多映射。 Car 是第一个的键,以 PurchaseOrder 列表作为值。 PurchaseOrder 是第二个订单的键,其值为汽车列表。
底层实现是两个HashMap。
在其之上放置一个 API 以获得您需要的行为。或者查看 Google 收藏集是否可以为您提供帮助。它是一个 BiMap 和两个 MultiMap 的组合。
Since this is a many-to-many relationship, you need a bi-directional multi-map. Car is the key for the first one, with a List of PurchaseOrder as the value. The PurchaseOrder is the key for the second one, with a List of Cars as the value.
The underlying implementation is two HashMaps.
Put an API on top of it to get the behavior you need. Or see if Google Collections can help you. It's a combination of a BiMap and two MultiMaps.
我认为你确实需要(至少)一个
HashMap>
...正如 @Andreas_D 所建议的,你反对每个
Car
已经有一个List
不是重点。HashMap
中的列表是聚合列表,包含所有Car
对象中的所有PurchaseOffer
对象,这些对象代表同一辆实体车。创建新列表的目的是避免更改原始
Car
对象上的原始列表。 (如果这不是问题,那么您可以从代表一辆实体汽车的集合中选择一个Car
实例,并将其他对象中的PurchaseOffer
对象合并到该列表中。)我不完全确定为什么 @duffymo 建议在两者之间建立双向映射,但我认为这是因为来自不同来源的不同
Car
对象可能具有相同的互补(或矛盾)信息实体车。通过保留所有实例,您可以避免丢弃信息。 (再一次,如果您愿意放弃变异和/或放弃信息,您可以尝试将每辆汽车的信息合并到一个Car
对象中。如果您真的不关心保留信息并准备随意合并内容,那么以下方法可能会起作用:
您不应该尝试使用
Car.hashCode()
作为任何内容的键。哈希码值不是唯一标识符。 :很可能两辆不同的汽车最终会具有相同的哈希码值,如果您尝试将它们当作唯一标识符来使用,您会遇到麻烦......I think that you really do need (at least) a
HashMap<Car, List<PurchaseOffer>>
... as suggested by @Andreas_DYour objection that each
Car
already has aList<PurchaseOffer>
is beside the point. The list in theHashMap
is the aggregate list, containing allPurchaseOffer
objects from allCar
objects that stand for the same physical car.The point of creating a new list is to avoid changing the original lists on the original
Car
objects. (If that was not a concern, then you could pick one instance ofCar
from the set that represent a physical car, and merge thePurchaseOffer
objects from the others into that list.)I'm not entirely sure why @duffymo suggested a bi-directional map between, but I think it is because the different
Car
objects from different sources may have complementary (or contradictory) information for the same physical car. By keeping all instances, you avoid discarding information. (Once again, if you are happy to discard mutate and/or discard information, you could attempt to merge the information about each individual car into a singleCar
object.If you really didn't care about preserving information and were prepared to merge stuff willy-nilly then the following approach would probably work:
You should NOT be trying to use the
Car.hashCode()
as a key for anything. Hashcode values are not unique identifiers: there is a distinct possibility that two different cars will end up with the same hashcode value. If you attempt to use them as if they were unique identifiers you'll get into trouble ...基本数据结构应该是
HashMap>
。这允许存储和接收一辆选定汽车的所有报价。现在您可能必须为
Car.equals()
找到合适的实现,以确保来自不同来源的“汽车”确实相同。将equals()
基于真实世界汽车的唯一标识符 (VIN) 怎么样?The basic datastructure should be a
HashMap<Car, List<PurchaseOffer>>
. This allows for storing and receiving all offers for one selected car.Now you may have to find a suitable implementation for
Car.equals()
to assure, that "cars" coming from different source are really the same. What about basingequals()
on a unique identifier for a real world car (VIN)?我更喜欢使用
HashMap>
,如之前建议的(Andreas、Stephen),主要是在 Car 对象不保存列表的情况下购买优惠。否则,我会考虑使用
HashMap
,或者更好的 IMO,如果每辆车都有唯一的 ID,则使用HashMap
。正如问题中提到的,它不能简单地将汽车的 hashCode 映射到汽车,因为不同的汽车可以具有相同的 hashCode!
(无论如何,我会创建一个自己的类来存储和管理汽车。这将包含 HashMap 或任何一个 - 因此很容易更改实现而不需要更改其接口)
I would prefer to use a
HashMap<Car, List<PurchaseOffer>>
, as suggested before (Andreas, Stephen), mainly if the Car object does not hold the list of PurchaseOffers.Otherwise I would consider using a
HashMap<Car, Car>
or, better IMO, aHashMap<ID, Car>
if there is an unique ID for each Car.It can not simply map the Car's hashCode to the Car, as mentioned in the question, since distinct Cars can have the same hashCode!
(Anyway, I would create an own class for storing and managing the Cars. This would contain the HashMap, or whichever - so it's easy to change the implementation without needing to change its interface)
创建扩展哈希的自定义类
Set,
重写方法包含(Object o)
检查操作系统哈希码是否相同并根据返回结果,并将对象添加到集合中,并且仅当它不包含该对象时
create tout custom class that extends hash
Set,
override method contains(Object o)
check there os hash code is same or not and return result according, and add object to set of and only if it not containing that object
定义一个新的自定义聚合类怎么样?定义哈希码,使汽车的 id 作为键并相应地覆盖 equals()。定义一个自定义方法来接受您的原始汽车并在列表上执行联合操作。最后将自定义对象存储在 HashSet 中以实现恒定时间查找。
用纯粹的术语来说,聚合是一种超出单个对象范围的行为。访问者模式试图解决类似的问题。
或者,如果您有一个 sql 数据存储,则使用 group by 进行简单的选择即可解决问题。
How about a defining a new custom Aggregation class? Define the hashcode such that the id of the car acts as the key and override the equals() accordingly. Define a custom method for accepting your original car and do a union operation on the lists. Finally store the custom objects in a HashSet for achieving constant time look up.
In purist terms, aggregation is a behavior beyond the scope of a single object. Visitor pattern tries to address a similar problem.
Alternatively if you have a sql datastore, a simple select using group by would do the trick.
在本例中,将 Car 类分为两个类 - 具有标识属性的 CarType 类,然后是列表部分(可能两者一起由
Car
使用)。然后使用Map
作为您的数据结构(或MultiMap
)。In this case split the Car class in two classes - the CarType class with the identifying attributes, and then the list part (maybe both together used by
Car
). Then useMap<CarType, Lost<PurchaseOffer>
for your datastructure (orMultiMap<CarType, PurchaseOffer>
).替代方案 1 将利用
Car
和Offer
的hashCode()
替代方案 2 将利用
Integer
的hashCode()
。这将减轻您对“安全”的担忧,因为Integer
对象的哈希码不应在值唯一的情况下重叠。这会产生额外的开销,因为必须为每个Car
和Offer
对象维护唯一的 ID,但是,我猜测您可能已经根据业务需求拥有了这些 ID。请注意,您可以选择使用其他类来替代 ID 的
int
(例如String
)。对于这两种替代方案,请使用
ArrayList
或LinkedList
实现List
- 哪一个更好由您根据其他要求来确定,例如作为插入/删除频率与查找频率。使用HashMap
实现Map
- 请参阅上面有关如何使用哈希码的注释。作为旁注,在我们的软件中,我们使用上述两者来表示类似类型的多对多数据。与您的用例非常相似。
两种选择都效果很好。
Alternative 1 would make use of the
hashCode()
ofCar
andOffer
Alternative 2 would make use of the
hashCode()
ofInteger
. This would allay your concerns about "safety" as the hash codes forInteger
objects should not overlap where the values are unique. This incurs the additional overhead of having to maintain unique IDs for eachCar
andOffer
object, however, I am guessing that you probably already have those from your business requirements.Note, you may choose to use other classes as alternative to
int
s for ID's (e.g.String
).For both alternatives, implement the
List
s withArrayList
orLinkedList
- which one is better is up to you to determine based on other requirements, such as the frequency of insertion/deletion vs lookup. Implement theMap
s withHashMap
- see comments above about how hash codes are used.As a side note, in our software, we use these both of the above to represent similar types of many-to-many data. Very similar to your use case.
Both alternatives work very well.
为什么不使用对象数据库呢?您可以存储您想要的任何对象图,并且您将获得一个搜索 API,您可以使用它执行您想要的任何关系/检索机制。一个简单的集合可以工作,但听起来您想要比集合提供的更复杂的关系。查看 db4o (http://db4o.com) - 它对于此类事情非常强大。
Why not use an object database for this? You could store any object graph you wanted, and you'd get a search API with which you could do any relationship/retrieval mechanism you wanted. A simple collection could work, but it sounds like you want a more complex relationship than a collection would provide. Look into db4o (http://db4o.com) - it's very powerful for this sort of thing.