用于唯一标识节点的最佳集合是什么?
目前我正在使用 Dictionary
来存储大约 10,000 个节点。键用作稍后查找的 ID 号,“节点”是包含一些数据的类。程序中的其他类使用 ID 号作为指向节点的指针。 (这听起来可能效率低下。但是,解释我使用字典的原因超出了我的问题范围。)
但是,20% 的节点是重复的。 我想要做的是当我添加一个节点时检查它是否全部准备好存在。如果确实如此,则使用该 ID 号。如果没有创建一个新的。
这是我当前对该问题的解决方案:
public class nodeDictionary
{
Dictionary<int, node> dict = new Dictionary<int, node>( );
public int addNewNode( latLng ll )
{
node n = new node( ll );
if ( dict.ContainsValue( n ) )
{
foreach ( KeyValuePair<int, node> kv in dict )
{
if ( kv.Value == n )
{
return kv.Key;
}
}
}
else
{
if ( dict.Count != 0 )
{
dict.Add( dict.Last( ).Key + 1, n );
return dict.Last( ).Key + 1;
}
else
{
dict.Add( 0, n );
return 0;
}
}
throw new Exception( );
}//end add new node
}
问题是当尝试将新节点添加到 100,000 个节点的列表中时,添加该节点需要 78 毫秒。这是不可接受的,因为我可能在任何给定时间添加额外的 1,000 个节点。
那么,有更好的方法吗?我不是在寻找有人为我编写代码,我只是在寻找指导。
Currently I am using a Dictionary<int,node>
to store around 10,000 nodes. The key is used as an ID number for later look up and the 'node' is a class that contains some data. Other classes within the program use the ID number as a pointer to the node. (this may sound inefficient. However, explaining my reasoning for using a dictionary for this is beyond the scope of my question.)
However, 20% of the nodes are duplicate.
What i want to do is when i add a node check to see if it all ready exists. if it does then use that ID number. If not create a new one.
This is my current solution to the problem:
public class nodeDictionary
{
Dictionary<int, node> dict = new Dictionary<int, node>( );
public int addNewNode( latLng ll )
{
node n = new node( ll );
if ( dict.ContainsValue( n ) )
{
foreach ( KeyValuePair<int, node> kv in dict )
{
if ( kv.Value == n )
{
return kv.Key;
}
}
}
else
{
if ( dict.Count != 0 )
{
dict.Add( dict.Last( ).Key + 1, n );
return dict.Last( ).Key + 1;
}
else
{
dict.Add( 0, n );
return 0;
}
}
throw new Exception( );
}//end add new node
}
The problem with this is when trying to add a new node to a list of 100,000 nodes it takes 78 milliseconds to add the node. This is unacceptable because i could be adding an additional 1,000 nodes at any given time.
So, is there a better way do do this? I am not looking for someone to write the code for me, I am just looking for guidance.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
听起来您想
HashSet
中实现 GetHashCode,请参见此处:为什么是当覆盖 Equals 方法时覆盖 GetHashCode 很重要吗?
如果您需要以某种方式生成“人工”唯一 ID,我建议您使用再次使用字典方法,但“相反”:
您可以使用 idMap 替换 HashSet;实现(和性能特征)本质上是相同的,但作为关联容器。
这是从 LatLng 到 Id 的查找函数:
您可以及时添加它:
It sounds like you want to
IEquatable<LatLng>
interface)HashSet<LatLng>
For implementing GetHashCode, see here: Why is it important to override GetHashCode when Equals method is overridden?
If you need to generate 'artificial' unique IDs in some fashion, I suggest you use the dictionary approach again, but 'in reverse':
You can have the
idMap
replace theHashSet<>
; the implementation (and performance characteristics) is essentially the same but as an associative container.Here's a lookup function to get from LatLng to Id:
You could just-in-time add it:
我会为相反方向添加第二个字典。 ie
Dictionary
那么你要么
IEqualityComparer
并将其提供给Node
上的字典 OverrideEquals
和GetHashCode
在这两种情况下哈希码的良好实现对于获得良好的性能至关重要。
I'd add a second dictionary for the reverse direction. i.e.
Dictionary<Node,int>
Then you either
IEqualityComparer<Node>
and supply it to the dictionaryEquals
andGetHashCode
onNode
In both cases a good implementation for the hashcode is essential to get good performance.
你的解决方案不仅慢,而且是错误的。
Dictionary
中项目的顺序未定义,因此dict.Last()
不能保证返回最后添加的项目。 (尽管通常看起来是这样。)使用 id 来标识应用程序中的对象似乎也是错误的。您应该考虑直接使用对该对象的引用。
但是,如果您想使用当前的设计并假设您根据
latLng
比较节点,则可以创建两个字典:您已有的一个和第二个字典,Dictionary
,可以用来有效地找出某个节点是否已经存在。如果是的话,它会给你它的 id。Your solution is not only slow, but also wrong. The order of items in a
Dictionary
is undefined, sodict.Last()
is not guaranteed to return the item that was added last. (Although it may often look that way.)Using an id to identify an object in your application seems wrong too. You should consider using references to the object directly.
But if you want to use your current design and assuming that you compare nodes based on their
latLng
, you could create two dictionaries: the one you already have and a second one,Dictionary<latLng, int>
, that can be used to efficiently fond out whether a certain node already exists. And if it does, it gives you its id.这段代码的目的究竟是什么?
ContainsValue
搜索值(而不是键)并且效率非常低 (O(n))。foreach
也是如此。更不用说当只需要一个时您就同时执行这两项操作(您可以通过稍微重新安排您的if
来完全删除ContainsValue
)!您可能应该维护与原始字典“相反”的附加字典(即旧字典中的值是新字典中的键,反之亦然),以“覆盖”您的搜索模式(类似于数据库如何维护表中的多个索引)以涵盖查询表的多种方式)。
What exactly is the purpose of this code?
The
ContainsValue
searches for a value (instead of a key) and is very inefficient (O(n)). Ditto forforeach
. Let alone you do both when only one is necessary (you could completely removeContainsValue
by rearranging yourif
s a little)!You should probably mainntain additional dictionary that is "reverse" of the original one (i.e. values in old dictionary are keys in the new one and vice versa), to "cover" your search patterns (similarly to how databases can maintain multiple indexes par table to cover multiple ways table can be queried).
您可以尝试使用
HashSet
You could try using
HashSet<T>
您可能需要考虑将其重组为仅使用列表(其中“键”只是列表中的索引)而不是字典。一些优点:
通过整数键查找元素现在是 O(1)(并且非常快的 O(1),因为它只是内部的数组取消引用)。
当插入一个新元素时,您将执行 O(n) 搜索来查看它是否已存在于列表中。如果没有,您也已经遍历了该列表,并且可以记录您是否遇到了具有空记录的条目。如果有,该索引就是新键。如果不是,则新键是当前列表计数。您只需枚举一次集合而不是多次,并且枚举本身比枚举字典要快得多。
You might want to consider restructuring this to just use a List (where the 'key' is just the index into the List) instead of a Dictionary. A few advantages:
Looking up an element by integer key is now O(1) (and a very fast O(1) given that it's just an array dereference internally).
When you insert a new element, you perform an O(n) search to see whether it already exists in the list. If it does not, you've also already traversed the list and can have recorded whether you encountered an entry with a null record. If you have, that index is the new key. If not, the new key is the current list Count. You're enumerating the collection once instead of multiple times and the enumeration itself is much faster than enumerating a Dictionary.