用于唯一标识节点的最佳集合是什么?

发布于 2024-12-13 06:59:06 字数 1234 浏览 0 评论 0原文

目前我正在使用 Dictionary 来存储大约 10,000 个节点。键用作稍后查找的 ID 号,“节点”是包含一些数据的类。程序中的其他类使用 ID 号作为指向节点的指针。 (这听起来可能效率低下。但是,解释我使用字典的原因超出了我的问题范围。)

但是,20% 的节点是重复的。 我想要做的是当我添加一个节点时检查它是否全部准备好存在。如果确实如此,则使用该 ID 号。如果没有创建一个新的。

这是我当前对该问题的解决方案:

public class nodeDictionary 
{

    Dictionary<int, node> dict = new Dictionary<int, node>( );
    public int addNewNode( latLng ll )
    {
        node n = new node( ll );
        if ( dict.ContainsValue( n ) )
        {
            foreach ( KeyValuePair<int, node> kv in dict )
            {
                if ( kv.Value == n )
                {
                    return kv.Key;
                }
            }
        }
        else
        {
            if ( dict.Count != 0 )
            {
                dict.Add( dict.Last( ).Key + 1, n );
                return dict.Last( ).Key + 1;
            }
            else
            {
                dict.Add( 0, n );
                return 0;
            }
        }
        throw new Exception( );
    }//end add new node
}

问题是当尝试将新节点添加到 100,000 个节点的列表中时,添加该节点需要 78 毫秒。这是不可接受的,因为我可能在任何给定时间添加额外的 1,000 个节点。

那么,有更好的方法吗?我不是在寻找有人为我编写代码,我只是在寻找指导。

Currently I am using a Dictionary<int,node> to store around 10,000 nodes. The key is used as an ID number for later look up and the 'node' is a class that contains some data. Other classes within the program use the ID number as a pointer to the node. (this may sound inefficient. However, explaining my reasoning for using a dictionary for this is beyond the scope of my question.)

However, 20% of the nodes are duplicate.
What i want to do is when i add a node check to see if it all ready exists. if it does then use that ID number. If not create a new one.

This is my current solution to the problem:

public class nodeDictionary 
{

    Dictionary<int, node> dict = new Dictionary<int, node>( );
    public int addNewNode( latLng ll )
    {
        node n = new node( ll );
        if ( dict.ContainsValue( n ) )
        {
            foreach ( KeyValuePair<int, node> kv in dict )
            {
                if ( kv.Value == n )
                {
                    return kv.Key;
                }
            }
        }
        else
        {
            if ( dict.Count != 0 )
            {
                dict.Add( dict.Last( ).Key + 1, n );
                return dict.Last( ).Key + 1;
            }
            else
            {
                dict.Add( 0, n );
                return 0;
            }
        }
        throw new Exception( );
    }//end add new node
}

The problem with this is when trying to add a new node to a list of 100,000 nodes it takes 78 milliseconds to add the node. This is unacceptable because i could be adding an additional 1,000 nodes at any given time.

So, is there a better way do do this? I am not looking for someone to write the code for me, I am just looking for guidance.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

屋顶上的小猫咪 2024-12-20 06:59:06

听起来您想

  • 确保 LatLng 覆盖 Equals/GetHashCode (最好实现 IEquatable接口)
  • 将所有项目直接填充到 HashSet

中实现 GetHashCode,请参见此处:为什么是当覆盖 Equals 方法时覆盖 GetHashCode 很重要吗?

如果您需要以某种方式生成“人工”唯一 ID,我建议您使用再次使用字典方法,但“相反”:

// uses the same hash function for speedy lookup/insertion
IDictionary<LatLng, int> idMap = new Dictionary<LatLng, int>(); 

foreach (LatLng latLng in LatLngCoords)
{
    if (!idMap.ContainsKey(latLng))
        idMap.Add(latLng, idMap.Count+1); // to start with 1
}

您可以使用 idMap 替换 HashSet;实现(和性能特征)本质上是相同的,但作为关联容器。

这是从 LatLng 到 Id 的查找函数:

int IdLookup(LatLng latLng)
{
     int id;
     if (idMap.TryGetValue(latLng, id))
         return id;
     throw new InvalidArgumentException("Coordinate not in idMap");
}

您可以及时添加它:

int IdFor(LatLng latLng)
{
     int id;
     if (idMap.TryGetValue(latLng, id))
         return id;

     id = idMap.Count+1;
     idMap.Add(latLng, id);
     return id;
}

It sounds like you want to

  • make sure that LatLng overrides Equals/GetHashCode (preferrably implement the IEquatable<LatLng> interface)
  • stuff all the items directly into a HashSet<LatLng>

For implementing GetHashCode, see here: Why is it important to override GetHashCode when Equals method is overridden?

If you need to generate 'artificial' unique IDs in some fashion, I suggest you use the dictionary approach again, but 'in reverse':

// uses the same hash function for speedy lookup/insertion
IDictionary<LatLng, int> idMap = new Dictionary<LatLng, int>(); 

foreach (LatLng latLng in LatLngCoords)
{
    if (!idMap.ContainsKey(latLng))
        idMap.Add(latLng, idMap.Count+1); // to start with 1
}

You can have the idMap replace the HashSet<>; the implementation (and performance characteristics) is essentially the same but as an associative container.

Here's a lookup function to get from LatLng to Id:

int IdLookup(LatLng latLng)
{
     int id;
     if (idMap.TryGetValue(latLng, id))
         return id;
     throw new InvalidArgumentException("Coordinate not in idMap");
}

You could just-in-time add it:

int IdFor(LatLng latLng)
{
     int id;
     if (idMap.TryGetValue(latLng, id))
         return id;

     id = idMap.Count+1;
     idMap.Add(latLng, id);
     return id;
}
久夏青 2024-12-20 06:59:06

我会为相反方向添加第二个字典。 ie Dictionary

那么你要么

  • 满足于引用相等并且什么也不做。
  • 创建一个 IEqualityComparer 并将其提供给
  • Node 上的字典 Override EqualsGetHashCode

在这两种情况下哈希码的良好实现对于获得良好的性能至关重要。

I'd add a second dictionary for the reverse direction. i.e. Dictionary<Node,int>

Then you either

  • Are content with reference equality and do nothing.
  • Create an IEqualityComparer<Node> and supply it to the dictionary
  • Override Equals and GetHashCode on Node

In both cases a good implementation for the hashcode is essential to get good performance.

故事还在继续 2024-12-20 06:59:06

你的解决方案不仅慢,而且是错误的。 Dictionary 中项目的顺序未定义,因此 dict.Last() 不能保证返回最后添加的项目。 (尽管通常看起来是这样。)

使用 id 来标识应用程序中的对象似乎也是错误的。您应该考虑直接使用对该对象的引用。

但是,如果您想使用当前的设计并假设您根据 latLng 比较节点,则可以创建两个字典:您已有的一个和第二个字典,Dictionary,可以用来有效地找出某个节点是否已经存在。如果是的话,它会给你它的 id。

Your solution is not only slow, but also wrong. The order of items in a Dictionary is undefined, so dict.Last() is not guaranteed to return the item that was added last. (Although it may often look that way.)

Using an id to identify an object in your application seems wrong too. You should consider using references to the object directly.

But if you want to use your current design and assuming that you compare nodes based on their latLng, you could create two dictionaries: the one you already have and a second one, Dictionary<latLng, int>, that can be used to efficiently fond out whether a certain node already exists. And if it does, it gives you its id.

孤星 2024-12-20 06:59:06

这段代码的目的究竟是什么?

if ( dict.ContainsValue( n ) )
{
    foreach ( KeyValuePair kv in dict )
    {
        if ( kv.Value == n )
        {
            return kv.Key;
        }
    }
}

ContainsValue 搜索(而不是键)并且效率非常低 (O(n))。 foreach 也是如此。更不用说当只需要一个时您就同时执行这两项操作(您可以通过稍微重新安排您的 if 来完全删除 ContainsValue)!

您可能应该维护与原始字典“相反”的附加字典(即旧字典中的值是新字典中的键,反之亦然),以“覆盖”您的搜索模式(类似于数据库如何维护表中的多个索引)以涵盖查询表的多种方式)。

What exactly is the purpose of this code?

if ( dict.ContainsValue( n ) )
{
    foreach ( KeyValuePair kv in dict )
    {
        if ( kv.Value == n )
        {
            return kv.Key;
        }
    }
}

The ContainsValue searches for a value (instead of a key) and is very inefficient (O(n)). Ditto for foreach. Let alone you do both when only one is necessary (you could completely remove ContainsValue by rearranging your ifs a little)!

You should probably mainntain additional dictionary that is "reverse" of the original one (i.e. values in old dictionary are keys in the new one and vice versa), to "cover" your search patterns (similarly to how databases can maintain multiple indexes par table to cover multiple ways table can be queried).

半枫 2024-12-20 06:59:06

您可以尝试使用 HashSet

You could try using HashSet<T>

音盲 2024-12-20 06:59:06

您可能需要考虑将其重组为仅使用列表(其中“键”只是列表中的索引)而不是字典。一些优点:

  1. 通过整数键查找元素现在是 O(1)(并且非常快的 O(1),因为它只是内部的数组取消引用)。

  2. 当插入一个新元素时,您将执行 O(n) 搜索来查看它是否已存在于列表中。如果没有,您也已经遍历了该列表,并且可以记录您是否遇到了具有空记录的条目。如果有,该索引就是新键。如果不是,则新键是当前列表计数。您只需枚举一次集合而不是多次,并且枚举本身比枚举字典要快得多。

You might want to consider restructuring this to just use a List (where the 'key' is just the index into the List) instead of a Dictionary. A few advantages:

  1. Looking up an element by integer key is now O(1) (and a very fast O(1) given that it's just an array dereference internally).

  2. When you insert a new element, you perform an O(n) search to see whether it already exists in the list. If it does not, you've also already traversed the list and can have recorded whether you encountered an entry with a null record. If you have, that index is the new key. If not, the new key is the current list Count. You're enumerating the collection once instead of multiple times and the enumeration itself is much faster than enumerating a Dictionary.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文